**1. What is Linear Regression?**

- Linear regression helps us predict one thing (like house price) based on another (like house size). It finds a straight line that best fits the data.

**2. The Equation of Linear Regression**

The equation looks like this:

$Y = mX + b$

Where:

- $Y$ is what you’re trying to predict (like price),
- $X$ is the input (like size),
- $m$ is the slope (how much $Y$ changes when $X$ increases by one unit),
- $b$ is the intercept (the value of $Y$ when $X$ is zero).
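
The equation translates directly into a one-line function. This is only a minimal sketch: the name `predict` and the slope and intercept values are hypothetical, chosen for illustration.

```python
def predict(x, m, b):
    """Return the value on the line Y = m*X + b for input x."""
    return m * x + b


# Hypothetical slope and intercept, chosen only for illustration
example = predict(x=2.0, m=0.5, b=1.0)
print(example)  # 2.0
```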

**3. Collect Data**

- You need some data to work with. For example:
  - X could be the size of houses (in square feet),
  - Y could be the price of those houses.

**4. Plot the Data**

- Before doing anything, you can plot a graph of the data. This helps you see if there’s a trend (like bigger houses costing more).

**5. Split Your Data**

- Divide your data into two parts:
  - **Training data**: Used to train the model.
  - **Test data**: Used to check how well the model works on new data.
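
One way to split is to shuffle the row indices and hold out a fraction of them. The sketch below uses only NumPy; the house sizes and prices are made-up numbers, and the 80/20 ratio is an assumption, not a rule. scikit-learn's `train_test_split` performs the same kind of split in one call.

```python
import numpy as np

# Hypothetical data: house sizes (sq ft) and prices, made up for illustration
sizes = np.array([800, 1000, 1200, 1500, 1800, 2000, 2200, 2500, 2800, 3000])
prices = np.array([190, 230, 270, 330, 390, 430, 470, 530, 590, 630]) * 1000

rng = np.random.default_rng(seed=0)   # fixed seed so the split is repeatable
order = rng.permutation(len(sizes))   # shuffled row indices

n_test = int(0.2 * len(sizes))        # hold out 20% of the rows (assumed ratio)
test_idx, train_idx = order[:n_test], order[n_test:]

x_train, y_train = sizes[train_idx], prices[train_idx]
x_test, y_test = sizes[test_idx], prices[test_idx]
print(len(x_train), len(x_test))
```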

**6. Train the Model (Fit the Line)**

- Now, use the training data to teach the model. It will find the best line (with the best slope and intercept) that fits the data.

**7. Check the Line’s Equation**

After training, you’ll get values for:

- **Slope (m)**: How steep the line is.
- **Intercept (b)**: Where the line crosses the Y-axis.

For example, if the line’s equation is:

$\text{Price} = 200 \times \text{Size} + 30{,}000$

This means that for every 1 square foot increase in house size, the predicted price increases by \$200. The intercept, 30,000, is the predicted price when the size is zero.

**8. Make Predictions**

- Now that the line is ready, you can use it to predict house prices for new house sizes. Just plug the new house size into the equation.
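
As a quick illustration, here is the example equation from step 7 (Price = 200 × Size + 30,000) written as a function. The name `predict_price` and the 1,500 square foot house are hypothetical.

```python
def predict_price(size_sqft):
    """Price predicted by the example line: 200 * size + 30,000."""
    return 200 * size_sqft + 30_000


print(predict_price(1500))                        # 330000
print(predict_price(1501) - predict_price(1500))  # 200 (the slope)
```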

**9. Test the Model**

- Use the test data to see how well your model is doing. You compare the predicted prices with the actual prices to see how close they are.

**10. Check for Errors**

- You measure the difference between the actual prices and the predicted prices. The smaller the differences, the better the model fits.
- Common ways to measure this are:
  - **Mean Squared Error (MSE)**: The average of the squared differences between actual and predicted values (lower is better).
  - **R-squared (R²)**: The share of the variation in Y that the line explains (closer to 1 is better).
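
Both measures can be computed by hand with NumPy. The sketch below uses three made-up actual and predicted values purely for illustration; the variable names are assumptions.

```python
import numpy as np

# Hypothetical actual and predicted values, made up for illustration
actual = np.array([3.0, 5.0, 7.0])
predicted = np.array([2.5, 5.0, 7.5])

# MSE: average of the squared differences
mse = np.mean((actual - predicted) ** 2)

# R²: 1 minus (squared error of the model / squared error of always guessing the mean)
ss_res = np.sum((actual - predicted) ** 2)
ss_tot = np.sum((actual - actual.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

print(mse, r2)
```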

**11. Analyze the Errors (Residuals)**

- Residuals are the differences between actual and predicted values. You can check these to see if your model missed any patterns.
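
A minimal sketch of a residual check, using the five-point data set from the worked example in this notebook and the slope (0.3) and intercept (3.3) that the example arrives at. For a least-squares line with an intercept, the residuals should sum to roughly zero; a visible pattern in them (for example, a curve) would suggest a straight line is not enough.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([3.5, 4.0, 4.5, 4.0, 5.0])

# Slope and intercept taken from the worked example's results
m, c = 0.3, 3.3

predicted = m * x + c
residuals = y - predicted   # actual minus predicted

print(residuals)
print(residuals.sum())      # close to zero for a least-squares fit
```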

**12. Use the Model for Future Predictions**

- Once you’ve trained and tested the model, you can use it to predict prices for new houses.

# Let’s review an example step by step.

#### Import the Libraries

In [1]:

```
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
```

#### Make a list or Read the Data

In [2]:

```
L = [[1,3.5],[2,4],[3,4.5],[4,4],[5,5]]
print(L)
```

[[1, 3.5], [2, 4], [3, 4.5], [4, 4], [5, 5]]

#### Convert list into DataFrame

In [3]:

```
df = pd.DataFrame(L,columns=['x','y'])
df
```

Out[3]:

| | x | y |
|---|---|---|
| 0 | 1 | 3.5 |
| 1 | 2 | 4.0 |
| 2 | 3 | 4.5 |
| 3 | 4 | 4.0 |
| 4 | 5 | 5.0 |

In [4]:

```
x = df['x'].values
x
```

Out[4]:

array([1, 2, 3, 4, 5], dtype=int64)

In [5]:

```
y = df['y'].values
y
```

Out[5]:

array([3.5, 4. , 4.5, 4. , 5. ])

#### Plot scatter on x and y

In [6]:

```
plt.scatter(x,y)
plt.show()
```

#### Formula to find slope & intercept
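
For reference, these are the standard least-squares formulas that the cells below appear to implement, writing $\bar{x}$ and $\bar{y}$ for the means. The code names the intercept `c`; it plays the role of $b$ in $Y = mX + b$.

$m = \frac{\sum_{i}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i}(x_i - \bar{x})^2}$

$c = \bar{y} - m\,\bar{x}$

For the five data points here, the numerator is $3.0$ and the denominator is $10$, so $m = 0.3$ and $c = 4.2 - 0.3 \times 3 = 3.3$.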

#### Calculate mean of x and y

In [7]:

```
mean_x = x.mean()
mean_x
```

Out[7]:

3.0

In [8]:

```
mean_y = y.mean()
mean_y
```

Out[8]:

4.2

### Slope

In [9]:

```
(x[0] - mean_x) * (y[0]-mean_y)
```

Out[9]:

1.4000000000000004

In [10]:

```
(x[0] - mean_x)**2
```

Out[10]:

4.0

In [11]:

```
Num = 0  # running sum of (x - mean_x) * (y - mean_y)
Den = 0  # running sum of (x - mean_x)**2
for i in range(len(x)):
    Num = Num + (x[i] - mean_x) * (y[i]-mean_y)
    Den = Den + (x[i] - mean_x)**2
m = Num/Den
print(m)
```

0.3

### Intercept

In [12]:

```
c = mean_y - m*mean_x
c
```

Out[12]:

3.3000000000000003
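
The same slope and intercept can be computed without an explicit loop. This is a sketch of an alternative, not a replacement for the cells above: it uses NumPy array arithmetic and cross-checks the result against `np.polyfit`, which fits a degree-1 polynomial by least squares.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([3.5, 4.0, 4.5, 4.0, 5.0])

dx = x - x.mean()   # deviations of x from its mean
dy = y - y.mean()   # deviations of y from its mean

m = (dx * dy).sum() / (dx ** 2).sum()   # slope
c = y.mean() - m * x.mean()             # intercept

# Cross-check: polyfit returns coefficients from highest degree down
slope, intercept = np.polyfit(x, y, deg=1)
print(m, c)
print(slope, intercept)
```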

### Check Prediction of y

In [13]:

```
y
```

Out[13]:

array([3.5, 4. , 4.5, 4. , 5. ])

In [14]:

```
x
```

Out[14]:

array([1, 2, 3, 4, 5], dtype=int64)

#### Formula to predict y: yp = m*x + c

In [15]:

```
yp = m*x[0]+c
yp
```

Out[15]:

3.6

In [16]:

```
y_pred = []
for i in range(len(x)):
    yp = m*x[i]+c  # predicted y for the i-th x
    y_pred.append(yp)
print(y_pred)
```

[3.6, 3.9000000000000004, 4.2, 4.5, 4.800000000000001]

#### Plot Scatter x and y

#### Plot line graph using x and y prediction

In [17]:

```
plt.scatter(x,y)
plt.plot(x,y_pred,color='k')
plt.show()
```

#### Calculate ESS (explained sum of squares): first term

In [18]:

```
(y_pred[0]-mean_y)**2
```

Out[18]:

0.3600000000000001

#### Calculate TSS (total sum of squares): first term

In [19]:

```
(y[0]-mean_y)**2
```

Out[19]:

0.49000000000000027

#### Formula to find R2: ESS/TSS

In [20]:

```
Num = 0  # ESS: sum of (predicted - mean_y)**2
Den = 0  # TSS: sum of (actual - mean_y)**2
for i in range(len(y)):
    Num = Num + (y_pred[i]-mean_y)**2
    Den = Den + (y[i]-mean_y)**2
R2 = Num/Den
R2
```

Out[20]:

0.6923076923076928
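
For a least-squares line with an intercept, R² can be written either as ESS/TSS or as 1 − RSS/TSS, where RSS is the sum of squared residuals (actual minus predicted). The sketch below checks that both forms agree on this data set; the variable names `ess`, `rss`, and `tss` are assumptions introduced here.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([3.5, 4.0, 4.5, 4.0, 5.0])
y_pred = 0.3 * x + 3.3   # line found in the cells above

ess = np.sum((y_pred - y.mean()) ** 2)   # explained sum of squares
rss = np.sum((y - y_pred) ** 2)          # residual sum of squares
tss = np.sum((y - y.mean()) ** 2)        # total sum of squares

r2_from_ess = ess / tss
r2_from_rss = 1 - rss / tss
print(r2_from_ess, r2_from_rss)
```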

#### Express R² as a percentage

In [21]:

```
Acc = R2*100
Acc
```

Out[21]:

69.23076923076928

#### Predict y for a new value (x = 10)

In [22]:

```
yp = m*10+c
yp
```

Out[22]:

6.300000000000001

### Using Scikit-Learn

In [23]:

```
df.iloc[:,:1].values
```

Out[23]:

array([[1], [2], [3], [4], [5]], dtype=int64)

### or

In [24]:

```
x = df.x.values.reshape(-1,1)
x
```

Out[24]:

array([[1], [2], [3], [4], [5]], dtype=int64)

In [25]:

```
y
```

Out[25]:

array([3.5, 4. , 4.5, 4. , 5. ])

#### Fit the model

In [26]:

```
from sklearn.linear_model import LinearRegression
reg = LinearRegression()
reg.fit(x,y)
```

Out[26]:

LinearRegression()


#### Find coefficient (slope)

In [27]:

```
reg.coef_
```

Out[27]:

array([0.3])

#### Find intercept

In [28]:

```
reg.intercept_
```

Out[28]:

3.3000000000000003

#### Check R² score (as a percentage)

In [29]:

```
reg.score(x,y)*100
```

Out[29]:

69.23076923076925
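
`reg.score` returns R². To also get the mean squared error, one option is `sklearn.metrics`. This is a self-contained sketch that refits the same five points, so it does not depend on the cells above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

x = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)   # 2-D: one column per feature
y = np.array([3.5, 4.0, 4.5, 4.0, 5.0])

reg = LinearRegression().fit(x, y)
y_pred = reg.predict(x)

mse = mean_squared_error(y, y_pred)   # average squared residual
r2 = r2_score(y, y_pred)              # same value as reg.score(x, y)
print(mse, r2)
```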

#### Predict y

In [30]:

```
y_pred = reg.predict(x)
y_pred
```

Out[30]:

array([3.6, 3.9, 4.2, 4.5, 4.8])

#### Plot Scatter using x and y

#### Plot line graph using x and y prediction

In [31]:

```
plt.scatter(x,y)
plt.plot(x,y_pred)
plt.show()
```

#### Predict future value

In [32]:

```
reg.predict([[10]])
```

Out[32]:

array([6.3])