1. What is Linear Regression?¶
- Linear regression helps us predict one thing (like house price) based on another (like house size). It finds a straight line that best fits the data.
2. The Equation of Linear Regression¶
The equation looks like this:
$Y = mX + b$
Where:
- $Y$ is what you’re trying to predict (like price),
- $X$ is the input (like size),
- $m$ is the slope (how much $Y$ changes when $X$ changes),
- $b$ is the intercept (the value of $Y$ when $X$ is zero).
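As a quick illustration, here is a minimal Python sketch of the equation; the slope and intercept values are made up for illustration (they happen to match the ones found in the worked example later in this notebook):
# A minimal sketch of Y = m*X + b; the values of m and b are illustrative
m = 0.3   # slope: how much Y changes for each one-unit increase in X
b = 3.3   # intercept: the value of Y when X is zero
X = 10
Y = m * X + b
print(Y)  # 6.3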
3. Collect Data¶
- You need some data to work with. For example:
- X could be the size of houses (in square feet),
- Y could be the price of those houses.
4. Plot the Data¶
- Before doing anything, you can plot a graph of the data. This helps you see if there’s a trend (like bigger houses costing more).
5. Split Your Data¶
- Divide your data into two parts:
- Training data: Used to train the model.
- Test data: Used to check how well the model works on new data.
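In practice this split is often done with scikit-learn's train_test_split; a minimal sketch, where the toy data, the 80/20 split, and the random_state are all arbitrary choices for illustration:
from sklearn.model_selection import train_test_split
import numpy as np

# Toy data: house sizes (X) and prices (y); values made up for illustration
X = np.array([[500], [800], [1000], [1200], [1500]])
y = np.array([100000, 150000, 180000, 210000, 260000])

# Hold out 20% of the rows as test data; random_state makes the split repeatable
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)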
6. Train the Model (Fit the Line)¶
- Now, use the training data to teach the model. It will find the best line (with the best slope and intercept) that fits the data.
7. Check the Line’s Equation¶
After training, you’ll get values for:
- Slope (m): How steep the line is.
- Intercept (b): Where the line crosses the Y-axis.
For example, if the line’s equation is:
$\text{Price} = 200 \times \text{Size} + 30{,}000$
This means that for every 1 square foot increase in house size, the price increases by $200, and the predicted price at a size of zero is $30,000.
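Plugging a hypothetical size into this equation gives a concrete prediction (the 1,000 square feet below is made up purely for illustration):
# Predicted price for a hypothetical 1,000 sq ft house, using the equation above
size = 1000
price = 200 * size + 30000
print(price)  # 230000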
8. Make Predictions¶
- Now that the line is ready, you can use it to predict house prices for new house sizes. Just plug the new house size into the equation.
9. Test the Model¶
- Use the test data to see how well your model is doing. You compare the predicted prices with the actual prices to see how close they are.
10. Check for Errors¶
- You measure the difference between the actual prices and the predicted prices. A smaller difference means the model is good.
- Common ways to measure this are (a short sketch follows this list):
- Mean Squared Error (MSE): The average of the squared differences between predicted and actual values; smaller is better.
- R-squared (R²): The fraction of the variation in Y that the line explains (closer to 1 is better).
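A minimal sketch of both metrics using scikit-learn's metrics module; the two lists below are the actual and predicted values from the worked example later in this notebook:
from sklearn.metrics import mean_squared_error, r2_score

y_actual = [3.5, 4.0, 4.5, 4.0, 5.0]     # observed values
y_predicted = [3.6, 3.9, 4.2, 4.5, 4.8]  # values predicted by the fitted line

print(mean_squared_error(y_actual, y_predicted))  # average squared error; smaller is better
print(r2_score(y_actual, y_predicted))            # about 0.69; closer to 1 is better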
11. Analyze the Errors (Residuals)¶
- Residuals are the differences between actual and predicted values. You can check these to see if your model missed any patterns.
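With NumPy arrays of actual and predicted values, the residuals are just an element-wise difference; a minimal sketch (again using the values from the worked example below), where plotting them against x is a common check:
import numpy as np
import matplotlib.pyplot as plt

x = np.array([1, 2, 3, 4, 5])
y_actual = np.array([3.5, 4.0, 4.5, 4.0, 5.0])
y_predicted = np.array([3.6, 3.9, 4.2, 4.5, 4.8])

residuals = y_actual - y_predicted   # positive means the model under-predicted
plt.scatter(x, residuals)
plt.axhline(0, color='k')            # residuals should scatter randomly around zero
plt.show()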
12. Use the Model for Future Predictions¶
- Once you’ve trained and tested the model, you can use it to predict prices for new houses.
Let’s walk through an example step by step.¶
Import the Libraries
In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
Make a list or Read the Data
In [2]:
L = [[1,3.5],[2,4],[3,4.5],[4,4],[5,5]]
print(L)
[[1, 3.5], [2, 4], [3, 4.5], [4, 4], [5, 5]]
Convert list into DataFrame
In [3]:
df = pd.DataFrame(L,columns=['x','y'])
df
Out[3]:
|   | x | y   |
|---|---|-----|
| 0 | 1 | 3.5 |
| 1 | 2 | 4.0 |
| 2 | 3 | 4.5 |
| 3 | 4 | 4.0 |
| 4 | 5 | 5.0 |
In regression, the features column contains the independent variables or predictors that are used to model and predict the dependent variable (or outcome). These features provide the input data that the regression algorithm analyzes to understand and predict the relationships between variables.¶
Choose the column that contains variables you believe will influence and help predict the target data.
In [4]:
x = df['x'].values
x
Out[4]:
array([1, 2, 3, 4, 5], dtype=int64)
Select the column you want to predict, known as the target variable, which is the outcome you aim to model using the other features.
In [5]:
y = df['y'].values
y
Out[5]:
array([3.5, 4. , 4.5, 4. , 5. ])
Plot scatter on x and y
In [6]:
plt.scatter(x,y)
plt.show()
Formula to find slope & intercept:
$m = \dfrac{\sum_{i}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i}(x_i - \bar{x})^2}, \quad c = \bar{y} - m\,\bar{x}$
Calculate mean of x and y
In [7]:
mean_x = x.mean()
mean_x
Out[7]:
3.0
In [8]:
mean_y = y.mean()
mean_y
Out[8]:
4.2
Slope¶
In [9]:
(x[0] - mean_x) * (y[0]-mean_y)
Out[9]:
1.4000000000000004
In [10]:
(x[0] - mean_x)**2
Out[10]:
4.0
In [11]:
Num = 0
Den = 0
for i in range(len(x)):
    Num = Num + (x[i] - mean_x) * (y[i] - mean_y)
    Den = Den + (x[i] - mean_x)**2
m = Num / Den
print(m)
0.3
Intercept¶
In [12]:
c = mean_y - m*mean_x
c
Out[12]:
3.3000000000000003
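As a quick cross-check, NumPy's polyfit can fit the same degree-1 (straight-line) model and should return the same slope and intercept:
# Fit a straight line to the same data; returns [slope, intercept]
slope, intercept = np.polyfit(x, y, 1)
print(slope, intercept)  # approximately 0.3 and 3.3, matching m and c above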
Check Prediction of y
In [13]:
y
Out[13]:
array([3.5, 4. , 4.5, 4. , 5. ])
In [14]:
x
Out[14]:
array([1, 2, 3, 4, 5], dtype=int64)
Formula for the prediction of y: yp = m*x + c
In [15]:
yp = m*x[0]+c
yp
Out[15]:
3.6
In [16]:
y_pred = []
for i in range(len(x)):
    yp = m * x[i] + c
    y_pred.append(yp)
print(y_pred)
[3.6, 3.9000000000000004, 4.2, 4.5, 4.800000000000001]
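Since x is a NumPy array, the same predictions can also be computed in a single vectorized step instead of a loop:
# Vectorized alternative to the loop above (broadcasting over the array x)
y_pred_vec = m * x + c
print(y_pred_vec)  # array([3.6, 3.9, 4.2, 4.5, 4.8])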
Plot Scatter x and y
Plot line graph using x and y prediction
In [17]:
plt.scatter(x,y)
plt.plot(x,y_pred,color='k')
plt.show()
Calculate one term of ESS (explained sum of squares)
In [18]:
(y_pred[0]-mean_y)**2
Out[18]:
0.3600000000000001
Calculate one term of TSS (total sum of squares)
In [19]:
(y[0]-mean_y)**2
Out[19]:
0.49000000000000027
Formula to find R²: ESS/TSS (the fraction of the total variation explained by the fitted line)
In [20]:
Num = 0
Den = 0
for i in range(len(y)):
    Num = Num + (y_pred[i] - mean_y)**2
    Den = Den + (y[i] - mean_y)**2
R2 = Num / Den
R2
Out[20]:
0.6923076923076928
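As a cross-check, scikit-learn's r2_score (computed as 1 − RSS/TSS, which equals ESS/TSS for a least-squares fit) should give the same value:
from sklearn.metrics import r2_score
r2_score(y, y_pred)  # approximately 0.6923, matching the value above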
Check Accuracy (R² expressed as a percentage)
In [21]:
Acc = R2*100
Acc
Out[21]:
69.23076923076928
Predict y for a new x value (x = 10)
In [22]:
yp = m*10+c
yp
Out[22]:
6.300000000000001
Using Scikit-Learn¶
Choose the column that contains variables you believe will influence and help predict the target data.
In [23]:
df.iloc[:,:1].values
Out[23]:
array([[1], [2], [3], [4], [5]], dtype=int64)
or¶
In [24]:
x = df.x.values.reshape(-1,1)
x
Out[24]:
array([[1], [2], [3], [4], [5]], dtype=int64)
Select the column you want to predict, known as the target variable, which is the outcome you aim to model using the other features.
In [25]:
y
Out[25]:
array([3.5, 4. , 4.5, 4. , 5. ])
Fit the Algorithm
In [26]:
from sklearn.linear_model import LinearRegression
reg = LinearRegression()
reg.fit(x,y)
Out[26]:
LinearRegression()
Find coefficient (slope)
In [27]:
reg.coef_
Out[27]:
array([0.3])
Find intercept
In [28]:
reg.intercept_
Out[28]:
3.3000000000000003
Check accuracy (the score method returns R², shown here as a percentage)
In [29]:
reg.score(x,y)*100
Out[29]:
69.23076923076925
Predict y
In [30]:
y_pred = reg.predict(x)
y_pred
Out[30]:
array([3.6, 3.9, 4.2, 4.5, 4.8])
Plot Scatter using x and y
Plot line graph using x and y prediction
In [31]:
plt.scatter(x,y)
plt.plot(x,y_pred)
plt.show()
Predict future value
In [32]:
reg.predict([[10]])
Out[32]:
array([6.3])
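The same call accepts several new values at once, because predict expects a 2-D array with one row per observation:
# Predict for several new x values in one call
reg.predict([[6], [7], [10]])  # approximately array([5.1, 5.4, 6.3])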