1. What is Polynomial Regression?¶
- Polynomial regression is a type of regression analysis that models the relationship between a dependent variable (like house price) and one or more independent variables (like size) as an nth-degree polynomial. It captures relationships that are curved rather than straight. The model is still linear in its coefficients, so it can be fitted with ordinary linear regression.
2. Why Use Polynomial Regression?¶
- Sometimes, the relationship between the variables is curved, not straight. Polynomial regression can fit curves to the data, making it more flexible than simple linear regression.
3. The Polynomial Equation¶
- The equation looks like this:
\[
Y = \beta_0 + \beta_1 X + \beta_2 X^2 + \beta_3 X^3 + \dots + \beta_n X^n
\]
Where:
- \( Y \): The target variable you want to predict (like price).
- \( X \): The independent variable (like size).
- \( \beta_0 \): The intercept (the predicted value of Y when X is zero).
- \( \beta_1, \beta_2, \dots, \beta_n \): Coefficients that weight each power of X. Each \( \beta_k \) scales the contribution of \( X^k \) to the prediction.
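To make the equation concrete, here is a minimal sketch that evaluates a degree-2 polynomial in two ways. The coefficient values are hypothetical and chosen only for illustration; they do not come from any fitted model.

```python
import numpy as np
from numpy.polynomial import polynomial as P

# Hypothetical coefficients (beta_0, beta_1, beta_2), lowest power first.
betas = [2.0, 0.5, 3.0]
X = np.array([0.0, 1.0, 2.0])

# Y = beta_0 + beta_1*X + beta_2*X^2, written out term by term.
Y_manual = betas[0] + betas[1] * X + betas[2] * X**2

# The same evaluation using NumPy's polynomial helper.
Y_polyval = P.polyval(X, betas)

print(Y_manual)
print(Y_polyval)
```

Both lines print the same values, which shows that the equation is just a weighted sum of powers of X.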
4. Collect Your Data¶
- Gather the dataset you want to analyze. For example, if you’re predicting house prices:
- Features might include size (square feet), number of bedrooms, etc.
- The target variable is the price of the house.
5. Split the Data¶
- Divide your dataset into:
- Training set: For training the polynomial regression model.
- Test set: For checking how well the model performs on new data.
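One common way to do the split is scikit-learn's `train_test_split`. The sketch below uses made-up placeholder data, and the 80/20 ratio and `random_state` value are assumptions rather than requirements.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data for illustration only: 10 samples, 1 feature.
X = np.arange(10, dtype=float).reshape(-1, 1)
Y = X.ravel() ** 2

# Hold out 20% of the rows for testing; random_state makes the split repeatable.
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.2, random_state=42
)

print(X_train.shape, X_test.shape)
```

With 10 rows and `test_size=0.2`, 8 rows go to training and 2 to testing.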
6. Prepare the Data¶
- To use polynomial regression, you need to create polynomial features from your independent variables. This means you’ll generate new features that are powers of the original features.
from sklearn.preprocessing import PolynomialFeatures
poly = PolynomialFeatures(degree=2) # Change degree for more curves
X_poly = poly.fit_transform(X_train) # Create polynomial features for training data
7. Train the Polynomial Regression Model¶
- Use a machine learning library to create and train the polynomial regression model with your training data.
from sklearn.linear_model import LinearRegression
model = LinearRegression()
model.fit(X_poly, Y_train) # Train the model with polynomial features
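As an alternative to calling the transformer and the model separately, the two steps can be chained in a scikit-learn pipeline. This is a sketch on made-up data that follows y = x² exactly; it is one possible arrangement, not the only one.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Illustrative training data that follows y = x^2 exactly.
X_train = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
Y_train = np.array([1.0, 4.0, 9.0, 16.0, 25.0])

# The pipeline expands the features and then fits the linear model.
pipe = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
pipe.fit(X_train, Y_train)

# New data goes through the same feature expansion automatically.
pred = pipe.predict(np.array([[6.0]]))
print(pred)
```

The benefit of this design is that the polynomial transform fitted on the training data is reused for every later prediction, so it cannot be forgotten or refitted by mistake.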
8. Make Predictions¶
- For predictions, the test set or any new data must go through the same polynomial transformation. Reuse the transformer fitted on the training data and call transform, not fit_transform.
X_test_poly = poly.transform(X_test) # Transform test data
Y_pred = model.predict(X_test_poly) # Make predictions
9. Evaluate the Model¶
- Check how well your model performed by comparing the predicted values to the actual values using metrics like:
- Mean Squared Error (MSE): Measures the average of the squares of the errors (how far off your predictions are).
- R-squared (R²): Indicates how well the model explains the variability in the target variable.
from sklearn.metrics import mean_squared_error, r2_score
mse = mean_squared_error(Y_test, Y_pred)
r2 = r2_score(Y_test, Y_pred)
print("Mean Squared Error:", mse)
print("R-squared:", r2)
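To show what the two metrics actually compute, the sketch below works them out by hand with NumPy and compares the result with scikit-learn. The true and predicted values are made up for illustration.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Made-up true values and predictions for illustration.
Y_true = np.array([3.0, 5.0, 7.0, 9.0])
Y_hat = np.array([2.5, 5.0, 7.5, 9.0])

# MSE: the mean of the squared errors.
mse_manual = np.mean((Y_true - Y_hat) ** 2)

# R^2: 1 minus (residual sum of squares / total sum of squares).
ss_res = np.sum((Y_true - Y_hat) ** 2)
ss_tot = np.sum((Y_true - Y_true.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

print(mse_manual, mean_squared_error(Y_true, Y_hat))
print(r2_manual, r2_score(Y_true, Y_hat))
```

The hand-computed values agree with the library functions: an MSE of 0.125 and an R² of 0.975 for this toy data.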
10. Visualize the Results (Optional)¶
- You can visualize the polynomial regression curve to see how well it fits the data. This helps to understand the model better.
import matplotlib.pyplot as plt
import numpy as np
# Create a range of values for X for plotting
X_range = np.linspace(np.asarray(X_train).min(), np.asarray(X_train).max(), 100).reshape(-1, 1)  # assumes a single feature
X_range_poly = poly.transform(X_range) # Transform the range for plotting
Y_range_pred = model.predict(X_range_poly) # Predictions for the range
plt.scatter(X_train, Y_train, color='blue') # Original data points
plt.plot(X_range, Y_range_pred, color='red') # Polynomial curve
plt.title('Polynomial Regression')
plt.xlabel('Size')
plt.ylabel('Price')
plt.show()
11. Conclusion¶
- Summarize how well the polynomial regression model performed and discuss the shape of the curve. Mention how polynomial regression can capture more complex relationships compared to linear regression.
Let’s walk through an example step by step.¶
Polynomial Regression¶
Import the Libraries
In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
Create a List or Read the Data
In [2]:
l = [[1,45],[2,51],[3,60],[4,80],[5,110],[6,150],[7,200],[8,240]]
l
Out[2]:
[[1, 45], [2, 51], [3, 60], [4, 80], [5, 110], [6, 150], [7, 200], [8, 240]]
Convert the List into a DataFrame
In [3]:
df = pd.DataFrame(l,columns=['x','y'])
df
Out[3]:
|   | x | y |
|---|---|---|
| 0 | 1 | 45 |
| 1 | 2 | 51 |
| 2 | 3 | 60 |
| 3 | 4 | 80 |
| 4 | 5 | 110 |
| 5 | 6 | 150 |
| 6 | 7 | 200 |
| 7 | 8 | 240 |
Extract x as a 2D feature array
In [4]:
x = df.iloc[:,:1].values
x
Out[4]:
array([[1], [2], [3], [4], [5], [6], [7], [8]], dtype=int64)
Extract y as a 1D target array
In [5]:
y = df.iloc[:,1].values
y
Out[5]:
array([ 45, 51, 60, 80, 110, 150, 200, 240], dtype=int64)
Scatter plot of x and y
In [6]:
plt.scatter(x,y)
plt.show()
Fit a linear regression model
In [7]:
from sklearn.linear_model import LinearRegression
reg = LinearRegression()
reg.fit(x,y)
Out[7]:
LinearRegression()
Predict y
In [8]:
y_pred = reg.predict(x)
y_pred
Out[8]:
array([ 16.58333333, 45.27380952, 73.96428571, 102.6547619 , 131.3452381 , 160.03571429, 188.72619048, 217.41666667])
In [9]:
y
Out[9]:
array([ 45, 51, 60, 80, 110, 150, 200, 240], dtype=int64)
Plot the data points and the fitted line
In [10]:
plt.scatter(x,y)
plt.plot(x,y_pred)
plt.show()
Check the R² score (as a percentage)
In [11]:
reg.score(x,y)*100
Out[11]:
92.65161550496813
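An R² of about 92.7% looks high, but the residuals can still show a systematic pattern. The sketch below refits the same straight line on the example data and prints the slope, the intercept, and the residuals. The variable name `lin_reg` is introduced here only for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Same data as the example above.
x = np.arange(1, 9).reshape(-1, 1)
y = np.array([45, 51, 60, 80, 110, 150, 200, 240])

lin_reg = LinearRegression().fit(x, y)
slope = lin_reg.coef_[0]
intercept = lin_reg.intercept_

# Residuals: actual minus predicted.
residuals = y - lin_reg.predict(x)

print(slope, intercept)
print(np.round(residuals, 2))
```

The residuals are positive at both ends and negative in the middle, which suggests the data curves upward and a straight line is missing that shape.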
Predict y for a single value of x
In [12]:
reg.predict([[2]])
Out[12]:
array([45.27380952])
Using Polynomial Regression¶
In [13]:
x
Out[13]:
array([[1], [2], [3], [4], [5], [6], [7], [8]], dtype=int64)
In [14]:
y
Out[14]:
array([ 45, 51, 60, 80, 110, 150, 200, 240], dtype=int64)
Create polynomial features (degree 2)
In [15]:
from sklearn.preprocessing import PolynomialFeatures
poly = PolynomialFeatures(degree=2)
X = poly.fit_transform(x)
X
Out[15]:
array([[ 1., 1., 1.], [ 1., 2., 4.], [ 1., 3., 9.], [ 1., 4., 16.], [ 1., 5., 25.], [ 1., 6., 36.], [ 1., 7., 49.], [ 1., 8., 64.]])
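The first column of ones in the output above is the bias term. Since `LinearRegression` fits its own intercept by default, that column is redundant here. The sketch below, using the same example data, checks that dropping it with `include_bias=False` leaves the predictions unchanged.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Same data as the example above.
x = np.arange(1, 9).reshape(-1, 1)
y = np.array([45, 51, 60, 80, 110, 150, 200, 240])

# With the bias column (default) and without it.
X_with_bias = PolynomialFeatures(degree=2).fit_transform(x)
X_no_bias = PolynomialFeatures(degree=2, include_bias=False).fit_transform(x)

pred_with_bias = LinearRegression().fit(X_with_bias, y).predict(X_with_bias)
pred_no_bias = LinearRegression().fit(X_no_bias, y).predict(X_no_bias)

print(X_no_bias.shape)
print(np.allclose(pred_with_bias, pred_no_bias))
```

Either form works for this example; omitting the bias column simply keeps the feature matrix smaller.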
Fit a linear regression on the polynomial features
In [16]:
from sklearn.linear_model import LinearRegression
reg = LinearRegression()
reg.fit(X,y)
Out[16]:
LinearRegression()
Predict y
In [17]:
y_pred = reg.predict(X)
y_pred
Out[17]:
array([ 44.33333333, 49.23809524, 62.07142857, 82.83333333, 111.52380952, 148.14285714, 192.69047619, 245.16666667])
In [18]:
y
Out[18]:
array([ 45, 51, 60, 80, 110, 150, 200, 240], dtype=int64)
Plot the data points and the fitted curve
In [19]:
plt.scatter(x,y)
plt.plot(x,y_pred)
plt.show()
Check the R² score (as a percentage)
In [20]:
reg.score(X,y)*100
Out[20]:
99.72728224054805
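This score is computed on the same eight points used for fitting, so it describes the fit rather than performance on unseen data. As a rough cross-check of the fitted curve, NumPy's `polyfit` should give the same quadratic. The sketch below compares the two on the example data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Same data as the example above.
x = np.arange(1, 9).reshape(-1, 1)
y = np.array([45, 51, 60, 80, 110, 150, 200, 240])

# Least-squares quadratic fit; coefficients come highest power first.
coeffs = np.polyfit(x.ravel(), y, deg=2)
pred_polyfit = np.polyval(coeffs, x.ravel())

# The scikit-learn route used in this notebook.
X = PolynomialFeatures(degree=2).fit_transform(x)
pred_sklearn = LinearRegression().fit(X, y).predict(X)

print(coeffs)
print(np.allclose(pred_polyfit, pred_sklearn))
```

Both routes solve the same least-squares problem, so the predictions match to numerical precision.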
Predict y for a new value of x
In [21]:
val = poly.transform([[6.5]])
val
Out[21]:
array([[ 1. , 6.5 , 42.25]])
In [22]:
reg.predict(val)
Out[22]:
array([169.42559524])
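The prediction above can be reproduced by hand from the fitted intercept and coefficients, which ties the result back to the polynomial equation. The sketch below refits the same model on the example data and evaluates it at 6.5. The names `x_new` and `manual` are introduced here for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Same data and model as the example above.
x = np.arange(1, 9).reshape(-1, 1)
y = np.array([45, 51, 60, 80, 110, 150, 200, 240])

poly = PolynomialFeatures(degree=2)
X = poly.fit_transform(x)
reg = LinearRegression().fit(X, y)

x_new = 6.5

# Intercept plus the weighted terms [1, x, x^2].
manual = reg.intercept_ + reg.coef_ @ np.array([1.0, x_new, x_new**2])

# The same prediction through the transformer and the model.
model_pred = reg.predict(poly.transform([[x_new]]))[0]

print(manual, model_pred)
```

The two numbers agree, confirming that the model is just the quadratic equation with fitted coefficients.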