1. What is Support Vector Regression?
- SVR is a supervised machine learning method that predicts continuous values (such as prices) from input features (such as size or number of rooms). It adapts the support vector machine to regression: the fitted model is determined by a subset of the training points, called support vectors.
2. Basic Idea of SVR
- Where ordinary linear regression looks for the single line that minimizes the squared error, SVR looks for a function with a “tube” around it that contains as many data points as possible. The half-width of the tube is the margin epsilon (ε). The goal is to keep the errors outside the tube small while keeping the model simple.
3. How SVR Works
- Epsilon Tube: Imagine a tube of half-width ε around the prediction line. Data points that fall inside this tube are not counted as errors. Only points outside the tube contribute to the loss, in proportion to how far outside they lie.
- Support Vectors: The data points that lie on the boundary of the tube or outside it are called support vectors. These points are crucial because they alone determine the position of the tube and the prediction line; points strictly inside the tube have no influence on the fitted model.
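The epsilon tube corresponds to what is usually called the ε-insensitive loss. The following is a minimal sketch of that loss with NumPy; the function name `epsilon_insensitive_loss` and the sample values are illustrative assumptions, not part of scikit-learn or of the example later in this notebook.

```python
import numpy as np

def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Per-point loss: zero inside the tube, linear in the excess outside it."""
    residual = np.abs(np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float))
    return np.maximum(0.0, residual - epsilon)

# Illustrative values (not taken from any dataset)
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.05, 2.5, 3.0, 3.0])

losses = epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1)
print(losses)  # first and third points are inside the tube, so their loss is 0
```

Points whose absolute residual is at most ε cost nothing, which is why they do not affect the fitted model.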
4. Collect Your Data
- Gather the dataset you want to work with. For example, if you’re predicting house prices:
- Features might include size, number of bedrooms, and age of the house.
- The target variable is the price of the house.
5. Split the Data
- Divide your data into:
- Training set: For training the model.
- Test set: For checking how well the model works on new data.
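One common way to make this split is scikit-learn's `train_test_split`. The sketch below uses synthetic stand-in data for the house-price example; the feature ranges, the price formula, the 80/20 ratio, and the `random_state` are all assumptions chosen for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 100 houses, 3 features (size, bedrooms, age)
rng = np.random.default_rng(0)
X = rng.uniform(low=[50, 1, 0], high=[250, 5, 60], size=(100, 3))
# Made-up price formula plus noise, purely for illustration
Y = 1000 * X[:, 0] + 5000 * X[:, 1] - 200 * X[:, 2] + rng.normal(0, 5000, size=100)

# Hold out 20% of the rows for testing; random_state makes the split repeatable
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.2, random_state=42
)
print(X_train.shape, X_test.shape)
```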
6. Choose the Kernel Function
- SVR can use different types of kernel functions to transform the input data. Common kernels include:
- Linear: Straight line; good for simple relationships.
- Polynomial: Curved line; captures more complex relationships.
- Radial Basis Function (RBF): A flexible function that can fit many shapes.
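To get a feel for the difference between kernels, one option is to fit the same data with each of them and compare the scores. This is a rough sketch on synthetic sine-shaped data (an assumption made for illustration); all other SVR settings are left at their defaults.

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic 1-D data following a sine curve (illustrative only)
X = np.linspace(0, 6, 40).reshape(-1, 1)
y = np.sin(X).ravel()

scores = {}
for kernel in ("linear", "poly", "rbf"):
    model = SVR(kernel=kernel)          # default C, epsilon, gamma, degree
    model.fit(X, y)
    scores[kernel] = model.score(X, y)  # R² on the training data

for kernel, r2 in scores.items():
    print(f"{kernel:>6}: R2 = {r2:.3f}")
```

On curved data like this, a linear kernel would typically be expected to score lower than RBF, but the ranking depends on the data and on the hyperparameters, so it is worth checking on your own dataset.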
7. Train the SVR Model
- Use a machine learning library such as scikit-learn to create the SVR model and train it on your training data.
from sklearn.svm import SVR
# Create the SVR model
model = SVR(kernel='rbf') # Use RBF kernel for flexibility
model.fit(X_train, Y_train) # Train the model
8. Make Predictions
- Use the trained model to make predictions on your test set or new data.
Y_pred = model.predict(X_test)
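9. Evaluate the Model
- Check how well your model performed by comparing the predicted values to the actual values using metrics like:
- Mean Squared Error (MSE): The average of the squared differences between predicted and actual values; lower is better.
- R-squared (R²): The share of the variance in the target variable that the model explains. A value of 1 is a perfect fit, and a negative value means the model does worse than always predicting the mean.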
from sklearn.metrics import mean_squared_error, r2_score
mse = mean_squared_error(Y_test, Y_pred)
r2 = r2_score(Y_test, Y_pred)
print("Mean Squared Error:", mse)
print("R-squared:", r2)
10. Analyze Errors
- Look at the errors (differences between predicted and actual values). This helps you see if your model is performing well or if it needs adjustments.
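A simple way to look at the errors is to compute the residuals directly. The values below are made up for illustration; in practice `Y_test` and `Y_pred` would come from the earlier steps.

```python
import numpy as np

# Illustrative actual and predicted values (not from a real model)
Y_test = np.array([200.0, 150.0, 320.0, 275.0])
Y_pred = np.array([210.0, 140.0, 300.0, 280.0])

residuals = Y_test - Y_pred      # positive means the model under-predicted
abs_errors = np.abs(residuals)

print("Residuals:", residuals)
print("Mean absolute error:", abs_errors.mean())
print("Index of the largest error:", abs_errors.argmax())
```

If the residuals are consistently positive or negative over some range of the inputs, that is a hint the model is systematically off there and may need a different kernel or different settings.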
11. Use the Model for Future Predictions
- After validating your model, you can use it to predict values for new inputs.
new_data = [[120, 3, 10]] # Illustrative values in the order [size, bedrooms, age]
predictions = model.predict(new_data)
12. Conclusion
- Summarize how well the SVR model performed and discuss the results. Mention how the support vectors influenced the predictions.
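To discuss the support vectors, it helps to see which training points the fitted model actually kept. The sketch below inspects the `support_` and `support_vectors_` attributes of a fitted scikit-learn `SVR`; the sine-shaped data is synthetic and only serves as an example.

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic 1-D data (illustrative only)
X = np.linspace(0, 6, 40).reshape(-1, 1)
y = np.sin(X).ravel()

model = SVR(kernel="rbf", epsilon=0.1).fit(X, y)

# Indices (into X) of the training points used as support vectors
print("Number of support vectors:", len(model.support_))
print("Indices of support vectors:", model.support_)
# The support vectors themselves, one row per support vector
print("Shape of support_vectors_:", model.support_vectors_.shape)
```

A wider tube (larger `epsilon`) generally leaves more points inside it and therefore fewer support vectors.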
Let’s walk through an example step by step.
Import the libraries
In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
Create a list or read the data
In [2]:
l = [[1,45],[2,51],[3,60],[4,80],[5,110],[6,150],[7,200],[8,240]]
l
Out[2]:
[[1, 45], [2, 51], [3, 60], [4, 80], [5, 110], [6, 150], [7, 200], [8, 240]]
In [3]:
df = pd.DataFrame(l,columns=['x','y'])
df
Out[3]:
| | x | y |
|---|---|---|
| 0 | 1 | 45 |
| 1 | 2 | 51 |
| 2 | 3 | 60 |
| 3 | 4 | 80 |
| 4 | 5 | 110 |
| 5 | 6 | 150 |
| 6 | 7 | 200 |
| 7 | 8 | 240 |
In [4]:
x = df.iloc[:,:1].values
x
Out[4]:
array([[1], [2], [3], [4], [5], [6], [7], [8]], dtype=int64)
In [5]:
y = df.iloc[:,1].values
y
Out[5]:
array([ 45, 51, 60, 80, 110, 150, 200, 240], dtype=int64)
Scatter plot of x and y
In [6]:
plt.scatter(x,y)
plt.show()
Fit the SVR model on the raw data
In [8]:
from sklearn.svm import SVR
reg = SVR(kernel='rbf')
reg.fit(x,y)
Out[8]:
SVR()
Predict y
In [9]:
y_pred = reg.predict(x)
y_pred
Out[9]:
array([92.58372724, 92.117258 , 92.58298256, 94.04747182, 95.95252818, 97.41701744, 97.882742 , 97.41627276])
In [10]:
y
Out[10]:
array([ 45, 51, 60, 80, 110, 150, 200, 240], dtype=int64)
Scatter plot of x and y with the predicted line
In [11]:
plt.scatter(x,y)
plt.plot(x,y_pred)
plt.show()
Check the R² score (as a percentage)
In [12]:
reg.score(x,y)*100
Out[12]:
-4.342009418562265
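For a regressor, `score` returns R² rather than a classification accuracy, and a negative value means the model fits worse than simply predicting the mean of y. The sketch below recomputes R² by hand on the same data to make that concrete. It also compares the spread of the predictions with the spread of the targets, which is one plausible way to see why the fit on unscaled data is poor: the predictions in Out[9] stay between about 92 and 98 while y runs from 45 to 240.

```python
import numpy as np
from sklearn.svm import SVR

# Same data as in the example above
x = np.arange(1, 9).reshape(-1, 1)
y = np.array([45, 51, 60, 80, 110, 150, 200, 240])

reg = SVR(kernel="rbf").fit(x, y)
y_pred = reg.predict(x)

ss_res = np.sum((y - y_pred) ** 2)     # squared error of the model
ss_tot = np.sum((y - y.mean()) ** 2)   # squared error of always predicting the mean
r2_manual = 1 - ss_res / ss_tot

print("Manual R2:", r2_manual, "score():", reg.score(x, y))
print("Spread of predictions:", y_pred.max() - y_pred.min())
print("Spread of targets:", y.max() - y.min())
```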
In [13]:
reg.predict([[6.5]])
Out[13]:
array([97.78455023])
Standardization
In [15]:
x
Out[15]:
array([[1], [2], [3], [4], [5], [6], [7], [8]], dtype=int64)
In [16]:
y
Out[16]:
array([ 45, 51, 60, 80, 110, 150, 200, 240], dtype=int64)
Scatter plot of x and y
In [17]:
plt.scatter(x,y)
plt.show()
Scale the data
We use a standard scaler to standardize the feature and the target by removing the mean and scaling to unit variance. SVR with an RBF kernel is sensitive to the scale of its inputs, so this step usually improves the fit.
In [18]:
from sklearn.preprocessing import StandardScaler
sc_x = StandardScaler()
sc_y = StandardScaler()
In [19]:
X = sc_x.fit_transform(x)
X
Out[19]:
array([[-1.52752523], [-1.09108945], [-0.65465367], [-0.21821789], [ 0.21821789], [ 0.65465367], [ 1.09108945], [ 1.52752523]])
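The scaled values in Out[19] can be reproduced by hand, which shows what `StandardScaler` does: it subtracts the mean and divides by the population standard deviation. This is a small sketch on the same x values.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Same x as in the example above
x = np.arange(1, 9).reshape(-1, 1)

# z = (x - mean) / std, with the population standard deviation (ddof=0)
manual = (x - x.mean(axis=0)) / x.std(axis=0)
scaled = StandardScaler().fit_transform(x)

print(manual.ravel())
print("Matches StandardScaler:", np.allclose(manual, scaled))
```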
In [20]:
Y = sc_y.fit_transform(y.reshape(-1,1)).reshape(-1)
Y
Out[20]:
array([-1.05424509, -0.96639133, -0.83461069, -0.54176484, -0.10249605, 0.48319567, 1.21531031, 1.80100203])
Scatter plot of the scaled X and Y
In [21]:
plt.scatter(X,Y)
plt.show()
Fit the SVR model on the scaled data
In [22]:
from sklearn.svm import SVR
reg = SVR(kernel='rbf')
reg.fit(X,Y)
Out[22]:
SVR()
Predict y
In [23]:
y_pred = reg.predict(X)
y_pred
Out[23]:
array([-0.95434326, -0.93171546, -0.73438987, -0.44178931, -0.00517628, 0.58239451, 1.11521213, 1.28275184])
Scatter plot of the scaled X and Y with the predicted line
In [24]:
plt.scatter(X,Y)
plt.plot(X,y_pred)
plt.show()
Predict y for a new value of x (scale the input, predict, then convert the result back to the original units)
In [25]:
val = sc_x.transform([[6.5]])
val
Out[25]:
array([[0.87287156]])
In [26]:
val = reg.predict(val)
val
Out[26]:
array([0.87621202])
In [27]:
val = sc_y.inverse_transform([val])
val
Out[27]:
array([[176.84117556]])
In [28]:
round(val[0][0])
Out[28]:
177