Support Vector Classification

What is SVC?

The Support Vector Classifier (SVC) is a supervised machine learning method for sorting data into classes. It looks for the line (or, more generally, the boundary) that best separates the classes in the training data.

How SVC Works

Hyperplane:
- Imagine a line in 2D or a flat surface in 3D that divides different groups of data points. This boundary is called a hyperplane, and the same idea extends to data with more than three features.
- SVC tries to find the hyperplane that best separates the classes.
Support Vectors:
- Support vectors are the data points closest to the hyperplane. They matter because they determine its position; points that lie well outside the margin have no influence on it.
Maximizing the Margin:
- SVC aims to maximize the margin, the distance between the hyperplane and the closest points of each class (the support vectors). A larger margin usually means the model generalizes better to new data.
Soft Margin and Hard Margin:
- Hard Margin: No data points may fall inside the margin or on the wrong side of the hyperplane. This only works when the classes can be separated perfectly.
- Soft Margin: Some points are allowed inside the margin or on the wrong side of the hyperplane, which helps when the classes overlap. The parameter C controls the trade-off: a small C tolerates more violations in exchange for a wider margin, while a large C penalizes them more heavily.
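
The effect of C on the margin can be checked with a short experiment. The sketch below is only an illustration: the data is synthetic (generated with make_blobs), and the two C values are arbitrary choices. For a linear kernel the margin width is 2 / ||w||, where w is the normal vector of the hyperplane.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Synthetic, overlapping two-class data (an assumption, for illustration only).
X, y = make_blobs(n_samples=200, centers=[[-1, -1], [1, 1]],
                  cluster_std=1.0, random_state=0)

results = {}
for C in (0.01, 100.0):
    model = SVC(kernel="linear", C=C).fit(X, y)
    w = model.coef_[0]                        # normal vector of the hyperplane
    margin_width = 2.0 / np.linalg.norm(w)    # distance between the two margin lines
    n_support = len(model.support_vectors_)   # points on or inside the margin
    results[C] = (margin_width, n_support)
    print(f"C={C}: margin width={margin_width:.3f}, support vectors={n_support}")
```

On overlapping data like this, the smaller C should give the wider margin, and typically more support vectors as well.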

Steps to Use SVC

Collect Data:
- Gather data that has labels (like “spam” or “not spam” for emails).
Choose a Kernel Function:
- Decide how to separate the data. Common options include:
  - Linear: Good for straight-line separations.
  - Polynomial: Useful for more curved separations.
  - RBF (Radial Basis Function): Good for very complex shapes.
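
To see why the kernel choice matters, the three kernels can be compared on data that no straight line can separate. This is a minimal sketch on a synthetic dataset (two concentric rings from make_circles), not on the data used later in this article; all other SVC settings are left at their defaults.

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric rings: no straight line separates them (synthetic, for illustration).
X, y = make_circles(n_samples=300, factor=0.3, noise=0.05, random_state=0)

scores = {}
for kernel in ("linear", "poly", "rbf"):
    model = SVC(kernel=kernel).fit(X, y)
    scores[kernel] = model.score(X, y)   # training accuracy, enough to compare kernels here
    print(kernel, round(scores[kernel], 3))
```

The RBF kernel should score clearly higher than the linear one on this shape. On real data, the comparison is better made with a held-out set or cross-validation.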

Train the Model:

Use a library to train the SVC model on your training data. This step teaches the model how to classify new data.

from sklearn.svm import SVC

model = SVC(kernel='linear')  # Using a linear kernel
model.fit(X_train, Y_train)  # Training the model

Make Predictions:
- After training, you can use the model to predict classes for new data.
```
Y_pred = model.predict(X_test)  # Predicting classes
```

Evaluate the Model:

Check how well the model did by comparing its predictions to the actual labels, using accuracy and a confusion matrix.

from sklearn.metrics import accuracy_score, confusion_matrix

accuracy = accuracy_score(Y_test, Y_pred)  # How accurate is the model?
cm = confusion_matrix(Y_test, Y_pred)  # What mistakes did it make?

print("Accuracy:", accuracy)
print("Confusion Matrix:\n", cm)
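
The fragments above assume that X_train, X_test, Y_train and Y_test already exist. A self-contained version of the same steps might look like the sketch below, where a synthetic dataset from make_classification stands in for real labelled data (an assumption made only so the code runs on its own).

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic labelled data stands in for a real dataset (assumption).
X, Y = make_classification(n_samples=300, n_features=4, random_state=0)
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.25, random_state=0)

model = SVC(kernel="linear")      # choose a kernel
model.fit(X_train, Y_train)       # train
Y_pred = model.predict(X_test)    # predict

accuracy = accuracy_score(Y_test, Y_pred)   # share of correct predictions
cm = confusion_matrix(Y_test, Y_pred)       # rows: actual class, columns: predicted class
print("Accuracy:", accuracy)
print("Confusion Matrix:\n", cm)
```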

Why Use SVC?

Good for High-Dimensional Data: It can handle data with many features well.
Handles Non-Linear Data: It can work with data that isn’t easily separated by a straight line.
Less Overfitting: Because the boundary depends only on the support vectors and the margin is maximized, the model is less prone to fitting noise in the training data, provided C and the kernel parameters are chosen sensibly.

Conclusion

The Support Vector Classifier is a powerful tool for classifying data into different groups. It finds the best line or surface to separate the classes while focusing on the important points (support vectors). This helps in making accurate predictions, especially in complex situations.

Let’s review an example step by step.

1. Importing Libraries

In [1]:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

NumPy: For numerical operations.
Pandas: For data manipulation and analysis.
Matplotlib: For data visualization.

2. Loading the Dataset

In [2]:

data = pd.read_csv('Social_Network_Ads.csv')
data

Out[2]:

	User ID	Gender	Age	EstimatedSalary	Purchased
0	15624510	Male	19	19000	0
1	15810944	Male	35	20000	0
2	15668575	Female	26	43000	0
3	15603246	Female	27	57000	0
4	15804002	Male	19	76000	0
…	…	…	…	…	…
395	15691863	Female	46	41000	1
396	15706071	Male	51	23000	1
397	15654296	Female	50	20000	1
398	15755018	Male	36	33000	0
399	15594041	Female	49	36000	1

400 rows × 5 columns

This line reads the dataset from a CSV file into a Pandas DataFrame named data.
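
Before modelling, it can help to check the column types, missing values and class balance. The sketch below does this on the ten rows displayed above, typed in by hand so that it runs without the CSV file; on the full dataset the same calls would be applied to data instead of sample.

```python
import pandas as pd

# The ten rows shown in the output above (first five and last five of the file).
sample = pd.DataFrame({
    "User ID": [15624510, 15810944, 15668575, 15603246, 15804002,
                15691863, 15706071, 15654296, 15755018, 15594041],
    "Gender": ["Male", "Male", "Female", "Female", "Male",
               "Female", "Male", "Female", "Male", "Female"],
    "Age": [19, 35, 26, 27, 19, 46, 51, 50, 36, 49],
    "EstimatedSalary": [19000, 20000, 43000, 57000, 76000,
                        41000, 23000, 20000, 33000, 36000],
    "Purchased": [0, 0, 0, 0, 0, 1, 1, 1, 0, 1],
})

print(sample.dtypes)                                  # type of each column
missing = int(sample.isna().sum().sum())              # total number of missing cells
class_counts = sample["Purchased"].value_counts()     # rows per class
print("missing cells:", missing)
print(class_counts)
```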

3. Preparing Features and Target Variables

In [3]:

X = data.iloc[:, 2:4].values  # Features: Age and EstimatedSalary (columns 2 and 3)
y = data.iloc[:, 4].values    # Target variable: Purchased (0 = no, 1 = yes)

X: Contains the feature columns (Age and EstimatedSalary).
y: Contains the target variable indicating whether the user purchased the product (1) or not (0).
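
Selecting columns by position works, but it breaks silently if the column order changes. As an alternative sketch, the same arrays can be selected by column name; the small DataFrame below reuses the first three rows of the table above so the comparison runs without the CSV file.

```python
import numpy as np
import pandas as pd

# First three rows from the table above, enough to compare the two selections.
data = pd.DataFrame({
    "User ID": [15624510, 15810944, 15668575],
    "Gender": ["Male", "Male", "Female"],
    "Age": [19, 35, 26],
    "EstimatedSalary": [19000, 20000, 43000],
    "Purchased": [0, 0, 0],
})

X_by_position = data.iloc[:, 2:4].values                   # as in the tutorial
X_by_name = data[["Age", "EstimatedSalary"]].to_numpy()    # same columns, selected by name
y_by_name = data["Purchased"].to_numpy()

print(np.array_equal(X_by_position, X_by_name))
```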

4. Visualizing the Data

In [4]:

plt.title("Social Network Ads by Age and Salary")
plt.xlabel("Age")
plt.ylabel("Salary")

plt.scatter(X[y==0, 0], X[y==0, 1], label='No')  # Users who did not purchase
plt.scatter(X[y==1, 0], X[y==1, 1], label='Yes')  # Users who purchased

plt.legend()
plt.show()

This section visualizes the data points by plotting Age against Salary.
Different colors represent users who did not purchase and those who did.

5. Splitting the Dataset into Training and Test Sets

In [5]:

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

The dataset is split into training (75%) and testing (25%) sets to evaluate the model’s performance.
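
With 400 rows, test_size=0.25 yields 300 training rows and 100 test rows. When the classes are unbalanced, passing stratify=y keeps the class proportions the same in both parts. The sketch below illustrates this on a hypothetical stand-in with 400 rows and a made-up 260/140 class split, not on the real data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in: 400 rows, 260 of class 0 and 140 of class 1 (not the real data).
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 2))
y = np.array([0] * 260 + [1] * 140)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

print(X_train.shape, X_test.shape)                          # row counts of each part
print("share of class 1:", y_train.mean(), y_test.mean())   # should be close to 140/400
```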

6. Feature Scaling

In [6]:

from sklearn.preprocessing import StandardScaler
sc_x = StandardScaler()
X_train = sc_x.fit_transform(X_train)  # Fit and transform the training data
X_test = sc_x.transform(X_test)        # Transform the test data

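StandardScaler standardizes each feature to a mean of 0 and a standard deviation of 1, using statistics computed on the training set only; the test set is transformed with those same statistics so that no information leaks from it. Scaling matters for SVC because the RBF kernel is based on distances between points, and without it EstimatedSalary (tens of thousands) would dominate Age (tens).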
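
The scaling can be verified directly, and scikit-learn's make_pipeline offers a way to bundle the scaler and the classifier so the two steps cannot get out of sync. The following is a sketch on hypothetical ages and salaries with made-up labels, generated only so the code is runnable on its own.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical ages and salaries, generated only to illustrate the scaling step.
rng = np.random.default_rng(0)
X_train = np.column_stack([rng.integers(18, 60, size=300),
                           rng.integers(15000, 150000, size=300)]).astype(float)
y_train = (X_train[:, 0] > 40).astype(int)   # made-up labels

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X_train)
print(X_scaled.mean(axis=0).round(6), X_scaled.std(axis=0).round(6))

# A pipeline fits the scaler on the data passed to fit() and reuses it in predict().
pipeline = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
pipeline.fit(X_train, y_train)
first_five = pipeline.predict(X_train[:5])   # raw (unscaled) rows go in directly
```

With a pipeline, the raw features are passed to fit and predict, and the scaling happens internally.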

7. Creating and Training the Support Vector Classifier

In [7]:

from sklearn.svm import SVC
classifier = SVC(kernel='rbf')  # Initialize the SVC with a Radial Basis Function kernel
classifier.fit(X_train, y_train)  # Train the classifier

Out[7]:

SVC()

The Support Vector Classifier is initialized with the RBF kernel (a popular choice) and trained using the training data.
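
The RBF kernel measures similarity between two points as exp(-gamma * squared distance), so nearby points score close to 1 and distant points close to 0. The sketch below checks this formula against scikit-learn's rbf_kernel helper; the two points and the gamma value are arbitrary choices for the example. In recent scikit-learn versions SVC defaults to gamma='scale', which derives gamma from the number of features and the variance of the training data.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

a = np.array([[0.0, 0.0]])
b = np.array([[1.0, 2.0]])
gamma = 0.5   # arbitrary value chosen for the example

manual = np.exp(-gamma * np.sum((a - b) ** 2))   # exp(-gamma * squared Euclidean distance)
library = rbf_kernel(a, b, gamma=gamma)[0, 0]    # same quantity computed by scikit-learn
print(manual, library)
```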

8. Making Predictions

In [8]:

y_pred = classifier.predict(X_test)  # Predicting the test set results

Predictions are made for the test set using the trained model.
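
The same applies to a single new user, with one caveat: the raw age and salary must first be passed through the scaler that was fitted on the training data. The sketch below shows the pattern on hypothetical training data with made-up labels; the new user (age 30, salary 87000) is likewise invented for the example.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical training data in original units (age, salary); labels are made up.
rng = np.random.default_rng(0)
X_train = np.column_stack([rng.integers(18, 60, size=200),
                           rng.integers(15000, 150000, size=200)]).astype(float)
y_train = ((X_train[:, 0] > 40) | (X_train[:, 1] > 100000)).astype(int)

sc_x = StandardScaler()
classifier = SVC(kernel="rbf").fit(sc_x.fit_transform(X_train), y_train)

new_user = np.array([[30, 87000]])                 # hypothetical: age 30, salary 87000
new_user_scaled = sc_x.transform(new_user)         # reuse the fitted scaler, do not refit
prediction = classifier.predict(new_user_scaled)   # predicted class, 0 or 1
score = classifier.decision_function(new_user_scaled)  # signed score relative to the boundary
print(prediction, score)
```

Calling fit_transform on new data instead of transform would rescale it with different statistics and give misleading predictions.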

9. Evaluating the Model

In [9]:

classifier.score(X_test, y_test) * 100  # Model accuracy in percentage

Out[9]:

93.0

In [10]:

from sklearn.metrics import accuracy_score
accuracy_score(y_test, y_pred) * 100  # Accuracy score calculation

Out[10]:

93.0

In [11]:

from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)  # Confusion matrix
cm

Out[11]:

array([[64,  4],
       [ 3, 29]], dtype=int64)

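Model Accuracy: Both classifier.score and accuracy_score report the share of test samples classified correctly, here 93 out of 100.
Confusion Matrix: Rows are the actual classes and columns the predicted ones. Of the 68 users who did not purchase, 64 were predicted correctly and 4 were wrongly predicted as buyers; of the 32 who did purchase, 29 were predicted correctly and 3 were missed.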
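
Accuracy is not the only number that can be read off the confusion matrix. As a small worked example, the sketch below recomputes accuracy from the matrix in Out[11] and adds precision and recall for the "purchased" class; the variable names are chosen for this illustration.

```python
import numpy as np

cm = np.array([[64, 4],
               [3, 29]])          # confusion matrix from Out[11]

# scikit-learn layout: rows are actual classes, columns are predicted classes.
tn, fp, fn, tp = cm.ravel()
accuracy = (tp + tn) / cm.sum()   # (29 + 64) / 100
precision = tp / (tp + fp)        # of the predicted buyers, how many really bought
recall = tp / (tp + fn)           # of the real buyers, how many were found

print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f}")
```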

10. Visualizing the Decision Boundary

In [12]:

X_set, y_set = X_test, y_test  # Using the test set for visualization

In [13]:

plt.title("Social Network Ads by Age and Salary")
plt.xlabel("Age (standardized)")     # X_test was scaled in step 6
plt.ylabel("Salary (standardized)")

X1 = np.arange(X_set[:, 0].min()-1, X_set[:, 0].max()+1, 0.01)  # Range for Age
X2 = np.arange(X_set[:, 1].min()-1, X_set[:, 1].max()+1, 0.01)  # Range for Salary

xx, yy = np.meshgrid(X1, X2)  # Create a grid of values
X3 = np.array([xx.ravel(), yy.ravel()]).T  # Combine grid values

zz = classifier.predict(X3).reshape(xx.shape)  # Predictions for the grid
plt.contourf(xx, yy, zz)  # Plotting the decision boundary

plt.scatter(X_set[y_set == 0, 0], X_set[y_set == 0, 1], label='No')  # Users who did not purchase
plt.scatter(X_set[y_set == 1, 0], X_set[y_set == 1, 1], label='Yes')  # Users who purchased

plt.legend()
plt.show()

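This section visualizes the decision boundary of the classifier.
A grid covering the range of the test data is created and the model predicts a class for every grid point, so the filled regions show where it predicts purchased vs. not purchased. Because X_test was standardized in step 6, both axes are in standardized units rather than years and currency.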
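
If axes in the original units are preferred, one option is to predict on the scaled grid and then map the grid coordinates back with the scaler's inverse_transform. The sketch below shows the idea on hypothetical data with made-up labels, and uses a coarser grid (200 x 200 points) than the 0.01 step above to keep it fast; the resulting arrays could be passed to plt.contourf.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical data in original units (age, salary); labels are made up.
rng = np.random.default_rng(0)
X_raw = np.column_stack([rng.integers(18, 60, size=200),
                         rng.integers(15000, 150000, size=200)]).astype(float)
y = ((X_raw[:, 0] > 40) | (X_raw[:, 1] > 100000)).astype(int)

sc_x = StandardScaler()
X_scaled = sc_x.fit_transform(X_raw)
classifier = SVC(kernel="rbf").fit(X_scaled, y)

# Build the grid in scaled space, where the classifier was trained.
x1 = np.linspace(X_scaled[:, 0].min() - 1, X_scaled[:, 0].max() + 1, 200)
x2 = np.linspace(X_scaled[:, 1].min() - 1, X_scaled[:, 1].max() + 1, 200)
xx, yy = np.meshgrid(x1, x2)
grid = np.column_stack([xx.ravel(), yy.ravel()])
zz = classifier.predict(grid).reshape(xx.shape)   # predicted class per grid point

# Map the grid coordinates back to original units for readable axes.
grid_raw = sc_x.inverse_transform(grid)
xx_raw = grid_raw[:, 0].reshape(xx.shape)
yy_raw = grid_raw[:, 1].reshape(yy.shape)
# plt.contourf(xx_raw, yy_raw, zz, alpha=0.3) would then draw the regions in original units.
```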

Conclusion

This example walks through using a Support Vector Classifier to predict whether a user purchases a product based on age and salary. It covers data loading, preprocessing, model training, prediction, evaluation, and visualization of the results.
