Accuracy Score is a metric used to evaluate the performance of a classification model. It measures the percentage of correct predictions made by the model out of all predictions. In other words, it tells you how often the model correctly classifies data points.
The formula for accuracy is:
$\text{Accuracy} = \frac{\text{Number of Correct Predictions}}{\text{Total Number of Predictions}}$
Example:
If a model correctly predicts 90 out of 100 data points, the accuracy score would be:
$\text{Accuracy} = \frac{90}{100} = 0.90 \text{ or } 90\%$
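To make the arithmetic concrete, accuracy is just the mean of the element-wise match between predicted and true labels. A minimal sketch with made-up labels (not from the dataset used below):

import numpy as np

y_true = np.array([1, 0, 1, 1, 0])
y_pred = np.array([1, 0, 0, 1, 0])

# 4 of 5 predictions match, so accuracy = 0.8
print(np.mean(y_true == y_pred))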
Usefulness:
Accuracy is easy to understand and compute, but it can be misleading on imbalanced data. For example, in a dataset where 95% of the samples belong to one class, a model that always predicts that class achieves 95% accuracy while never correctly identifying a single minority-class sample.
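A minimal sketch of this pitfall, using made-up labels rather than real data:

import numpy as np
from sklearn.metrics import accuracy_score

y_true = np.array([0] * 95 + [1] * 5)   # 95% majority class, 5% minority
y_pred = np.zeros(100, dtype=int)       # always predict the majority class
print(accuracy_score(y_true, y_pred))   # 0.95, yet recall on class 1 is 0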
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# Load the dataset of social-network ad clicks
data = pd.read_csv('Social_Network_Ads.csv')
data
|     | User ID  | Gender | Age  | EstimatedSalary | Purchased |
|-----|----------|--------|------|-----------------|-----------|
| 0   | 15624510 | Male   | 19.0 | 19000.0         | 0         |
| 1   | 15810944 | Male   | 35.0 | 20000.0         | 0         |
| 2   | 15668575 | Female | 26.0 | 43000.0         | 0         |
| 3   | 15603246 | Female | 27.0 | 57000.0         | 0         |
| 4   | 15804002 | Male   | 19.0 | 76000.0         | 0         |
| …   | …        | …      | …    | …               | …         |
| 395 | 15691863 | Female | 46.0 | 41000.0         | 1         |
| 396 | 15706071 | Male   | 51.0 | 23000.0         | 1         |
| 397 | 15654296 | Female | 50.0 | 20000.0         | 1         |
| 398 | 15755018 | Male   | 36.0 | 33000.0         | 0         |
| 399 | 15594041 | Female | 49.0 | 36000.0         | 1         |

400 rows × 5 columns
# Features: Age and EstimatedSalary (columns 2 and 3); target: Purchased (column 4)
X = data.iloc[:, [2, 3]].values
y = data.iloc[:, 4].values
# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
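As an aside, when classes are imbalanced (as discussed above), `train_test_split` accepts a `stratify` argument that preserves the class proportions in both splits. A sketch of that variant, which is not what produced the outputs below:

# Alternative split (illustrative only; the results below come from the
# unstratified split above): stratify=y keeps the 0/1 ratio identical in
# the training and test sets.
X_train_s, X_test_s, y_train_s, y_test_s = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)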
# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
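Scaling matters here because K-NN is distance-based: an unscaled salary column (tens of thousands) would dominate the distance computation over age. A quick sanity check, illustrative rather than part of the original notebook: after `StandardScaler`, each training feature should have mean ≈ 0 and standard deviation ≈ 1.

# Each column of the scaled training set should be roughly N(0, 1)
print(X_train.mean(axis=0))  # approximately [0., 0.]
print(X_train.std(axis=0))   # approximately [1., 1.]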
from sklearn.neighbors import KNeighborsClassifier
classifier = KNeighborsClassifier(n_neighbors=5, metric='minkowski', p=2)
classifier.fit(X_train, y_train)
KNeighborsClassifier()
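The choice of `n_neighbors=5` above is a common default. One way to sanity-check it, sketched here as an addition rather than part of the original workflow, is cross-validated accuracy over a few candidate values of k on the scaled training data:

from sklearn.model_selection import cross_val_score

# Compare a few candidate k values by mean 5-fold cross-validated accuracy
for k in [3, 5, 7, 9, 11]:
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=k),
                             X_train, y_train, cv=5)
    print(k, scores.mean())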
y_pred = classifier.predict(X_test)
y_pred
array([0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1], dtype=int64)
# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
cm
array([[64,  4],
       [ 3, 29]], dtype=int64)
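Optionally (not part of the original notebook), scikit-learn's `ConfusionMatrixDisplay` can render this matrix as a labeled heatmap:

from sklearn.metrics import ConfusionMatrixDisplay

# Visualize the confusion matrix computed above
ConfusionMatrixDisplay(confusion_matrix=cm).plot()
plt.show()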
# Visualising the Test set results
X_set, y_set = X_test, y_test
# Build a dense grid over the (scaled) feature space
X1, X2 = np.meshgrid(np.arange(start=X_set[:, 0].min() - 1, stop=X_set[:, 0].max() + 1, step=0.01),
                     np.arange(start=X_set[:, 1].min() - 1, stop=X_set[:, 1].max() + 1, step=0.01))
# Predict the class of every grid point and draw the decision regions
Z = classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape)
plt.contourf(X1, X2, Z)
plt.scatter(X_set[y_set == 0, 0], X_set[y_set == 0, 1], label=0)
plt.scatter(X_set[y_set == 1, 0], X_set[y_set == 1, 1], label=1)
plt.title('K-Nearest Neighbors (K-NN) (Test set)')
plt.xlabel('Age')
plt.ylabel('Estimated Salary')
plt.legend()
plt.show()
from sklearn.metrics import accuracy_score
The confusion matrix returned by scikit-learn puts true labels along the rows and predicted labels along the columns, so for binary labels [0, 1] it reads:

- TN | FP
- FN | TP
cm
array([[64,  4],
       [ 3, 29]], dtype=int64)
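Since NumPy's `ravel()` flattens a matrix row by row, the four counts can be unpacked in exactly the TN, FP, FN, TP order listed above:

# Unpack the four cells of the 2x2 confusion matrix
TN, FP, FN, TP = cm.ravel()
print(TN, FP, FN, TP)  # 64 4 3 29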
# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
accuracy
0.93
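As a cross-check, this matches the accuracy computed directly from the confusion matrix: correct predictions are the diagonal cells (TN + TP), out of all 100 test samples.

# (64 + 29) / 100 = 0.93, identical to accuracy_score
manual_accuracy = (TN + TP) / (TN + FP + FN + TP)
print(manual_accuracy)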