Thursday, September 12, 2024

K Nearest Neighbor Classification – KNN

K-Nearest Neighbors (KNN) is a supervised machine learning algorithm used for classification and regression tasks. For classification, it assigns a new data point the most common label among the K training points closest to it. In this example, I’ll walk through implementing KNN classification in Python with scikit-learn, step by step:

Step 1: Import Libraries

import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report, confusion_matrix

Step 2: Prepare Your Data
Make sure your dataset contains features (X) and the corresponding target labels (y). Ensure your data is in a NumPy array or a DataFrame.
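If you do not have a dataset at hand, a minimal sketch like the one below generates a synthetic one to follow along with. The use of make_classification and the sizes chosen here are assumptions for illustration only; substitute your own X and y.

```python
from sklearn.datasets import make_classification

# Synthetic two-class dataset with two informative features (illustrative only)
X, y = make_classification(
    n_samples=200,
    n_features=2,
    n_informative=2,
    n_redundant=0,
    n_classes=2,
    random_state=0,
)

print(X.shape)  # (200, 2)
print(y.shape)  # (200,)
```

Both X and y come back as NumPy arrays, which is the form the later steps expect.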

Step 3: Split Data into Training and Testing Sets
Split your data into training and testing sets to evaluate the model’s performance.

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

Step 4: Choose the Value of K (Number of Neighbors)
You need to choose the value of K, which represents the number of nearest neighbors used to classify a data point. You can experiment with different values to find the best K for your dataset.
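One common way to run that experiment is to score several candidate values with cross-validation on the training data. The following is a rough sketch, not a definitive recipe: the candidate list, the 5-fold setting, and the synthetic stand-in data are all assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Stand-in data (assumption); replace with your own X_train and y_train
X_train, y_train = make_classification(n_samples=200, n_features=4, random_state=0)

candidate_ks = [1, 3, 5, 7, 9, 11]  # hypothetical search range
mean_scores = []
for k in candidate_ks:
    model = KNeighborsClassifier(n_neighbors=k)
    # 5-fold cross-validated accuracy, computed on the training data only
    scores = cross_val_score(model, X_train, y_train, cv=5, scoring="accuracy")
    mean_scores.append(scores.mean())

best_k = candidate_ks[int(np.argmax(mean_scores))]
print(f"Best K by cross-validation: {best_k}")
```

Odd values of K are used here so that a two-class vote cannot end in a tie.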

Step 5: Create the KNN Classifier

k = 5  # Example value for K (you can experiment with different values)
classifier = KNeighborsClassifier(n_neighbors=k)

Step 6: Train the KNN Classifier

classifier.fit(X_train, y_train)

Step 7: Make Predictions

y_pred = classifier.predict(X_test)
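To see what predict does for a single point, you can ask the fitted classifier for that point's nearest neighbors and take the vote yourself. This is a small illustrative sketch on synthetic stand-in data (an assumption), using the default uniform weighting.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Stand-in data (assumption); replace with your own X and y
X, y = make_classification(n_samples=200, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

classifier = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)

# Distances to, and training-set indices of, the 5 nearest neighbors of the first test point
distances, indices = classifier.kneighbors(X_test[:1])
neighbor_labels = y_train[indices[0]]

# Majority vote among the neighbor labels
majority_label = np.bincount(neighbor_labels).argmax()
predicted_label = classifier.predict(X_test[:1])[0]

print("Neighbor labels:", neighbor_labels)
print("Majority vote:", majority_label, "| predict():", predicted_label)
```

With two classes and an odd K, the manual vote and predict agree, which makes this a handy sanity check when debugging surprising predictions.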

Step 8: Evaluate the Model
Evaluate the model’s performance using classification metrics such as accuracy, precision, recall, F1-score, and the confusion matrix.

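from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

accuracy = accuracy_score(y_test, y_pred)
# precision, recall, and F1 default to average='binary' (two-class labels);
# for more than two classes, pass an averaging strategy such as average='macro'
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)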

print(f'Accuracy: {accuracy}')
print(f'Precision: {precision}')
print(f'Recall: {recall}')
print(f'F1-Score: {f1}')

confusion = confusion_matrix(y_test, y_pred)
print('Confusion Matrix:')
print(confusion)
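When the target has more than two classes, the metric calls need an explicit averaging strategy, and classification_report gives a per-class breakdown in one table. The sketch below uses the three-class iris dataset bundled with scikit-learn as an assumed stand-in for your data.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import classification_report, precision_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Three-class example data (assumption); replace with your own X and y
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

classifier = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
y_pred = classifier.predict(X_test)

# Macro averaging: compute precision per class, then take the unweighted mean
macro_precision = precision_score(y_test, y_pred, average="macro")
print(f"Macro-averaged precision: {macro_precision:.3f}")

# Per-class precision, recall, F1-score, and support in one table
print(classification_report(y_test, y_pred))
```

Macro averaging treats every class equally; average='weighted' is an alternative when class sizes differ a lot.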

Step 9: Visualize Results (Optional)
If your dataset has exactly two features, you can plot the data to see how the classes are separated in feature space. The example below plots the test points colored by their true class.

# Example visualization for a two-feature dataset
# Assumes X_test and y_test are NumPy arrays and the labels are 0 and 1
plt.scatter(X_test[y_test == 0][:, 0], X_test[y_test == 0][:, 1], color='red', label='Class 0')
plt.scatter(X_test[y_test == 1][:, 0], X_test[y_test == 1][:, 1], color='blue', label='Class 1')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.title(f'K-Nearest Neighbors Classifier (K={k})')
plt.legend()
plt.show()
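To draw the decision regions themselves, one common approach is to predict the class at every point of a dense grid and shade the result. This is a sketch under stated assumptions: a synthetic two-feature dataset, a 200 x 200 grid, and a margin of 1 around the data are all arbitrary illustrative choices.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier

# Synthetic two-feature data (assumption); replace with your own X and y
X, y = make_classification(
    n_samples=200, n_features=2, n_informative=2, n_redundant=0, random_state=0
)
k = 5
classifier = KNeighborsClassifier(n_neighbors=k).fit(X, y)

# Build a grid that covers the feature space with a small margin
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(
    np.linspace(x_min, x_max, 200),
    np.linspace(y_min, y_max, 200),
)

# Predict the class at every grid point, then reshape back to the grid
grid_pred = classifier.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

# Shade the predicted regions and overlay the data points
plt.contourf(xx, yy, grid_pred, alpha=0.3, cmap="coolwarm")
plt.scatter(X[:, 0], X[:, 1], c=y, cmap="coolwarm", edgecolor="k")
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.title(f"KNN decision regions (K={k})")
plt.show()
```

Small values of K tend to produce jagged regions that follow individual points, while larger values smooth the boundary.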

Remember to experiment with different values of K and evaluate the model’s performance using cross-validation to find the best K for your specific dataset. Because KNN relies on distances between points, features measured on larger scales can dominate the result, so data preprocessing and feature scaling (for example, standardization) are often essential for good performance.
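A minimal sketch of feature scaling with KNN, assuming StandardScaler as the scaler and the wine dataset bundled with scikit-learn as example data (its features span very different ranges):

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Example data (assumption); replace with your own X and y
X, y = load_wine(return_X_y=True)

unscaled = KNeighborsClassifier(n_neighbors=5)
# The pipeline refits the scaler inside each fold, so no
# statistics from the held-out fold leak into training
scaled = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))

unscaled_acc = cross_val_score(unscaled, X, y, cv=5).mean()
scaled_acc = cross_val_score(scaled, X, y, cv=5).mean()

print(f"Unscaled accuracy: {unscaled_acc:.3f}")
print(f"Scaled accuracy:   {scaled_acc:.3f}")
```

Wrapping the scaler and classifier in one pipeline also means the same object can be passed to a K search, so scaling and K are evaluated together.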
