K-Fold Cross Validation is a method used in machine learning to assess the performance of a model by partitioning the data into K equal subsets (or folds). Here’s an outline of the process:
Steps:
- Data Splitting: The dataset is divided into K roughly equal subsets, or “folds” (see the splitting sketch after this list).
- Training and Validation: The model is trained on K-1 folds and validated on the remaining fold. This process is repeated K times, with a different fold being used as the validation set each time.
- Performance Averaging: After all K iterations, the model’s performance metrics (such as accuracy, precision, or F1 score) from each fold are averaged to provide an overall evaluation.
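To make the splitting step concrete, here is a minimal sketch (on a hypothetical 10-sample toy array, not the iris data used later) of the train/validation indices that scikit-learn’s KFold produces:

from sklearn.model_selection import KFold
import numpy as np

# Toy data: 10 samples, purely to illustrate how folds are formed
X_toy = np.arange(10).reshape(10, 1)

kfold = KFold(n_splits=5, shuffle=True, random_state=42)
for i, (train_idx, val_idx) in enumerate(kfold.split(X_toy), start=1):
    # Each iteration holds out a different fold for validation
    print(f"Fold {i}: train={train_idx}, validation={val_idx}")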
Key Benefits:
- More Reliable Estimation: By testing on multiple validation sets, K-Fold Cross Validation provides a better estimate of a model’s true performance on unseen data.
- Efficient Use of Data: It makes full use of the dataset since every observation is used for both training and validation.
- Reduces Estimate Variability: It reduces the variance of the performance estimate that can occur when only a single train-test split is used.
Variants:
- Stratified K-Fold: Ensures that each fold maintains the same class distribution as the entire dataset, which is particularly useful for imbalanced datasets.
- Leave-One-Out Cross Validation (LOO): A special case where each fold consists of a single data point, so K equals the number of data points (both variants are sketched just below).
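A minimal sketch of both variants in scikit-learn, using the same iris data that the practical section below loads (not part of the original notebook):

from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold, LeaveOneOut

X, y = load_iris(return_X_y=True)

# Stratified K-Fold: pass y to split() so each fold keeps the class ratios
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
print(sum(1 for _ in skf.split(X, y)))   # 5 splits

# Leave-One-Out: every sample is its own validation fold, so K = len(X)
loo = LeaveOneOut()
print(loo.get_n_splits(X))               # 150 splits for iris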
This technique is widely used to evaluate models across various machine learning tasks because it helps ensure that the model generalizes well to unseen data.
Let’s review this in practice.
In [1]:
from sklearn.model_selection import KFold
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
In [2]:
iris = load_iris()
X = iris.data
y = iris.target
In [3]:
model = LogisticRegression()
In [4]:
kfold = KFold(n_splits=5, shuffle=True, random_state=42)
In [5]:
scores = []
for train, test in kfold.split(X):
    # Indices for the current training and validation folds
    X_train, X_test = X[train], X[test]
    y_train, y_test = y[train], y[test]
    # Fit on K-1 folds, evaluate on the held-out fold
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    scores.append(accuracy)
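The manual loop above can usually be replaced by cross_val_score, which performs the same fit-and-score cycle internally. A minimal sketch, assuming the same model and kfold objects defined in the cells above:

from sklearn.model_selection import cross_val_score

# Equivalent to the loop: fit on K-1 folds, score accuracy on the held-out fold
cv_scores = cross_val_score(model, X, y, cv=kfold, scoring='accuracy')
print(cv_scores, cv_scores.mean())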
In [6]:
scores
Out[6]:
[1.0, 1.0, 0.9333333333333333, 0.9666666666666667, 0.9666666666666667]
In [7]:
for fold, accuracy in enumerate(scores):
    print(fold + 1, accuracy * 100)
1 100.0
2 100.0
3 93.33333333333333
4 96.66666666666667
5 96.66666666666667
In [8]:
avg_accuracy = sum(scores)/len(scores) * 100
avg_accuracy
Out[8]:
97.33333333333334
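Reporting the spread of the fold scores alongside the mean is also common. A quick sketch using the standard-library statistics module (not part of the original notebook):

import statistics

mean_acc = statistics.mean(scores) * 100
std_acc = statistics.stdev(scores) * 100
print(f"{mean_acc:.2f} +/- {std_acc:.2f}")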
Grid Search using K-Fold
In [9]:
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.datasets import load_iris
from sklearn.svm import SVC
In [10]:
iris = load_iris()
X = iris.data
y = iris.target
In [11]:
kfold = KFold(n_splits=5, shuffle=True, random_state=42)
In [12]:
svm = SVC()
In [13]:
grid_param = {'C': [0.1, 1, 10], 'gamma': [0.01, 0.1, 1], 'kernel': ['linear', 'poly', 'sigmoid', 'rbf']}
In [14]:
gs = GridSearchCV(estimator=svm, param_grid=grid_param, cv=kfold)
In [15]:
gs.fit(X,y)
Out[15]:
GridSearchCV(cv=KFold(n_splits=5, random_state=42, shuffle=True), estimator=SVC(), param_grid={'C': [0.1, 1, 10], 'gamma': [0.01, 0.1, 1], 'kernel': ['linear', 'poly', 'sigmoid', 'rbf']})
In [16]:
gs.best_params_
Out[16]:
{'C': 10, 'gamma': 0.01, 'kernel': 'rbf'}
In [17]:
gs.best_score_
Out[17]:
0.9800000000000001
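Beyond best_params_ and best_score_, GridSearchCV also exposes the refit best model and the per-combination results. A brief sketch (the attributes are standard scikit-learn; the column selection is illustrative):

best_model = gs.best_estimator_        # SVC refit on the full data with the best params
results = gs.cv_results_               # dict of per-combination CV statistics
print(results['mean_test_score'][:3])  # first few mean CV accuracies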
In [18]:
svm = SVC(kernel='rbf', C=10, gamma=0.01)
svm.fit(X, y)
Out[18]:
SVC(C=10, gamma=0.01)
In [19]:
svm.score(X, y) * 100
Out[19]:
98.0
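Because this last score is computed on the same data the model was trained on, it is an optimistic (resubstitution) estimate. A cross-validated check of the tuned model can be sketched as follows, reusing the kfold splitter defined earlier:

from sklearn.model_selection import cross_val_score

# Cross-validated accuracy of the tuned SVC, rather than the training-set score
cv_scores = cross_val_score(SVC(kernel='rbf', C=10, gamma=0.01), X, y, cv=kfold)
print(cv_scores.mean() * 100)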