Grid Search with K-Fold Cross-Validation is a technique used in machine learning to help you find the best settings (hyperparameters) for your model. Here’s a simple breakdown:
1. Grid Search¶
- What it is: Imagine you want to bake a cake, but you’re unsure about the best recipe. There are several settings you can vary (like the amount of flour, the amount of sugar, and the baking time). Grid search tests every possible combination of these settings to find the best cake.
- In Machine Learning: Similarly, in grid search, you define a set of hyperparameters (like the learning rate, number of trees, etc.) and the algorithm tries out every combination of these parameters to see which one works best for your model.
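To make “every combination” concrete, here is a minimal sketch that enumerates a small grid with itertools.product. The parameter names and values are purely illustrative, not taken from the example later in this notebook:

from itertools import product

# an illustrative grid: 3 learning rates x 2 tree counts = 6 combinations
param_grid = {'learning_rate': [0.01, 0.1, 1.0], 'n_estimators': [100, 200]}

for lr, n in product(param_grid['learning_rate'], param_grid['n_estimators']):
    # in a real search, each combination would be trained and scored
    print(f"learning_rate={lr}, n_estimators={n}")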
2. K-Fold Cross-Validation¶
- What it is: Think of it as sharing a pizza among friends. Instead of giving the entire pizza to one person and seeing if they like it, you cut it into equal slices (or “folds”) and share it with everyone. This way, each person gets a chance to try a slice and provide feedback.
- In Machine Learning: K-fold cross-validation divides your dataset into “k” equal parts (or folds). For each fold:
- The model is trained on the remaining “k-1” parts.
- Then, it’s tested on the fold that was set aside.
- This process is repeated “k” times, so every part of the data gets used for both training and testing. This helps ensure that the model performs well across different subsets of data.
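As a rough sketch of the mechanics, scikit-learn’s KFold yields the train/test indices for each fold. The tiny toy array here is purely illustrative:

import numpy as np
from sklearn.model_selection import KFold

X_toy = np.arange(10).reshape(5, 2)  # 5 samples, 2 features

kf = KFold(n_splits=5)
for fold, (train_idx, test_idx) in enumerate(kf.split(X_toy)):
    # each fold trains on 4 samples and tests on the 1 held out
    print(f"fold {fold}: train={train_idx}, test={test_idx}")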
3. Combining Both¶
When you combine grid search with k-fold cross-validation:
- First, for each combination of hyperparameters in the grid search, the model is trained and validated using k-fold cross-validation.
- Then, the best combination of hyperparameters is chosen based on how well the model performed across all the folds.
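Conceptually, this is what scikit-learn’s GridSearchCV (used later in this notebook) does under the hood. A hand-rolled sketch, using toy data and illustrative parameter values:

from sklearn.datasets import make_classification
from sklearn.model_selection import ParameterGrid, cross_val_score
from sklearn.svm import SVC

# toy data purely for illustration
X_toy, y_toy = make_classification(n_samples=100, n_features=2,
                                   n_informative=2, n_redundant=0, random_state=0)

param_grid = {'C': [1, 10], 'kernel': ['linear', 'rbf']}  # illustrative values

best_score, best_params = -1.0, None
for params in ParameterGrid(param_grid):
    # run k-fold cross-validation for this combination
    scores = cross_val_score(SVC(**params), X_toy, y_toy, cv=10)
    if scores.mean() > best_score:
        best_score, best_params = scores.mean(), params

print(best_params, best_score)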
Why Use Them?¶
- Better Accuracy: By trying many combinations and validating each one across different parts of the dataset, you increase the chances of finding a model that performs well on unseen data.
- Robustness: It ensures that the model’s performance is consistent, not just a fluke of one particular train/test split.
Summary¶
In short, grid search with k-fold cross-validation is like experimenting with different cake recipes while making sure every taste tester samples more than one slice. This helps you find the best recipe (or model) that works well in general, not just in one specific case.
Let’s Review This Practically¶
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# load the dataset
data = pd.read_csv('Social_Network_Ads.csv')
data
|     | User ID  | Gender | Age  | EstimatedSalary | Purchased |
|-----|----------|--------|------|-----------------|-----------|
| 0   | 15624510 | Male   | 19.0 | 19000.0         | 0         |
| 1   | 15810944 | Male   | 35.0 | 20000.0         | 0         |
| 2   | 15668575 | Female | 26.0 | 43000.0         | 0         |
| 3   | 15603246 | Female | 27.0 | 57000.0         | 0         |
| 4   | 15804002 | Male   | 19.0 | 76000.0         | 0         |
| …   | …        | …      | …    | …               | …         |
| 395 | 15691863 | Female | 46.0 | 41000.0         | 1         |
| 396 | 15706071 | Male   | 51.0 | 23000.0         | 1         |
| 397 | 15654296 | Female | 50.0 | 20000.0         | 1         |
| 398 | 15755018 | Male   | 36.0 | 33000.0         | 0         |
| 399 | 15594041 | Female | 49.0 | 36000.0         | 1         |
400 rows × 5 columns
X = data.iloc[:,2:4].values  # features: Age and EstimatedSalary
y = data.iloc[:,4].values    # target: Purchased
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.25,random_state=0)  # 75/25 split
from sklearn.preprocessing import StandardScaler
ss_x = StandardScaler()
X_train = ss_x.fit_transform(X_train)  # fit the scaler on the training set only
X_test = ss_x.transform(X_test)        # reuse the training-set statistics on the test set
# all scaled training points
plt.scatter(X_train[:,0],X_train[:,1])
plt.show()
# training points, coloured by class
plt.scatter(X_train[y_train==0,0],X_train[y_train==0,1])
plt.scatter(X_train[y_train==1,0],X_train[y_train==1,1])
plt.show()
# test points, coloured by class
plt.scatter(X_test[y_test==0,0],X_test[y_test==0,1])
plt.scatter(X_test[y_test==1,0],X_test[y_test==1,1])
plt.show()
from sklearn.svm import SVC
classifier = SVC(kernel='linear',random_state=0)  # linear-kernel SVM as a baseline
classifier.fit(X_train,y_train)
SVC(kernel='linear', random_state=0)
y_pred = classifier.predict(X_test)
y_pred
array([0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1], dtype=int64)
classifier.score(X_test,y_test)*100
90.0
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test,y_pred)
cm
array([[66,  2],
       [ 8, 24]], dtype=int64)

That is 66 true negatives and 24 true positives (90 correct out of 100, matching the 90% score above), against 2 false positives and 8 false negatives.
Cross Validation¶
from sklearn.model_selection import cross_val_score
accuracies = cross_val_score(estimator=classifier, X = X_train, y = y_train, cv=10)  # 10-fold CV on the training set
accuracies
array([0.76666667, 0.8 , 0.73333333, 0.83333333, 0.73333333, 0.66666667, 0.83333333, 0.93333333, 0.96666667, 0.86666667])
accuracies.mean()
0.8133333333333335
accuracies.std()
0.08844332774281068
The mean accuracy plus or minus one standard deviation gives a rough range for the model’s expected accuracy:

0.8133333333333335 + 0.08844332774281068
0.9017766610761442
0.8133333333333335 - 0.08844332774281068
0.7248900055905227
Grid Search¶
from sklearn.model_selection import GridSearchCV
# two sub-grids: a linear kernel tuned over C, and an RBF kernel tuned over C and gamma
parameters = [{'C':[1,10,100,1000],'kernel':['linear']},
              {'C':[1,10,100,1000],'kernel':['rbf'],'gamma':[0.6,0.7,0.8,0.9,1.0,1.1,1.2,1.3]}
             ]
gs = GridSearchCV(estimator=classifier,param_grid=parameters,scoring='accuracy',cv=10)  # 10-fold CV per combination
gs = gs.fit(X_train,y_train)
gs.best_score_
0.9133333333333333
gs.best_params_
{'C': 1, 'gamma': 1.2, 'kernel': 'rbf'}
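Incidentally, since GridSearchCV refits the best model on the full training set by default (refit=True), the manual retraining below is equivalent to using the fitted search object directly:

best_clf = gs.best_estimator_  # the model already refit with the best parameters
best_clf.score(X_test, y_test) * 100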
from sklearn.svm import SVC
classifier = SVC(kernel='rbf',random_state=0,C=1,gamma=1.2)  # retrain with the best hyperparameters
classifier.fit(X_train,y_train)
SVC(C=1, gamma=1.2, random_state=0)
classifier.score(X_test,y_test)*100
93.0

The tuned RBF model improves test accuracy from 90% to 93%.