1. What is Dimensionality Reduction?¶
In machine learning and statistics, dimensionality reduction is the process of reducing the number of random variables under consideration by obtaining a set of principal variables. It can be divided into feature selection and feature extraction.
We will deal with two main algorithms in Dimensionality Reduction¶
- Principal Component Analysis (PCA)
- Linear Discriminant Analysis (LDA)
2: How Do Dimensionality Reduction Algorithms Work?¶
2.1: Principal Component Analysis (PCA)¶
2.1.1: What is Principal Component Analysis (PCA)?¶
If you have worked with a lot of variables before, you know this can present problems. Do you understand the relationships between the variables? Do you have so many variables that you are in danger of overfitting your model to your data, or that you might be violating the assumptions of whichever modeling tactic you’re using?
You might ask, “How do I take all of the variables I’ve collected and focus on only a few of them?” In technical terms, you want to “reduce the dimension of your feature space”. By reducing the dimension of your feature space, you have fewer relationships between variables to consider and are less likely to overfit your model.
Somewhat unsurprisingly, reducing the dimension of the feature space is called “dimensionality reduction”. There are many ways to achieve dimensionality reduction, but most of the techniques fall into one of two classes:
- Feature Elimination
- Feature Extraction
Feature Elimination: we reduce the feature space by simply dropping features. The advantages of feature elimination include simplicity and maintaining the interpretability of the variables we keep. As a downside, however, we also eliminate any benefits the dropped variables would have brought.
Feature Extraction: PCA is a technique for feature extraction: it combines our input variables in a specific way, and we can then drop the “least important” of the new variables while still retaining the most valuable parts of all of the original variables.
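To make this concrete, here is a minimal illustrative sketch of PCA-based feature extraction using scikit-learn’s built-in copy of the Wine data (the choice of n_components=2 is arbitrary and just for illustration):
# Minimal PCA sketch: compress 13 standardised features into 2 principal components
from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

X, y = load_wine(return_X_y=True)          # 178 samples, 13 numeric features
X_std = StandardScaler().fit_transform(X)  # PCA is sensitive to feature scale

pca = PCA(n_components=2)                  # keep the two directions of largest variance
X_pca = pca.fit_transform(X_std)

print(X_pca.shape)                    # (178, 2)
print(pca.explained_variance_ratio_)  # share of total variance captured by each component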
2.2: Linear Discriminant Analysis (LDA) with Scikit-Learn¶
- Linear Discriminant Analysis is a supervised algorithm, as it takes the class labels into consideration. It is a way to reduce dimensionality while at the same time preserving as much of the class discrimination information as possible.
- LDA helps you find the boundaries around clusters of classes. It projects your data points onto a lower-dimensional axis so that the clusters are as separated as possible, with the points of each cluster staying relatively close to their centroid.
In order to perform LDA we need to do the following:
LDA Steps¶
Let’s perform LDA on the Wine dataset and analyse the resulting plot:
- Load the Wine dataset into memory and separate the features from the class label.
- Scale the dataset – use standardization (z-score scaling) so that each feature has zero mean and unit standard deviation.
- Apply LDA with the built-in LinearDiscriminantAnalysis estimator in sklearn:
- from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
- lda = LDA(n_components=2)
- X_feature_reduced = lda.fit(X, y).transform(X)  # LDA is supervised, so fit needs the labels y
To conclude, PCA tends to perform better when the number of samples per class is relatively small, whereas LDA works better on larger datasets with multiple classes, where class separability is an important factor when reducing dimensionality.
Let’s Review This Practically¶
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# Importing the dataset
dataset = pd.read_csv('Wine.csv')
dataset.head()
| | Alcohol | Malic_Acid | Ash | Ash_Alcanity | Magnesium | Total_Phenols | Flavanoids | Nonflavanoid_Phenols | Proanthocyanins | Color_Intensity | Hue | OD280 | Proline | Customer_Segment |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 14.23 | 1.71 | 2.43 | 15.6 | 127 | 2.80 | 3.06 | 0.28 | 2.29 | 5.64 | 1.04 | 3.92 | 1065 | 1 |
1 | 13.20 | 1.78 | 2.14 | 11.2 | 100 | 2.65 | 2.76 | 0.26 | 1.28 | 4.38 | 1.05 | 3.40 | 1050 | 1 |
2 | 13.16 | 2.36 | 2.67 | 18.6 | 101 | 2.80 | 3.24 | 0.30 | 2.81 | 5.68 | 1.03 | 3.17 | 1185 | 1 |
3 | 14.37 | 1.95 | 2.50 | 16.8 | 113 | 3.85 | 3.49 | 0.24 | 2.18 | 7.80 | 0.86 | 3.45 | 1480 | 1 |
4 | 13.24 | 2.59 | 2.87 | 21.0 | 118 | 2.80 | 2.69 | 0.39 | 1.82 | 4.32 | 1.04 | 2.93 | 735 | 1 |
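Before splitting the data, it can be useful to confirm how the class label is distributed; a quick optional check using the Customer_Segment column shown above:
# Optional: count how many samples fall into each customer segment (the class label)
print(dataset['Customer_Segment'].value_counts())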
# Separating the 13 features from the class label (Customer_Segment)
X = dataset.iloc[:, 0:13].values
y = dataset.iloc[:, 13].values
# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)
# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
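As a quick sanity check, each standardised training feature should now have roughly zero mean and unit standard deviation:
# Sanity check: column-wise mean ~ 0 and standard deviation ~ 1 after scaling
print(X_train.mean(axis=0).round(3))
print(X_train.std(axis=0).round(3))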
# Applying LDA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
lda = LDA(n_components = 2)
X_train = lda.fit_transform(X_train, y_train)
X_test = lda.transform(X_test)
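With only two discriminants kept, it is worth checking how much of the between-class variance each one captures; the fitted estimator exposes this via its explained_variance_ratio_ attribute:
# Proportion of between-class variance captured by each linear discriminant
print(lda.explained_variance_ratio_)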
# Fitting Logistic Regression to the Training set
from sklearn.linear_model import LogisticRegression
classifier = LogisticRegression(random_state = 0)
classifier.fit(X_train, y_train)
LogisticRegression(random_state=0)
# Predicting the Test set results
y_pred = classifier.predict(X_test)
y_pred
array([1, 3, 2, 1, 2, 2, 1, 3, 2, 2, 3, 3, 1, 2, 3, 2, 1, 1, 2, 1, 2, 1, 1, 2, 2, 2, 2, 2, 2, 3, 1, 1, 2, 1, 1, 1], dtype=int64)
# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
cm
array([[14,  0,  0],
       [ 0, 16,  0],
       [ 0,  0,  6]], dtype=int64)
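The confusion matrix is purely diagonal, so all 36 test samples are classified correctly. The same result can be summarised as a single accuracy figure:
# Overall accuracy on the test set (fraction of correct predictions)
from sklearn.metrics import accuracy_score
print(accuracy_score(y_test, y_pred))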
# Visualising the Training set results
X_set, y_set = X_train, y_train
plt.scatter(X_set[y_set == 1, 0], X_set[y_set == 1, 1], label = 'Class 1', color = 'k')
plt.scatter(X_set[y_set == 2, 0], X_set[y_set == 2, 1], label = 'Class 2', color = 'red')
plt.scatter(X_set[y_set == 3, 0], X_set[y_set == 3, 1], label = 'Class 3', color = 'green')
# Build a grid over the two discriminants and colour it by the classifier's prediction
X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 1, stop = X_set[:, 0].max() + 1, step = 0.01),
                     np.arange(start = X_set[:, 1].min() - 1, stop = X_set[:, 1].max() + 1, step = 0.01))
Z = classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape)
print(Z)
plt.contourf(X1, X2, Z, alpha = 0.2)
plt.xlabel('LD1')
plt.ylabel('LD2')
plt.legend()
plt.show()
[[2 2 2 ... 2 2 2]
 [2 2 2 ... 2 2 2]
 [2 2 2 ... 2 2 2]
 ...
 [1 1 1 ... 3 3 3]
 [1 1 1 ... 3 3 3]
 [1 1 1 ... 3 3 3]]
# Visualising the Test set results
X_set, y_set = X_test, y_test
plt.scatter(X_set[y_set == 1, 0], X_set[y_set == 1, 1], label = 'Class 1', color = 'k')
plt.scatter(X_set[y_set == 2, 0], X_set[y_set == 2, 1], label = 'Class 2', color = 'red')
plt.scatter(X_set[y_set == 3, 0], X_set[y_set == 3, 1], label = 'Class 3', color = 'green')
# Same decision-boundary grid as above, evaluated over the test points
X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 1, stop = X_set[:, 0].max() + 1, step = 0.01),
                     np.arange(start = X_set[:, 1].min() - 1, stop = X_set[:, 1].max() + 1, step = 0.01))
Z = classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape)
print(Z)
plt.contourf(X1, X2, Z, alpha = 0.2)
plt.xlabel('LD1')
plt.ylabel('LD2')
plt.legend()
plt.show()
[[2 2 2 ... 2 2 2]
 [2 2 2 ... 2 2 2]
 [2 2 2 ... 2 2 2]
 ...
 [1 1 1 ... 3 3 3]
 [1 1 1 ... 3 3 3]
 [1 1 1 ... 3 3 3]]