1. What is Dimensionality Reduction?¶
In machine learning and statistics, dimensionality reduction is the process of reducing the number of random variables under consideration by obtaining a set of principal variables. It can be divided into feature selection and feature extraction.
We will deal with two main algorithms in Dimensionality Reduction¶
- Principal Component Analysis (PCA)
- Linear Discriminant Analysis (LDA)
2: How Do Dimensionality Reduction Algorithms Work?¶
2.1: Principal Component Analysis (PCA)¶
2.1.1: What is Principal Component Analysis (PCA)?¶
If you have worked with a lot of variables before, you know this can present problems. Do you understand the relationships between the variables? Do you have so many variables that you are in danger of overfitting your model to your data, or that you might be violating the assumptions of whichever modeling tactic you’re using?
You might ask, “How do I take all of the variables I’ve collected and focus on only a few of them?” In technical terms, you want to “reduce the dimension of your feature space”. By reducing the dimension of your feature space, you have fewer relationships between variables to consider and are less likely to overfit your model.
Somewhat unsurprisingly, reducing the dimension of the feature space is called “dimensionality reduction”. There are many ways to achieve dimensionality reduction, but most of the techniques fall into one of two classes:
- Feature Elimination
- Feature Extraction
Feature Elimination: we reduce the feature space by simply dropping features. The advantages of feature elimination include simplicity and maintaining the interpretability of the variables we keep. As a downside, however, we also eliminate any benefits the dropped variables would have brought.
Feature Extraction: PCA is a technique for feature extraction: it combines our input variables in a specific way, and we can then drop the “least important” of the new variables while still retaining the most valuable parts of all of the original variables.
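To make this concrete, here is a minimal illustrative sketch of PCA-based feature extraction using scikit-learn’s built-in copy of the Wine data (the choice of n_components=2 is arbitrary and just for illustration):
# Minimal PCA sketch: compress 13 standardised features into 2 principal components
from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

X, y = load_wine(return_X_y=True)          # 178 samples, 13 numeric features
X_std = StandardScaler().fit_transform(X)  # PCA is sensitive to feature scale

pca = PCA(n_components=2)                  # keep the two directions of largest variance
X_pca = pca.fit_transform(X_std)

print(X_pca.shape)                    # (178, 2)
print(pca.explained_variance_ratio_)  # share of total variance captured by each component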
2.2: Linear Discriminant Analysis (LDA) with Scikit-Learn¶
- Linear Discriminant Analysis is a supervised algorithm, as it takes the class labels into consideration. It is a way to reduce dimensionality while at the same time preserving as much of the class discrimination information as possible.
- LDA helps you find the boundaries around clusters of classes. It projects your data points onto a lower-dimensional axis so that the clusters are as separated as possible, with the points of each cluster staying relatively close to their centroid.
In order to perform LDA we need to do the following:
LDA Steps¶
Let’s perform LDA on the Wine dataset and analyse the resulting plot:
- Load the Wine dataset into memory and separate the features from the class label.
- Scale the dataset – use standardization (z-score scaling) so that each feature has zero mean and unit standard deviation.
- Apply LDA with the built-in LinearDiscriminantAnalysis estimator in sklearn:
- from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
- lda = LDA(n_components=2)
- X_feature_reduced = lda.fit(X, y).transform(X)  # LDA is supervised, so fit needs the labels y
To conclude, PCA tends to perform better when the number of samples per class is relatively small, whereas LDA works better on larger datasets with multiple classes, where class separability is an important factor when reducing dimensionality.
Let’s Review This Practically¶
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# Importing the dataset
dataset = pd.read_csv('Wine.csv')
dataset.head()
| | Alcohol | Malic_Acid | Ash | Ash_Alcanity | Magnesium | Total_Phenols | Flavanoids | Nonflavanoid_Phenols | Proanthocyanins | Color_Intensity | Hue | OD280 | Proline | Customer_Segment |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 14.23 | 1.71 | 2.43 | 15.6 | 127 | 2.80 | 3.06 | 0.28 | 2.29 | 5.64 | 1.04 | 3.92 | 1065 | 1 |
1 | 13.20 | 1.78 | 2.14 | 11.2 | 100 | 2.65 | 2.76 | 0.26 | 1.28 | 4.38 | 1.05 | 3.40 | 1050 | 1 |
2 | 13.16 | 2.36 | 2.67 | 18.6 | 101 | 2.80 | 3.24 | 0.30 | 2.81 | 5.68 | 1.03 | 3.17 | 1185 | 1 |
3 | 14.37 | 1.95 | 2.50 | 16.8 | 113 | 3.85 | 3.49 | 0.24 | 2.18 | 7.80 | 0.86 | 3.45 | 1480 | 1 |
4 | 13.24 | 2.59 | 2.87 | 21.0 | 118 | 2.80 | 2.69 | 0.39 | 1.82 | 4.32 | 1.04 | 2.93 | 735 | 1 |
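Before splitting the data, it can be useful to confirm how the class label is distributed; a quick optional check using the Customer_Segment column shown above:
# Optional: count how many samples fall into each customer segment (the class label)
print(dataset['Customer_Segment'].value_counts())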
# Separating the 13 features from the class label (Customer_Segment)
X = dataset.iloc[:, 0:13].values
y = dataset.iloc[:, 13].values
# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)
# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
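As a quick sanity check, each standardised training feature should now have roughly zero mean and unit standard deviation:
# Sanity check: column-wise mean ~ 0 and standard deviation ~ 1 after scaling
print(X_train.mean(axis=0).round(3))
print(X_train.std(axis=0).round(3))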
# Applying LDA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
lda = LDA(n_components = 2)
X_train = lda.fit_transform(X_train, y_train)
X_test = lda.transform(X_test)
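With only two discriminants kept, it is worth checking how much of the between-class variance each one captures; the fitted estimator exposes this via its explained_variance_ratio_ attribute:
# Proportion of between-class variance captured by each linear discriminant
print(lda.explained_variance_ratio_)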
# Fitting Logistic Regression to the Training set
from sklearn.linear_model import LogisticRegression
classifier = LogisticRegression(random_state = 0)
classifier.fit(X_train, y_train)
LogisticRegression(random_state=0)
# Predicting the Test set results
y_pred = classifier.predict(X_test)
y_pred
array([1, 3, 2, 1, 2, 2, 1, 3, 2, 2, 3, 3, 1, 2, 3, 2, 1, 1, 2, 1, 2, 1, 1, 2, 2, 2, 2, 2, 2, 3, 1, 1, 2, 1, 1, 1], dtype=int64)
# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
cm
array([[14,  0,  0],
       [ 0, 16,  0],
       [ 0,  0,  6]], dtype=int64)
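The confusion matrix is purely diagonal, so all 36 test samples are classified correctly. The same result can be summarised as a single accuracy figure:
# Overall accuracy on the test set (fraction of correct predictions)
from sklearn.metrics import accuracy_score
print(accuracy_score(y_test, y_pred))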
# Visualising the Training set results
X_set, y_set = X_train, y_train
plt.scatter(X_set[y_set == 1, 0], X_set[y_set == 1, 1], label = 'Class 1', color = 'k')
plt.scatter(X_set[y_set == 2, 0], X_set[y_set == 2, 1], label = 'Class 2', color = 'red')
plt.scatter(X_set[y_set == 3, 0], X_set[y_set == 3, 1], label = 'Class 3', color = 'green')
# Build a grid over the two discriminants and colour it by the classifier's prediction
X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 1, stop = X_set[:, 0].max() + 1, step = 0.01),
                     np.arange(start = X_set[:, 1].min() - 1, stop = X_set[:, 1].max() + 1, step = 0.01))
Z = classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape)
print(Z)
plt.contourf(X1, X2, Z, alpha = 0.2)
plt.xlabel('LD1')
plt.ylabel('LD2')
plt.legend()
plt.show()
[[2 2 2 ... 2 2 2]
 [2 2 2 ... 2 2 2]
 [2 2 2 ... 2 2 2]
 ...
 [1 1 1 ... 3 3 3]
 [1 1 1 ... 3 3 3]
 [1 1 1 ... 3 3 3]]
# Visualising the Test set results
X_set, y_set = X_test, y_test
plt.scatter(X_set[y_set == 1, 0], X_set[y_set == 1, 1], label = 'Class 1', color = 'k')
plt.scatter(X_set[y_set == 2, 0], X_set[y_set == 2, 1], label = 'Class 2', color = 'red')
plt.scatter(X_set[y_set == 3, 0], X_set[y_set == 3, 1], label = 'Class 3', color = 'green')
# Same decision-boundary grid as above, evaluated over the test points
X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 1, stop = X_set[:, 0].max() + 1, step = 0.01),
                     np.arange(start = X_set[:, 1].min() - 1, stop = X_set[:, 1].max() + 1, step = 0.01))
Z = classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape)
print(Z)
plt.contourf(X1, X2, Z, alpha = 0.2)
plt.xlabel('LD1')
plt.ylabel('LD2')
plt.legend()
plt.show()
[[2 2 2 ... 2 2 2]
 [2 2 2 ... 2 2 2]
 [2 2 2 ... 2 2 2]
 ...
 [1 1 1 ... 3 3 3]
 [1 1 1 ... 3 3 3]
 [1 1 1 ... 3 3 3]]