Dimensionality reduction
Dimensionality reduction is the process of reducing the number of random variables under consideration by obtaining a set of principal variables. It can be divided into feature selection and feature extraction.
Some of the benefits of applying dimensionality reduction to a dataset:
- Fewer dimensions mean less computation and shorter training time
- Some algorithms do not perform well on high-dimensional data, so reducing the number of dimensions is necessary for them to be useful
- It takes care of multicollinearity by removing redundant features
Components of Dimensionality Reduction:
Feature selection : This keeps a subset of the original variables, leaving the chosen features unchanged.
Feature extraction : This transforms the data from a high-dimensional space into a lower-dimensional space by building new features from the original ones (see the sketch below).
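As a rough illustration of the difference (using scikit-learn and its built-in copy of the Wine data, which also appears later in this post; the choice of SelectKBest and k=5 is just for the example):

from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_wine(return_X_y=True)              # 178 samples, 13 numeric features

# Feature selection: keep 5 of the original columns (they stay interpretable)
X_selected = SelectKBest(f_classif, k=5).fit_transform(X, y)

# Feature extraction: build 5 new features as combinations of all 13
X_extracted = PCA(n_components=5).fit_transform(X)

print(X.shape, X_selected.shape, X_extracted.shape)   # (178, 13) (178, 5) (178, 5)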
Common Dimensionality Reduction Techniques
Principal component analysis (PCA) is a statistical procedure that orthogonally transforms the original n numeric dimensions of a dataset into a new set of n dimensions called principal components.
“ PCA orthogonally transforms the original n-dimensional features of a dataset into n new, uncorrelated features called principal components ”
Orthogonal features are uncorrelated with each other. PCA ranks and keeps principal components based on their explained variance. Note that applying PCA makes a dataset lose its interpretability, because each component mixes all of the original features; if interpretability of the results is important for your analysis, PCA is not the transformation that you should apply.
- A principal component is a linear combination of the original variables
- Principal components are extracted in such a way that the first principal component explains the maximum variance in the dataset
- The second principal component explains as much of the remaining variance as possible and is uncorrelated with the first principal component
- The third principal component explains the variance that is not explained by the first two principal components, and so on
Note : Always normalize (standardize) the dataset before performing PCA, because the transformation is scale-dependent. If the features were measured on different scales and we want to assign equal importance to all of them, standardization is required; otherwise, the features on the largest scale would dominate your new principal components.
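A small synthetic sketch of that effect (scikit-learn and the made-up scales are just for illustration):

import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Two uncorrelated features measured on very different scales
X = np.column_stack([rng.normal(0, 1, 500),       # e.g. a ratio near 1
                     rng.normal(0, 1000, 500)])   # e.g. a count in the thousands

print(PCA(n_components=2).fit(X).explained_variance_ratio_)
# ~[1.0, 0.0] -- the large-scale feature dominates the first component

X_std = StandardScaler().fit_transform(X)
print(PCA(n_components=2).fit(X_std).explained_variance_ratio_)
# ~[0.5, 0.5] -- after standardization both features contribute equally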
PCA involves the following steps (a scikit-learn equivalent is sketched right after the list):
- Standardize the n-dimensional dataset.
- Construct the covariance matrix of the data.
- Decompose the covariance matrix into its eigenvectors and eigenvalues.
- Select k eigenvectors that correspond to the k largest eigenvalues, where k is the dimensionality of the new feature subspace ( k≤n ).
- Construct a matrix W from the “top” k eigenvectors.
- Transform the n-dimensional input dataset x using the projection matrix W to obtain the new k-dimensional feature subspace.
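For comparison with the manual walkthrough that follows, here is a minimal sketch of how scikit-learn's PCA class bundles these steps (the use of the built-in Wine loader and k=2 are assumptions for the example):

from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)

X_std = StandardScaler().fit_transform(X)   # step 1: standardize
pca = PCA(n_components=2)                   # steps 2-5 happen inside fit()
X_pca = pca.fit_transform(X_std)            # step 6: project onto the top-2 components

print(pca.components_.shape)                # (2, 13): the rows are the chosen eigenvectors
print(pca.explained_variance_ratio_)        # variance explained by each component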
Continuing with principal component analysis, in this section we'll standardize the data, construct the covariance matrix, obtain the eigenvalues and eigenvectors of the covariance matrix, and sort the eigenvalues in decreasing order to rank the eigenvectors.
Let’s code principal component analysis
Loading the Wine dataset
In [2]:
import pandas as pd

df_wine = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data',
                      header=None)
df_wine.head()
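The cells that follow assume the data has already been split into training and test sets. A minimal sketch of that step (the first column of the UCI file holds the class label; test_size=0.3 and random_state=0 are assumptions chosen so the training set has the 124×13 shape used later):

from sklearn.model_selection import train_test_split

# First column is the class label (1-3); the remaining 13 columns are features
X, y = df_wine.iloc[:, 1:].values, df_wine.iloc[:, 0].values

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)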
Standardize the data
In [5]:
from sklearn.preprocessing import StandardScaler

sc = StandardScaler()
X_train_std = sc.fit_transform(X_train)
# Reuse the scaler fitted on the training data; don't re-fit on the test set
X_test_std = sc.transform(X_test)
Construct the covariance matrix
In [7]:
import numpy as np

# Covariance matrix of the 13 standardized features (np.cov expects rows = variables)
covariant_matrix = np.cov(X_train_std.T)
covariant_matrix
Decompose the covariance matrix into its eigenvectors and eigenvalues
In [9]:
# Eigendecomposition of the covariance matrix
eigen_values, eigen_vectors = np.linalg.eig(covariant_matrix)
eigen_values, eigen_vectors[::5]
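Since the covariance matrix is symmetric, np.linalg.eigh is a numerically safer alternative; note that it returns the eigenvalues in ascending order (this is an optional substitution, not part of the original cell):

# eigh is specialized for symmetric matrices: guaranteed real eigenvalues,
# returned in ascending order, so flip to put the largest first
eigen_values, eigen_vectors = np.linalg.eigh(covariant_matrix)
eigen_values, eigen_vectors = eigen_values[::-1], eigen_vectors[:, ::-1]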
Principal Components with respect to Explained Variance Ratio
In [12]:
import matplotlib.pyplot as plt

tot = sum(eigen_values)
# Explained variance ratio of each component, largest first, plus the running total
var_exp = [(i / tot) for i in sorted(eigen_values, reverse=True)]
cum_var_exp = np.cumsum(var_exp)

plt.bar(range(1, 14), var_exp, alpha=0.5, align='center',
        label='individual explained variance')
plt.step(range(1, 14), cum_var_exp, where='mid',
         label='cumulative explained variance')
plt.ylabel('Explained variance ratio')
plt.xlabel('Principal components')
plt.legend(loc='best')
plt.show()
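A common follow-up is to read the number of components off the cumulative curve; a short sketch (the 95% threshold is an arbitrary choice):

# Smallest k whose cumulative explained variance reaches 95%
k = np.argmax(cum_var_exp >= 0.95) + 1
k, cum_var_exp[k - 1]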
Next, we collect the two eigenvectors that correspond to the two largest eigenvalues to capture about 60 percent of the variance in this dataset.
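The cell below assumes the (eigenvalue, eigenvector) pairs have already been sorted by decreasing eigenvalue into eigen_pairs; a minimal sketch of that step:

# Pair each eigenvalue with its eigenvector and sort by magnitude, largest first
eigen_pairs = [(np.abs(eigen_values[i]), eigen_vectors[:, i])
               for i in range(len(eigen_values))]
eigen_pairs.sort(key=lambda k: k[0], reverse=True)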
In [15]:
# Stack the top two eigenvectors column-wise to form the 13x2 projection matrix W
w = np.hstack((eigen_pairs[0][1][:, np.newaxis], eigen_pairs[1][1][:, np.newaxis]))
w
In the same way, we can transform the entire 124×13 training dataset onto the two principal components by calculating the matrix dot product:
In [18]:
# Project a single sample, then the whole standardized training set, onto the two components
X_train_std[0].dot(w)
X_train_pca = X_train_std.dot(w)
X_train_std.shape, w.shape, X_train_pca.shape
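As a sanity check (an optional aside), the same projection can be compared against scikit-learn's PCA; individual components may come out with flipped signs, since eigenvectors are only defined up to sign:

from sklearn.decomposition import PCA

X_train_pca_skl = PCA(n_components=2).fit_transform(X_train_std)
# Compare magnitudes to ignore possible sign flips; expected to print True
print(np.allclose(np.abs(X_train_pca_skl), np.abs(X_train_pca)))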
Visualization of the transformed Wine training set in a two-dimensional scatterplot
In [20]:
colors = ['r', 'b', 'g']
markers = ['s', 'x', 'o']

# One color/marker per wine class
for l, c, m in zip(np.unique(y_train), colors, markers):
    plt.scatter(X_train_pca[y_train == l, 0], X_train_pca[y_train == l, 1],
                c=c, label=l, marker=m)
plt.xlabel('PC 1')
plt.ylabel('PC 2')
plt.legend(loc='lower left')
plt.show()
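The same projection matrix w (learned from the training data) can then be used to transform the test set; a short follow-up sketch (the (54, 2) shape follows from the 70/30 split assumed earlier):

# Project the held-out test set with the same W learned from the training data
X_test_pca = X_test_std.dot(w)
X_test_pca.shape   # (54, 2)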