Dimensionality reduction
Dimensionality reduction is the process of reducing the number of random variables under consideration by obtaining a set of principal variables. It can be divided into feature selection and feature extraction.
Some of the benefits of applying dimensionality reduction to a dataset:
- Fewer dimensions mean less computation and shorter training time
- Some algorithms do not perform well on high-dimensional data, so reducing the number of dimensions is necessary for them to be useful
- It takes care of multicollinearity by removing redundant features
Components of Dimensionality Reduction:
Feature selection : This keeps a subset of the original variables, leaving the chosen features unchanged.
Feature extraction : This transforms the data from a high-dimensional space into a lower-dimensional space by building new features from the original ones (see the sketch below).
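As a rough illustration of the difference (using scikit-learn and its built-in copy of the Wine data, which also appears later in this post; the choice of SelectKBest and k=5 is just for the example):

from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_wine(return_X_y=True)              # 178 samples, 13 numeric features

# Feature selection: keep 5 of the original columns (they stay interpretable)
X_selected = SelectKBest(f_classif, k=5).fit_transform(X, y)

# Feature extraction: build 5 new features as combinations of all 13
X_extracted = PCA(n_components=5).fit_transform(X)

print(X.shape, X_selected.shape, X_extracted.shape)   # (178, 13) (178, 5) (178, 5)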
Common Dimensionality Reduction Techniques
Principal component analysis (PCA) is a statistical procedure that orthogonally transforms the original n numeric dimensions of a dataset into a new set of n dimensions called principal components.
“ PCA orthogonally transforms the original n-dimensional features of a dataset into n new, uncorrelated features called principal components ”
Orthogonal features are uncorrelated with each other. PCA ranks and keeps principal components based on their explained variance. Note that applying PCA makes a dataset lose its interpretability, because each component mixes all of the original features; if interpretability of the results is important for your analysis, PCA is not the transformation that you should apply.
- A principal component is a linear combination of the original variables
- Principal components are extracted in such a way that the first principal component explains the maximum variance in the dataset
- The second principal component explains as much of the remaining variance as possible and is uncorrelated with the first principal component
- The third principal component explains the variance that is not explained by the first two principal components, and so on
Note : Always normalize (standardize) the dataset before performing PCA, because the transformation is scale-dependent. If the features were measured on different scales and we want to assign equal importance to all of them, standardization is required; otherwise, the features on the largest scale would dominate your new principal components.
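A small synthetic sketch of that effect (scikit-learn and the made-up scales are just for illustration):

import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Two uncorrelated features measured on very different scales
X = np.column_stack([rng.normal(0, 1, 500),       # e.g. a ratio near 1
                     rng.normal(0, 1000, 500)])   # e.g. a count in the thousands

print(PCA(n_components=2).fit(X).explained_variance_ratio_)
# ~[1.0, 0.0] -- the large-scale feature dominates the first component

X_std = StandardScaler().fit_transform(X)
print(PCA(n_components=2).fit(X_std).explained_variance_ratio_)
# ~[0.5, 0.5] -- after standardization both features contribute equally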
PCA involves the following steps (a scikit-learn equivalent is sketched right after the list):
- Standardize the n-dimensional dataset.
- Construct the covariance matrix of the data.
- Decompose the covariance matrix into its eigenvectors and eigenvalues.
- Select k eigenvectors that correspond to the k largest eigenvalues, where k is the dimensionality of the new feature subspace ( k≤n ).
- Construct a matrix W from the “top” k eigenvectors.
- Transform the n-dimensional input dataset x using the projection matrix W to obtain the new k-dimensional feature subspace.
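For comparison with the manual walkthrough that follows, here is a minimal sketch of how scikit-learn's PCA class bundles these steps (the use of the built-in Wine loader and k=2 are assumptions for the example):

from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)

X_std = StandardScaler().fit_transform(X)   # step 1: standardize
pca = PCA(n_components=2)                   # steps 2-5 happen inside fit()
X_pca = pca.fit_transform(X_std)            # step 6: project onto the top-2 components

print(pca.components_.shape)                # (2, 13): the rows are the chosen eigenvectors
print(pca.explained_variance_ratio_)        # variance explained by each component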
Continuing with principal component analysis, in this section we'll standardize the data, construct the covariance matrix, obtain the eigenvalues and eigenvectors of the covariance matrix, and sort the eigenvalues in decreasing order to rank the eigenvectors.
Let’s code principal component analysis
Loading the Wine dataset
In [2]:
import pandas as pd

df_wine = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data',
                      header=None)
df_wine.head()
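The cells that follow assume the data has already been split into training and test sets. A minimal sketch of that step (the first column of the UCI file holds the class label; test_size=0.3 and random_state=0 are assumptions chosen so the training set has the 124×13 shape used later):

from sklearn.model_selection import train_test_split

# First column is the class label (1-3); the remaining 13 columns are features
X, y = df_wine.iloc[:, 1:].values, df_wine.iloc[:, 0].values

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)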
Standardize the data
In [5]:
from sklearn.preprocessing import StandardScaler

sc = StandardScaler()
X_train_std = sc.fit_transform(X_train)
# Reuse the scaler fitted on the training data; don't re-fit on the test set
X_test_std = sc.transform(X_test)
Construct the covariance matrix
In [7]:
import numpy as np

# Covariance matrix of the 13 standardized features (np.cov expects rows = variables)
covariant_matrix = np.cov(X_train_std.T)
covariant_matrix
Decompose the covariance matrix into its eigenvectors and eigenvalues
In [9]:
# Eigendecomposition of the covariance matrix
eigen_values, eigen_vectors = np.linalg.eig(covariant_matrix)
eigen_values, eigen_vectors[::5]
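Since the covariance matrix is symmetric, np.linalg.eigh is a numerically safer alternative; note that it returns the eigenvalues in ascending order (this is an optional substitution, not part of the original cell):

# eigh is specialized for symmetric matrices: guaranteed real eigenvalues,
# returned in ascending order, so flip to put the largest first
eigen_values, eigen_vectors = np.linalg.eigh(covariant_matrix)
eigen_values, eigen_vectors = eigen_values[::-1], eigen_vectors[:, ::-1]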
Principal Components with respect to Explained Variance Ratio
In [12]:
import matplotlib.pyplot as plt

tot = sum(eigen_values)
# Explained variance ratio of each component, largest first, plus the running total
var_exp = [(i / tot) for i in sorted(eigen_values, reverse=True)]
cum_var_exp = np.cumsum(var_exp)

plt.bar(range(1, 14), var_exp, alpha=0.5, align='center',
        label='individual explained variance')
plt.step(range(1, 14), cum_var_exp, where='mid',
         label='cumulative explained variance')
plt.ylabel('Explained variance ratio')
plt.xlabel('Principal components')
plt.legend(loc='best')
plt.show()
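A common follow-up is to read the number of components off the cumulative curve; a short sketch (the 95% threshold is an arbitrary choice):

# Smallest k whose cumulative explained variance reaches 95%
k = np.argmax(cum_var_exp >= 0.95) + 1
k, cum_var_exp[k - 1]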
Next, we collect the two eigenvectors that correspond to the two largest eigenvalues to capture about 60 percent of the variance in this dataset.
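The cell below assumes the (eigenvalue, eigenvector) pairs have already been sorted by decreasing eigenvalue into eigen_pairs; a minimal sketch of that step:

# Pair each eigenvalue with its eigenvector and sort by magnitude, largest first
eigen_pairs = [(np.abs(eigen_values[i]), eigen_vectors[:, i])
               for i in range(len(eigen_values))]
eigen_pairs.sort(key=lambda k: k[0], reverse=True)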
In [15]:
# Stack the top two eigenvectors column-wise to form the 13x2 projection matrix W
w = np.hstack((eigen_pairs[0][1][:, np.newaxis], eigen_pairs[1][1][:, np.newaxis]))
w
In the same way, we can transform the entire 124×13 training dataset onto the two principal components by calculating the matrix dot product:
In [18]:
# Project a single sample, then the whole standardized training set, onto the two components
X_train_std[0].dot(w)
X_train_pca = X_train_std.dot(w)
X_train_std.shape, w.shape, X_train_pca.shape
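As a sanity check (an optional aside), the same projection can be compared against scikit-learn's PCA; individual components may come out with flipped signs, since eigenvectors are only defined up to sign:

from sklearn.decomposition import PCA

X_train_pca_skl = PCA(n_components=2).fit_transform(X_train_std)
# Compare magnitudes to ignore possible sign flips; expected to print True
print(np.allclose(np.abs(X_train_pca_skl), np.abs(X_train_pca)))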
Visualization of the transformed Wine training set in a two-dimensional scatterplot
In [20]:
colors = ['r', 'b', 'g']
markers = ['s', 'x', 'o']

# One color/marker per wine class
for l, c, m in zip(np.unique(y_train), colors, markers):
    plt.scatter(X_train_pca[y_train == l, 0], X_train_pca[y_train == l, 1],
                c=c, label=l, marker=m)
plt.xlabel('PC 1')
plt.ylabel('PC 2')
plt.legend(loc='lower left')
plt.show()
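The same projection matrix w (learned from the training data) can then be used to transform the test set; a short follow-up sketch (the (54, 2) shape follows from the 70/30 split assumed earlier):

# Project the held-out test set with the same W learned from the training data
X_test_pca = X_test_std.dot(w)
X_test_pca.shape   # (54, 2)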