Principal Component Analysis Tutorial

  • date 27th February, 2020 |
  • by Prwatech |




In this Principal Component Analysis tutorial, you will learn about the types of dimensionality reduction, the working principle of PCA, and the applications of Principal Component Analysis. If you are looking for a platform that explains these topics, or you plan to take an advanced Data Science Certification Course with Machine Learning from India’s leading Data Science training institute, you are on the right path.


With the advances in Machine Learning and computer science, it has become essential to grasp the basics behind such technologies. This blog on Principal Component Analysis will help you understand the concepts behind dimensionality reduction and how it can be used to handle high-dimensional data.


The tutorial below explains in detail what PCA is in machine learning. Follow the tutorials from India’s leading Data Science training institute in Bangalore to become a pro Data Scientist or Machine Learning Engineer.


What is PCA in Machine Learning?


Suppose you have to work with a dataset based on the GDP (Gross Domestic Product) of a country. While processing the data you will come across many fields, which we call features or variables. The question then arises: how can we consider all these variables yet keep only a few of them for further processing? With too many variables, the model may overfit. So we reduce the dimensionality of the data, which makes it easier to work with and leaves fewer relationships between variables to consider. Reducing the dimensions of the feature space is called ‘dimensionality reduction’.


Types of Dimensionality Reduction


There are different methods to achieve dimensionality reduction. The most commonly used methods fall into two categories:


Feature Elimination

Feature Extraction


Feature Elimination


As the name suggests, this method eliminates features of very low significance. Only the best features are selected according to domain requirements, and the rest are dropped. The advantages of this method are its simplicity and the fact that the remaining variables stay interpretable.


Feature Extraction


Suppose we have N independent variables. In feature extraction, we create N “new” independent variables, where each new variable is a combination of all N “old” independent variables. The new variables are arranged in a specific order of importance. In PCA, that order is the amount of variance in the data that each new variable captures, and the dependent variable plays no part in it. We then drop the ‘least important’ new variables, and this is where ‘dimensionality reduction’ comes into the picture. Because the new variables are ordered, we know which are the most and least important. And because each new variable is a combination of the old ones, we still keep the most valuable parts of our old variables even when we drop one or more of these “new” variables.


When Can PCA Be Used?


PCA is a good option if we want to reduce the number of variables but cannot identify which variables to remove from consideration entirely. It also suits cases where we need the resulting variables to be uncorrelated with one another and can accept that they are less interpretable than the originals.


Principal Component Analysis (PCA) is a statistical procedure that uses an orthogonal transformation to convert a set of correlated variables into a set of uncorrelated variables. It is an effective tool in exploratory data analysis and in building predictive models. PCA is closely related to factor analysis, and its first principal component can be seen as a line of best fit through the data.
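The claim that PCA turns correlated variables into uncorrelated ones can be checked with a small sketch. The data here is synthetic and the variable names are made up for illustration; only `PCA` from scikit-learn and `numpy.corrcoef` are real library calls.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Two strongly correlated synthetic variables (illustrative data only)
x1 = rng.normal(size=200)
x2 = 0.8 * x1 + 0.2 * rng.normal(size=200)
X = np.column_stack([x1, x2])

# Correlation between the two original variables
corr_before = np.corrcoef(X, rowvar=False)[0, 1]

# Correlation between the two PCA-transformed variables
Z = PCA(n_components=2).fit_transform(X)
corr_after = np.corrcoef(Z, rowvar=False)[0, 1]

print(f"before PCA: {corr_before:.3f}, after PCA: {corr_after:.3f}")
```

The correlation after the transformation should be zero up to floating-point error, while the original variables are strongly correlated.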


Working Principle of PCA



In simple words, PCA takes a dataset with many dimensions and flattens it to fewer dimensions, often 2 or 3. It looks for a meaningful way to flatten the data by focusing on the directions along which the data differs the most.


The image above shows an example of transforming higher-dimensional data (3 dimensions) into lower-dimensional data (2 dimensions) using PCA. Before moving to the actual concept, let’s look at some terminology related to PCA.


Dimensionality:  It is simply the number of features, i.e. the number of columns, in the dataset. We can think of it as the number of random variables in the dataset.


Correlation:  It shows how strongly two variables are related to each other. Its value ranges from -1 to +1. A positive value indicates that when one variable increases, the other increases as well, while a negative value indicates that the other decreases. The absolute value indicates the strength of the relationship.


Orthogonal:  Uncorrelated with each other, i.e., the correlation between any pair of variables is 0.


Eigenvectors:  Consider a square matrix A and a non-zero vector v. Then v is an eigenvector of A if Av is a scalar multiple of v. Or simply:



Av = λv


Here, v is the eigenvector and

λ is the eigenvalue associated with it.


Covariance Matrix: This matrix holds the covariances between pairs of variables. Its (i,j)-th element is the covariance between the i-th and j-th variables.
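The terms above can be made concrete with a few lines of NumPy. This is only a sketch: the three variables and their values are invented for illustration and do not come from the tutorial.

```python
import numpy as np

# Illustrative values only (not from the tutorial)
hours_studied = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
exam_score = np.array([52.0, 58.0, 65.0, 70.0, 78.0])
hours_gaming = np.array([9.0, 7.5, 6.0, 4.0, 2.5])
data = np.column_stack([hours_studied, exam_score, hours_gaming])

# Dimensionality: the number of columns (features)
n_features = data.shape[1]

# Correlation: values lie between -1 and +1
corr = np.corrcoef(data, rowvar=False)
r_pos = corr[0, 1]   # studying vs. score: close to +1
r_neg = corr[0, 2]   # studying vs. gaming: close to -1

# Covariance matrix: element (i, j) is the covariance of variables i and j
cov = np.cov(data, rowvar=False)

# Eigenvectors: each column v of `vectors` satisfies cov @ v = lam * v
values, vectors = np.linalg.eigh(cov)
residuals = [np.linalg.norm(cov @ v - lam * v)
             for lam, v in zip(values, vectors.T)]

print(n_features, round(r_pos, 3), round(r_neg, 3), max(residuals))
```

`np.linalg.eigh` is used because a covariance matrix is symmetric; the residuals should all be zero up to rounding.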


Principal Components


A principal component is a normalized linear combination of the original predictors in a dataset. In the image above, the principal components are indicated by PC1 and PC2. We can fit the data onto two axes, which are these principal components (PCs).


PC1 is the first principal axis, which spans the most variation, whereas PC2 is the second principal axis, which spans the second most variation. In other words, PC1 captures the direction in which most of the variation is present and PC2 captures the direction with the second most variation.


The PCs are essentially linear combinations of the original variables. The weight vector in each combination is an eigenvector of the covariance matrix, which in turn satisfies the principle of least squares.


The PCs are orthogonal to one another.

As we move from the first PC to the last one, the variation captured by each PC decreases.

The least important PCs are sometimes still useful, for example in regression and outlier detection problems.
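The first two properties listed above can be verified numerically. The sketch below uses synthetic data (an assumption made purely for illustration) and scikit-learn's `PCA`, whose `components_` attribute holds the principal axes as rows and whose `explained_variance_` attribute holds the variance captured by each.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(42)

# Synthetic data: 4 features with different spreads, mixed by a random matrix
X = rng.normal(size=(300, 4)) * np.array([5.0, 2.0, 1.0, 0.5])
X = X @ rng.normal(size=(4, 4))

pca = PCA().fit(X)

# Orthogonality: dot products between different PCs are 0, and each PC has
# unit length, so this product should be the identity matrix
gram = pca.components_ @ pca.components_.T

# Variation captured by each PC, from the first to the last
variances = pca.explained_variance_

print(np.round(gram, 6))
print(variances)
```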

Implementing PCA on a 2-D Dataset


Step 1: Normalize the data:


The first step is to normalize the data so that PCA works properly. This is done by subtracting the mean of each column from every value in that column. Suppose we have two dimensions, X and Y. Every x becomes x − x̄ and every y becomes y − ȳ, where x̄ and ȳ are the means of X and Y. This produces a dataset whose mean is zero.
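A minimal sketch of this centering step, assuming a small made-up 2-D dataset (the ten points below are illustrative and not taken from the tutorial):

```python
import numpy as np

# Illustrative 2-D dataset: column 0 is X, column 1 is Y
X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
              [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9]])

# Subtract each column's mean from every value in that column
column_means = X.mean(axis=0)
X_centered = X - column_means

print(column_means)             # means of X and Y before centering
print(X_centered.mean(axis=0))  # approximately zero after centering
```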


Step 2: Calculate the covariance matrix


Since the dataset we took is 2-dimensional, this results in a 2×2 covariance matrix.

Please note that

Var[X1] = Cov[X1,X1]

Var[X2] = Cov[X2,X2].
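As a sketch of this step, the covariance matrix of a made-up 2-D dataset can be computed with `numpy.cov`. The data is illustrative only; `rowvar=False` tells NumPy that each column is a variable.

```python
import numpy as np

# Illustrative 2-D dataset: column 0 is X1, column 1 is X2
X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
              [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9]])
X_centered = X - X.mean(axis=0)

# 2x2 covariance matrix (sample covariance, dividing by n - 1)
cov = np.cov(X_centered, rowvar=False)

# The diagonal holds the variances: Var[X1] = Cov[X1, X1], Var[X2] = Cov[X2, X2]
var_x1 = X[:, 0].var(ddof=1)
var_x2 = X[:, 1].var(ddof=1)

print(cov)
```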


Step 3: Calculate the eigenvalues and eigenvectors:


The next step is to calculate the eigenvalues and eigenvectors of the covariance matrix. For a matrix A, an eigenvalue λ is a solution of the characteristic equation:


det(λI − A) = 0



I is the identity matrix of the same dimension as A, which is required for the matrix subtraction to be defined.

‘det’ is the determinant of the matrix.

For each eigenvalue λ, a corresponding eigenvector v can be found by solving:


(λI − A)v = 0
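In practice the eigenvalues and eigenvectors are computed numerically rather than by solving the characteristic equation by hand. The sketch below uses `numpy.linalg.eigh` on the covariance matrix of a made-up dataset (illustrative values only) and then checks both equations for each eigenpair.

```python
import numpy as np

# Illustrative 2-D dataset
X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
              [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9]])
A = np.cov(X - X.mean(axis=0), rowvar=False)   # the covariance matrix

# eigh is meant for symmetric matrices; eigenvalues come back in ascending
# order and the eigenvectors are the columns of `eigenvectors`
eigenvalues, eigenvectors = np.linalg.eigh(A)
identity = np.eye(A.shape[0])

# det(lam * I - A) should be 0 for every eigenvalue
dets = [np.linalg.det(lam * identity - A) for lam in eigenvalues]

# (lam * I - A) v should be the zero vector for every eigenpair
residuals = [np.linalg.norm((lam * identity - A) @ v)
             for lam, v in zip(eigenvalues, eigenvectors.T)]

print(eigenvalues)
print(eigenvectors)
```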


Step 4: Selecting components and forming a feature vector:


We order the eigenvalues from largest to smallest, which gives us the components in order of significance. This is where dimensionality reduction comes in. If we have data with n variables, we get n corresponding eigenvalues and eigenvectors. The eigenvector corresponding to the largest eigenvalue is the first principal component of the dataset, and it is our call how many eigenvalues we carry forward in the analysis. To reduce dimensions, we choose the first p eigenvalues and ignore the rest. We do lose some information in the process, but if the discarded eigenvalues are small, we do not lose much.


Next we form a feature vector, which is in fact a matrix whose columns are the eigenvectors we want to proceed with. Since the running example has just 2 dimensions, we can either keep only the eigenvector corresponding to the larger eigenvalue or simply take both.


Feature Vector = (eig1, eig2)
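A sketch of the selection step, again on made-up data. The choice p = 1 and the names `feature_vector` and `explained` are assumptions for illustration; the share of total variance per component is computed here as each eigenvalue divided by the sum of all eigenvalues.

```python
import numpy as np

# Illustrative 2-D dataset
X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
              [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9]])
cov = np.cov(X - X.mean(axis=0), rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# Order eigenvalues (and their eigenvectors) from largest to smallest
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# Share of the total variance carried by each component
explained = eigenvalues / eigenvalues.sum()

# Keep the first p eigenvectors as the columns of the feature vector
p = 1  # assumed choice for this example
feature_vector = eigenvectors[:, :p]

print(explained)
print(feature_vector)
```

If the discarded share in `explained` is small, little information is lost by dropping those components.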


Step 5: Forming Principal Components


This is the final step, where we actually form the principal components using all the math we have done so far. To do so, we take the transpose of the feature vector and multiply it, on the left, with the transpose of the scaled version of the original dataset.


NewData = FeatureVector^T × ScaledData^T



NewData = matrix consisting of the principal components,

FeatureVector = the matrix we formed from the eigenvectors we chose to keep,

ScaledData = the scaled version of the original dataset, and

T denotes the transpose of a matrix.
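The multiplication can be sketched in NumPy as follows. The dataset is made up, and the names `scaled_data`, `feature_vector` and `new_data` are chosen here to mirror the formula; they are not from the original code.

```python
import numpy as np

# Illustrative 2-D dataset with n = 10 rows and d = 2 columns
X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
              [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9]])
scaled_data = X - X.mean(axis=0)                 # shape (10, 2)

cov = np.cov(scaled_data, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(cov)  # ascending order
order = np.argsort(eigenvalues)[::-1]

# Keep only the eigenvector with the largest eigenvalue: shape (2, 1)
feature_vector = eigenvectors[:, order[:1]]

# NewData = FeatureVector^T x ScaledData^T: (1, 2) @ (2, 10) -> (1, 10)
new_data = feature_vector.T @ scaled_data.T

print(new_data.shape)
```

Each column of `new_data` is one original data point expressed along the kept principal component; the variance of these values equals the largest eigenvalue.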


If we go back to the theory of eigenvalues and eigenvectors, we see that eigenvectors essentially tell us about patterns in the data. In this example, if we plot the eigenvectors on a scatterplot of the data, we find that the principal eigenvector fits the data well. The other one, being perpendicular to it, does not carry much information, so we do not lose much by discarding it, thereby reducing the dimension.


The eigenvectors of a symmetric matrix, such as a covariance matrix, can always be chosen to be orthogonal, i.e. perpendicular to each other. So in PCA we represent the original dataset using these orthogonal eigenvectors as axes instead of the usual x and y axes. Each data point is now expressed as a combination of contributions from both x and y. The reduction happens when we disregard one or more eigenvectors, which reduces the dimension of the dataset. If we take all eigenvectors into account, we are merely transforming coordinates, which does not serve the purpose of dimensionality reduction.
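The difference between keeping all eigenvectors and dropping one can be seen by projecting the data and then mapping it back. This is a sketch on made-up data; the reconstruction step is added here for illustration and is not part of the tutorial's procedure.

```python
import numpy as np

# Illustrative 2-D dataset, centered
X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
              [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9]])
scaled = X - X.mean(axis=0)

vals, vecs = np.linalg.eigh(np.cov(scaled, rowvar=False))  # ascending order

# The eigenvectors of the covariance matrix are orthonormal
gram = vecs.T @ vecs

# Keeping all eigenvectors is only a change of coordinates: fully reversible
full = scaled @ vecs
err_full = np.linalg.norm(scaled - full @ vecs.T)

# Keeping only the top eigenvector (last column) discards some information
top = vecs[:, -1:]
reduced = scaled @ top
err_reduced = np.linalg.norm(scaled - reduced @ top.T)

print(err_full, err_reduced)
```

The error after dropping a component is not zero, but it is governed by the smallest eigenvalue, which is why small eigenvalues can be discarded at little cost.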


Applications of Principal Component Analysis


PCA is predominantly used as a dimensionality reduction technique in domains such as facial recognition, computer vision and image compression. It is also used to find patterns in high-dimensional data in fields such as finance, data mining, bioinformatics and psychology.


Step by Step Implementation of PCA using Python


Import required libraries


import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline


Import dataset


from sklearn.datasets import load_breast_cancer
cancer = load_breast_cancer()







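Normalize the dataset using StandardScaler


from sklearn.preprocessing import StandardScaler
scl = StandardScaler()
# Standardize the features; this call is not shown in the original, and scaled_data is an assumed name
scaled_data = scl.fit_transform(cancer['data'])


Import and implement PCA


from sklearn.decomposition import PCA
# The fitting step is not shown in the original; two components are assumed because the plot uses two
pca = PCA(n_components=2)
x_pca = pca.fit_transform(scaled_data)

# a larger plot
plt.figure(figsize=(8, 6))  # figure size is an assumed value
plt.scatter(x_pca[:, 0], x_pca[:, 1], c=cancer['target'], cmap='viridis')
# Labeling the axes
plt.xlabel('First Principal Component')
plt.ylabel('Second Principal Component')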






Display Dimension reduced
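The original output for this step is not available, so the following is a self-contained sketch of how the reduction could be displayed. The names `scaled_data` and `x_pca` and the choice of two components are assumptions that follow the plotting code in this tutorial.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

cancer = load_breast_cancer()

# Standardize the 30 features, then project onto 2 principal components
scaled_data = StandardScaler().fit_transform(cancer['data'])
x_pca = PCA(n_components=2).fit_transform(scaled_data)

print(scaled_data.shape)  # (569, 30): dimensions before PCA
print(x_pca.shape)        # (569, 2): dimensions after PCA
```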








We hope you have understood this Principal Component Analysis tutorial. Get success in your career as a Data Scientist by being a part of Prwatech, India’s leading Data Science training institute in Bangalore.


