My takeaway series follows a Q&A format to explain AI concepts at three levels:
- For anyone with general knowledge who wants to understand the concept.
- For anyone who wants to dive into the code implementation details of the concept.
- For anyone who wants to understand the mathematics behind the technique.
Principal Component Analysis (PCA) is a dimensionality reduction technique: it finds the principal components of high-dimensional data and projects the data onto a lower-dimensional space. In the AI context, PCA is an unsupervised learning algorithm.
As a data transformation, PCA has the following inputs and outputs:
Inputs:
- A dataset with $n$ samples and $d$ features, represented as a matrix $X \in \mathbb{R}^{n \times d}$, where each row corresponds to a sample and each column corresponds to a feature dimension.
Outputs:
- A transformed dataset with $n$ samples and $k$ principal components, represented as a matrix $X' \in \mathbb{R}^{n \times k}$, where $k < d$.
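As a quick illustration of these shapes, here is a minimal sketch using scikit-learn's `PCA` (the library call and the array sizes are my own illustration, not part of the definition above):

```python
import numpy as np
from sklearn.decomposition import PCA

# Input: a dataset with n = 100 samples and d = 5 features.
X = np.random.rand(100, 5)

# Output: the same 100 samples expressed with k = 2 components (k < d).
pca = PCA(n_components=2)
X_new = pca.fit_transform(X)

print(X.shape)      # (100, 5)
print(X_new.shape)  # (100, 2)
```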
PCA can be computed with SVD (Singular Value Decomposition). The steps are as follows:
- Standardize the data: Subtract the mean and divide by the standard deviation for each feature to ensure that each feature contributes equally to the analysis.
- Perform SVD on the standardized data matrix: $X = U \Sigma V^\top$, where $U$ holds the left singular vectors, $\Sigma$ is the diagonal matrix of singular values, and $V$ holds the right singular vectors (the principal directions).
- Choose the top $k$ columns of $V$ corresponding to the $k$ largest singular values in $\Sigma$, forming $V_k \in \mathbb{R}^{d \times k}$.
- Transform the data: project the original standardized data onto the new feature space using the chosen principal components: $X' = X V_k$.
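Putting the four steps together, here is a minimal NumPy sketch (the function name `pca_svd` and the sample data are my own):

```python
import numpy as np

def pca_svd(X, k):
    """Reduce X (n samples x d features) to k principal components via SVD."""
    # Step 1: standardize each feature to zero mean and unit variance.
    X_std = (X - X.mean(axis=0)) / X.std(axis=0)

    # Step 2: SVD of the standardized data. The rows of Vt are the
    # principal directions (right singular vectors).
    U, S, Vt = np.linalg.svd(X_std, full_matrices=False)

    # Step 3: keep the top-k directions. np.linalg.svd returns singular
    # values in descending order, so the first k rows of Vt correspond
    # to the k largest singular values.
    V_k = Vt[:k].T  # shape (d, k)

    # Step 4: project the standardized data onto the new axes.
    return X_std @ V_k

X = np.random.rand(100, 5)
X_new = pca_svd(X, k=2)
print(X_new.shape)  # (100, 2)
```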
Each principal component defines an axis of a new coordinate system for the data, and the projections onto these axes have two key properties:
- The data projected onto the first principal component has the largest variance, the data projected onto the second principal component has the second largest variance, and so on.
- The transformed features are statistically uncorrelated with each other; that is, when one feature increases or decreases, the other features do not change in a predictable way.
These two properties make the new features:
- More informative and less redundant, because the components carrying more information (higher variance) are kept first, and less informative (lower variance) components are discarded.
- More independent, because the features are uncorrelated.
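Both properties can be checked numerically; here is a small self-contained sketch (random data, all components kept so every axis can be inspected):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((200, 4))
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# Project onto all principal components.
U, S, Vt = np.linalg.svd(X_std, full_matrices=False)
Z = X_std @ Vt.T

# Property 1: variances along the new axes decrease monotonically.
print(Z.var(axis=0))

# Property 2: off-diagonal covariances of the new features are ~0,
# i.e. the components are uncorrelated.
print(np.round(np.cov(Z, rowvar=False), 10))
```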
In PCA, the new coordinate system is defined by the principal components: the first axis points in the direction of greatest variance, the second axis in the orthogonal direction of second-greatest variance, and so on.
Standardizing the data is crucial before applying PCA because PCA is sensitive to the scale of the features. If the features are on different scales, PCA may give more importance to features with larger scales, leading to biased results. Standardization ensures that each feature contributes equally to the analysis by transforming them to have zero mean and unit variance.
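A minimal standardization sketch in NumPy (the example features, height and weight, are invented to show the scale problem; scikit-learn's `StandardScaler` performs the same transformation):

```python
import numpy as np

X = np.array([[170.0, 60.0],   # e.g. height in cm, weight in kg:
              [180.0, 80.0],   # features on very different scales
              [160.0, 55.0]])

# Zero mean, unit variance per feature (column).
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

print(X_std.mean(axis=0))  # ~[0, 0]
print(X_std.std(axis=0))   # [1, 1]
```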
SVD is a mathematical technique that decomposes a matrix into three other matrices. For a given matrix $X \in \mathbb{R}^{n \times d}$, the decomposition is $X = U \Sigma V^\top$, where:
- $U \in \mathbb{R}^{n \times n}$ is an orthogonal matrix whose columns are the left singular vectors of $X$.
- $\Sigma \in \mathbb{R}^{n \times d}$ is a diagonal matrix whose diagonal entries are the singular values of $X$.
- $V \in \mathbb{R}^{d \times d}$ is an orthogonal matrix whose columns are the right singular vectors of $X$.
The singular values in $\Sigma$ are conventionally sorted in descending order, so the leading columns of $U$ and $V$ capture the most variance in the data.
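NumPy's `np.linalg.svd` computes this decomposition directly; a quick sketch (using the "thin" SVD variant, where $U$ is $n \times d$ rather than $n \times n$):

```python
import numpy as np

X = np.random.rand(5, 3)

# Thin SVD: U is 5x3, S holds the 3 singular values, Vt is 3x3.
U, S, Vt = np.linalg.svd(X, full_matrices=False)
print(S)  # singular values, sorted in descending order

# The three factors multiply back to the original matrix.
print(np.allclose(X, U @ np.diag(S) @ Vt))  # True
```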
To compute the SVD of a matrix $X$ by hand:
- Compute the Gram matrix of $X$ (the unnormalized covariance matrix when $X$ is standardized): $X^\top X$.
- Compute the eigenvalues $\lambda_i$ and eigenvectors $v_i$ of this matrix: $X^\top X v_i = \lambda_i v_i$.
- Form the matrix $V$ using the normalized eigenvectors as columns.
- Compute the singular values as the square roots of the eigenvalues: $\sigma_i = \sqrt{\lambda_i}$.
- Form the diagonal matrix $\Sigma$ using the singular values $\sigma_i$.
- Compute the left singular vectors using the relationship $u_i = \frac{1}{\sigma_i} X v_i$.
- The SVD of $X$ is then given by $X = U \Sigma V^\top$.
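As a sanity check, this hand-computed recipe can be compared against `np.linalg.svd` (a sketch with my own variable names; it assumes $X$ has full column rank, so no singular value is zero):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((6, 3))

# Eigendecomposition of X^T X. np.linalg.eigh returns eigenvalues in
# ascending order, so flip them (and the eigenvectors) to descending.
eigvals, eigvecs = np.linalg.eigh(X.T @ X)
eigvals, V = eigvals[::-1], eigvecs[:, ::-1]

# Singular values are the square roots of the eigenvalues.
S = np.sqrt(eigvals)

# Left singular vectors: u_i = (1 / sigma_i) * X v_i.
U = X @ V / S

# The factors reconstruct X exactly (joint sign flips in U and V cancel).
print(np.allclose(X, U @ np.diag(S) @ V.T))  # True

# The singular values match NumPy's SVD routine.
print(np.allclose(S, np.linalg.svd(X, compute_uv=False)))  # True
```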