Lesson 3: PCA and Dimensionality Reduction
Estimated time: 45-55 minutes
Learning Objectives
- Compute the covariance matrix of a data set
- Find principal components as eigenvectors of the covariance matrix
- Determine how much variance each principal component explains
- Project data onto fewer dimensions while preserving maximum information
The Problem: High-Dimensional Data
Real-world data often has many variables (features). With 100 features, each observation lives in R^100 -- impossible to visualize directly. PCA finds a small number of new axes that capture most of the variation in the data.
Principal Component Analysis (PCA): An orthogonal transformation that converts possibly correlated variables into linearly uncorrelated variables called principal components, ordered so that the first few retain most of the variation.
Step 1: Center the Data
Subtract the mean of each variable so the data is centered at the origin. If X is an n x p data matrix (n observations, p variables), compute the mean of each column and subtract it.
Worked Example 1: Centering
Data points in R^2: (1, 2), (3, 4), (5, 6). Means: x-bar = 3, y-bar = 4.
Centered data: (-2, -2), (0, 0), (2, 2).
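The centering step can be sketched in a few lines of NumPy (a hedged illustration, not part of the lesson's required tooling):

```python
import numpy as np

# Data from Worked Example 1: three points in R^2, one observation per row.
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

# Subtract each column's mean so the data is centered at the origin.
X_c = X - X.mean(axis=0)
print(X_c)  # rows: (-2, -2), (0, 0), (2, 2)
```

After centering, every column of `X_c` has mean zero, which is exactly what the covariance formula in the next step assumes.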
Step 2: The Covariance Matrix
Sample Covariance Matrix: S = (1/(n-1)) X_c^T X_c, where X_c is the centered data matrix. S is a symmetric p x p matrix.
Diagonal entries of S = variances of each variable. Off-diagonal = covariances between pairs.
Worked Example 2
Centered data matrix X_c = [-2 -2; 0 0; 2 2].
X_c^T X_c = [(-2)^2 + 0^2 + 2^2, (-2)(-2) + 0·0 + 2·2; (symmetric), (-2)^2 + 0^2 + 2^2] = [8 8; 8 8].
S = (1/2)[8 8; 8 8] = [4 4; 4 4]. Both variables have variance 4 and covariance 4 (perfectly correlated).
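The same computation in NumPy, cross-checked against `np.cov` (a sketch assuming the centered data from Example 1):

```python
import numpy as np

X_c = np.array([[-2.0, -2.0],
                [ 0.0,  0.0],
                [ 2.0,  2.0]])
n = X_c.shape[0]

# Sample covariance with the 1/(n-1) factor.
S = X_c.T @ X_c / (n - 1)
print(S)  # [[4. 4.], [4. 4.]]

# np.cov treats rows as variables by default; rowvar=False matches our layout.
assert np.allclose(S, np.cov(X_c, rowvar=False))
```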
Step 3: Eigendecomposition
The principal components are the eigenvectors of S, and the eigenvalues tell you how much variance each component explains.
Worked Example 3: Finding Principal Components
S = [4 4; 4 4]. Characteristic polynomial: det(S - lambda*I) = (4 - lambda)^2 - 16 = lambda^2 - 8*lambda = lambda(lambda - 8) = 0. So lambda_1 = 8, lambda_2 = 0.
lambda_1 = 8: eigenvector (1/sqrt(2))(1, 1). This is PC1 -- the direction of maximum variance.
lambda_2 = 0: eigenvector (1/sqrt(2))(1, -1). This is PC2 -- zero variance in this direction (data is perfectly on a line).
Variance explained by PC1: 8/(8+0) = 100%. We can reduce to 1 dimension with no information loss!
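Example 3 can be reproduced numerically with `np.linalg.eigh`, which is designed for symmetric matrices like S (a sketch; note that eigenvector signs are arbitrary, so the computed PC1 may be ±(1,1)/sqrt(2)):

```python
import numpy as np

S = np.array([[4.0, 4.0],
              [4.0, 4.0]])

# eigh returns eigenvalues in ascending order; reorder so PC1 comes first.
eigvals, eigvecs = np.linalg.eigh(S)
order = np.argsort(eigvals)[::-1]
eigvals = eigvals[order]
eigvecs = eigvecs[:, order]

print(eigvals)        # [8. 0.]
print(eigvecs[:, 0])  # PC1, proportional to (1, 1)/sqrt(2) up to sign
```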
Variance Explained
Proportion of Variance: The proportion of total variance explained by the k-th principal component is:
lambda_k / (lambda_1 + lambda_2 + ... + lambda_p)
The cumulative proportion of the first k components measures how much information is retained when projecting to k dimensions.
Worked Example 4: How Many Components?
A dataset with 4 variables has covariance eigenvalues: 10, 5, 2, 0.5. Total = 17.5.
PC1: 10/17.5 = 57.1%. PC2: 5/17.5 = 28.6%. Cumulative: 85.7%.
PC3: 2/17.5 = 11.4%. Cumulative: 97.1%. PC4: 2.9%.
Two components capture 85.7% of variance; three capture 97.1%. A common threshold is 90-95%, suggesting 3 components suffice here.
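The component-count decision in Example 4 is a short cumulative-sum computation (a sketch; the 95% threshold is the lesson's suggested convention, not a universal rule):

```python
import numpy as np

eigvals = np.array([10.0, 5.0, 2.0, 0.5])  # eigenvalues from Worked Example 4

proportions = eigvals / eigvals.sum()   # variance explained per component
cumulative = np.cumsum(proportions)     # variance retained by first k components
print(np.round(proportions, 3))  # approx [0.571 0.286 0.114 0.029]
print(np.round(cumulative, 3))   # approx [0.571 0.857 0.971 1.   ]

# Smallest k whose cumulative proportion reaches 95%:
k = int(np.searchsorted(cumulative, 0.95) + 1)
print(k)  # 3
```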
Step 4: Project the Data
To reduce from p to k dimensions, project each data point onto the first k principal components.
Projection: If W = [v1 | v2 | ... | vk] contains the first k eigenvectors as columns, the reduced data is Y = X_c * W. Each row of Y gives the k-dimensional representation of that data point.
Worked Example 5
From Example 3, project onto PC1 = (1/sqrt(2))(1, 1).
Point (-2, -2): score = (-2, -2) · (1/sqrt(2), 1/sqrt(2)) = -4/sqrt(2) = -2*sqrt(2).
Point (0, 0): score = 0. Point (2, 2): score = 2*sqrt(2).
1D representation: {-2*sqrt(2), 0, 2*sqrt(2)}. All the variation is captured on this one axis.
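The projection Y = X_c * W from Example 5, as a one-line matrix product (a sketch using the centered data and PC1 from the earlier examples):

```python
import numpy as np

X_c = np.array([[-2.0, -2.0],
                [ 0.0,  0.0],
                [ 2.0,  2.0]])

# W holds the first k = 1 eigenvector(s) as columns: PC1 = (1, 1)/sqrt(2).
W = np.array([[1.0],
              [1.0]]) / np.sqrt(2)

Y = X_c @ W  # n x k matrix of PC scores
print(Y.ravel())  # [-2*sqrt(2), 0, 2*sqrt(2)], i.e. approx [-2.828, 0, 2.828]
```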
Connection to SVD
PCA via SVD: If X_c = U Sigma V^T, then the principal components are the columns of V, and the singular values relate to eigenvalues of the covariance matrix by lambda_i = sigma_i^2 / (n-1).
In practice, SVD is the preferred numerical method for computing PCA.
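The equivalence of the two routes can be checked directly on random data (a sketch; the random matrix here is purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))          # 50 observations, 3 variables
X_c = X - X.mean(axis=0)
n = X_c.shape[0]

# Route 1: eigenvalues of the sample covariance matrix.
S = X_c.T @ X_c / (n - 1)
eigvals = np.sort(np.linalg.eigvalsh(S))[::-1]

# Route 2: SVD of the centered data. Columns of V (rows of Vt) are the
# principal components, and sigma_i^2 / (n - 1) recovers the eigenvalues.
U, sigma, Vt = np.linalg.svd(X_c, full_matrices=False)
assert np.allclose(sigma**2 / (n - 1), eigvals)
```

The SVD route avoids explicitly forming S, which is numerically better conditioned when variables are nearly collinear.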
Check Your Understanding
1. Why do we center the data before PCA?
2. Eigenvalues of a covariance matrix are 6, 3, 1. What percent of variance does PC1 explain?
3. Why must the covariance matrix be symmetric and positive semi-definite?
Key Takeaways
- PCA procedure: center data → covariance matrix → eigendecomposition → project
- Principal components = eigenvectors of covariance matrix, ordered by eigenvalue
- Eigenvalue = variance explained by that component
- Choose k components to retain desired fraction of total variance (typically 90-95%)
- SVD is the practical computational method for PCA
- PCA is the foundation of many data science and machine learning techniques