Learn Without Walls

Lesson 3: PCA and Dimensionality Reduction

Estimated time: 45-55 minutes

Learning Objectives

- Center a data matrix and compute its sample covariance matrix.
- Find principal components via eigendecomposition of the covariance matrix.
- Compute and interpret the proportion of variance explained by each component.
- Project data onto the first k principal components.
- Relate PCA to the singular value decomposition (SVD).

The Problem: High-Dimensional Data

Real-world data often has many variables (features). With 100 features, you have data in R^100 -- impossible to visualize. PCA finds a small number of new axes that capture most of the variation in the data.

Principal Component Analysis (PCA): An orthogonal transformation that converts possibly correlated variables into linearly uncorrelated variables called principal components, ordered so that the first few retain most of the variation.

Step 1: Center the Data

Subtract the mean of each variable so the data is centered at the origin. If X is an n x p data matrix (n observations, p variables), compute the mean of each column and subtract it.

Worked Example 1: Centering

Data points in R^2: (1, 2), (3, 4), (5, 6). Means: x-bar = 3, y-bar = 4.

Centered data: (-2, -2), (0, 0), (2, 2).
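The centering step can be sketched in a few lines of NumPy, using the three points from the worked example:

```python
import numpy as np

# The three points from Worked Example 1, as rows of an n x p matrix.
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

# Subtract each column's mean so the data is centered at the origin.
X_c = X - X.mean(axis=0)
print(X_c)
# Rows are (-2, -2), (0, 0), (2, 2), matching the worked example.
```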

Step 2: The Covariance Matrix

Sample Covariance Matrix: S = (1/(n-1)) X_c^T X_c, where X_c is the centered data matrix. S is a symmetric p x p matrix.

Diagonal entries of S = variances of each variable. Off-diagonal = covariances between pairs.

Worked Example 2

Centered data matrix X_c = [-2 -2; 0 0; 2 2].

Each entry of X_c^T X_c is a sum of products of centered coordinates: the diagonal entries are (-2)^2 + 0^2 + 2^2 = 8 each, and the off-diagonal entries are (-2)(-2) + (0)(0) + (2)(2) = 8. So X_c^T X_c = [8 8; 8 8].

S = (1/2)[8 8; 8 8] = [4 4; 4 4]. Both variables have variance 4 and covariance 4 (perfectly correlated).
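The same computation in NumPy, checked against the built-in covariance routine (`np.cov` centers internally, so it agrees with the manual formula):

```python
import numpy as np

X_c = np.array([[-2.0, -2.0],
                [0.0, 0.0],
                [2.0, 2.0]])
n = X_c.shape[0]

# S = (1/(n-1)) X_c^T X_c, the sample covariance matrix.
S = X_c.T @ X_c / (n - 1)
print(S)  # [[4. 4.], [4. 4.]]

# rowvar=False tells np.cov that variables are columns, not rows.
S2 = np.cov(X_c, rowvar=False)
```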

Step 3: Eigendecomposition

The principal components are the eigenvectors of S, and the eigenvalues tell you how much variance each component explains.

Worked Example 3: Finding Principal Components

S = [4 4; 4 4]. Characteristic polynomial: det(S - lambda*I) = (4 - lambda)^2 - 16 = lambda^2 - 8*lambda = lambda(lambda - 8) = 0, so lambda_1 = 8, lambda_2 = 0.

lambda_1 = 8: eigenvector (1/sqrt(2))(1, 1). This is PC1 -- the direction of maximum variance.

lambda_2 = 0: eigenvector (1/sqrt(2))(1, -1). This is PC2 -- zero variance in this direction (the data lies exactly on a line).

Variance explained by PC1: 8/(8+0) = 100%. We can reduce to 1 dimension with no information loss!
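This eigendecomposition can be verified numerically. `np.linalg.eigh` is the right routine for a symmetric matrix like S; it returns eigenvalues in ascending order with orthonormal eigenvectors (each determined only up to sign):

```python
import numpy as np

S = np.array([[4.0, 4.0],
              [4.0, 4.0]])

# eigh: eigendecomposition for symmetric (Hermitian) matrices.
eigvals, eigvecs = np.linalg.eigh(S)
print(eigvals)        # [0. 8.] -- ascending order

# Eigenvector for the largest eigenvalue = PC1.
pc1 = eigvecs[:, -1]  # proportional to (1, 1)/sqrt(2), up to sign
```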

Variance Explained

Proportion of Variance: The proportion of total variance explained by the k-th principal component is:

lambda_k / (lambda_1 + lambda_2 + ... + lambda_p)

The cumulative proportion of the first k components measures how much information is retained when projecting to k dimensions.

Worked Example 4: How Many Components?

A dataset with 4 variables has covariance eigenvalues: 10, 5, 2, 0.5. Total = 17.5.

PC1: 10/17.5 = 57.1%. PC2: 5/17.5 = 28.6%. Cumulative: 85.7%.

PC3: 2/17.5 = 11.4%. Cumulative: 97.1%. PC4: 2.9%.

Two components capture 85.7% of variance; three capture 97.1%. A common threshold is 90-95%, suggesting 3 components suffice here.
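The arithmetic from Worked Example 4 is easy to automate. A minimal sketch that computes the proportions, the cumulative sums, and the smallest k reaching a 95% threshold:

```python
import numpy as np

eigvals = np.array([10.0, 5.0, 2.0, 0.5])

prop = eigvals / eigvals.sum()   # proportion explained by each PC
cum = np.cumsum(prop)            # cumulative proportion
print(np.round(prop, 3))         # [0.571 0.286 0.114 0.029]
print(np.round(cum, 3))          # [0.571 0.857 0.971 1.   ]

# Smallest k whose cumulative proportion reaches 95%.
k = int(np.searchsorted(cum, 0.95) + 1)
print(k)                         # 3
```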

Step 4: Project the Data

To reduce from p to k dimensions, project each data point onto the first k principal components.

Projection: If W = [v1 | v2 | ... | vk] contains the first k eigenvectors as columns, the reduced data is Y = X_c * W. Each row of Y gives the k-dimensional representation of that data point.

Worked Example 5

From Example 3, project onto PC1 = (1/sqrt(2))(1, 1).

Point (-2, -2): score = (-2, -2) · (1/sqrt(2), 1/sqrt(2)) = -4/sqrt(2) = -2*sqrt(2).

Point (0, 0): score = 0. Point (2, 2): score = 2*sqrt(2).

1D representation: {-2*sqrt(2), 0, 2*sqrt(2)}. All the variation is captured on this one axis.
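The projection Y = X_c * W is a single matrix multiply. Here W has one column (k = 1), the PC1 direction from Worked Example 3:

```python
import numpy as np

X_c = np.array([[-2.0, -2.0],
                [0.0, 0.0],
                [2.0, 2.0]])

# W holds the first k eigenvectors as columns; here k = 1 and
# PC1 = (1, 1)/sqrt(2).
W = np.array([[1.0], [1.0]]) / np.sqrt(2)

Y = X_c @ W       # n x k matrix of scores along PC1
print(Y.ravel())  # approximately [-2.828, 0., 2.828] = (-2*sqrt(2), 0, 2*sqrt(2))
```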

Connection to SVD

PCA via SVD: If X_c = U Sigma V^T, then the principal components are the columns of V, and the singular values relate to eigenvalues of the covariance matrix by lambda_i = sigma_i^2 / (n-1).

In practice, SVD is the preferred numerical method for computing PCA.
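A quick check of the SVD route on the running example: the top row of V^T recovers PC1 (up to sign), and sigma_i^2 / (n-1) recovers the covariance eigenvalues 8 and 0.

```python
import numpy as np

X_c = np.array([[-2.0, -2.0],
                [0.0, 0.0],
                [2.0, 2.0]])
n = X_c.shape[0]

U, sigma, Vt = np.linalg.svd(X_c, full_matrices=False)

# Principal components are the columns of V, i.e. the rows of Vt.
print(Vt[0])                 # PC1, proportional to (1, 1)/sqrt(2) up to sign

# Covariance eigenvalues from the singular values.
eigvals = sigma**2 / (n - 1)
print(eigvals)               # [8. 0.]
```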

Check Your Understanding

1. Why do we center the data before PCA?

Answer: Centering ensures we are analyzing variance around the mean, not distance from the origin. Without centering, the first PC would point toward the mean, not the direction of maximum variation.

2. Eigenvalues of a covariance matrix are 6, 3, 1. What percent of variance does PC1 explain?

Answer: 6/(6+3+1) = 6/10 = 60%.

3. Why must the covariance matrix be symmetric and positive semi-definite?

Answer: Symmetric because cov(X,Y) = cov(Y,X). Positive semi-definite because variance is nonnegative: for any vector w, w^T S w = var(w^T X) ≥ 0. This guarantees all eigenvalues are nonnegative.

Key Takeaways

- PCA rotates centered data onto orthogonal axes (principal components) ordered by variance.
- The components are the eigenvectors of the sample covariance matrix S = (1/(n-1)) X_c^T X_c; the eigenvalues are the variances along them.
- The ratio lambda_k / (lambda_1 + ... + lambda_p) measures the variance each component explains; keep enough components to reach a chosen cumulative threshold (often 90-95%).
- Projecting with Y = X_c * W reduces the data to k dimensions; in practice, PCA is computed via the SVD of X_c.

Next Lesson

Computer Graphics Transformations.
