Lesson 2: Introduction to the Singular Value Decomposition (SVD)
Estimated time: 45-55 minutes
Learning Objectives
- State the SVD theorem: A = U Sigma V^T
- Compute the SVD for small matrices step by step
- Understand the geometric interpretation: rotate, stretch, rotate
- Connect singular values to eigenvalues of A^T A
What Is the SVD?
Singular Value Decomposition: Every m x n matrix A can be factored as A = U * Sigma * V^T, where:
- U is m x m orthogonal (columns are left singular vectors)
- Sigma is m x n diagonal (entries are singular values sigma_1 ≥ sigma_2 ≥ ... ≥ 0)
- V is n x n orthogonal (columns are right singular vectors)
Unlike diagonalization, the SVD exists for every matrix -- not just square ones, and not just diagonalizable ones.
Singular Values and Vectors
Singular Values: The singular values of A are sigma_i = sqrt(lambda_i), where lambda_i are the eigenvalues of A^T A (arranged in decreasing order).
Right singular vectors: Columns of V are the eigenvectors of A^T A (orthonormal).
Left singular vectors: Columns of U are the eigenvectors of A*A^T (orthonormal), or computed from u_i = (1/sigma_i) A v_i.
Computing the SVD: Step by Step
Step 1: Compute A^T A and find its eigenvalues and eigenvectors. The eigenvectors (orthonormalized) form V.
Step 2: The singular values are the square roots of the eigenvalues of A^T A.
Step 3: Compute U columns: u_i = (1/sigma_i) A v_i for each nonzero singular value. Extend to a full orthonormal set if needed.
Worked Example 1: SVD of a 2x2 Matrix
A = [3 0; 0 2].
Step 1: A^T A = [9 0; 0 4]. Eigenvalues: 9 and 4. V = I = [1 0; 0 1].
Step 2: sigma_1 = 3, sigma_2 = 2. Sigma = [3 0; 0 2].
Step 3: u1 = (1/3)A(1,0)^T = (1,0). u2 = (1/2)A(0,1)^T = (0,1). U = I.
A = I [3 0; 0 2] I^T. For diagonal matrices, the SVD is trivial!
Worked Example 2: SVD of a Rectangular Matrix
A = [1 1; 0 1; 1 0]. (3x2 matrix)
Step 1: A^T A = [1 0 1; 1 1 0][1 1; 0 1; 1 0] = [2 1; 1 2]. Eigenvalues: 3 and 1.
lambda = 3: v1 = (1/sqrt(2))(1, 1). lambda = 1: v2 = (1/sqrt(2))(1, -1).
Step 2: sigma_1 = sqrt(3), sigma_2 = 1.
Step 3: u1 = (1/sqrt(3)) A v1 = (1/sqrt(3))(1/sqrt(2))(2, 1, 1) = (1/sqrt(6))(2, 1, 1).
u2 = A v2 = (1/sqrt(2))(0, 1, -1) = wait, let me recompute. u2 = (1/1)(1/sqrt(2))(0, -1, 1) = (1/sqrt(2))(0, -1, 1).
u3 is chosen to complete the orthonormal basis for R^3: u3 = (1/sqrt(3))(1, -1, -1) (after orthogonalization).
Geometric Interpretation
The SVD reveals that every linear transformation can be decomposed into three simpler steps:
Geometry: A maps the unit sphere to an ellipse. The singular values are the semi-axis lengths of the ellipse. The right singular vectors are the input directions, and the left singular vectors are the output directions.
Properties and Connections
Key Properties:
- rank(A) = number of nonzero singular values
- ||A|| (operator norm) = sigma_1 (the largest singular value)
- ||A||_F (Frobenius norm) = sqrt(sigma_1^2 + sigma_2^2 + ...)
- The singular values of A are the square roots of the eigenvalues of A^T A
Low-Rank Approximation
Truncating the SVD to the first k singular values gives the best rank-k approximation of A.
Eckart-Young Theorem: A_k = sigma_1 u1 v1^T + sigma_2 u2 v2^T + ... + sigma_k uk vk^T is the closest rank-k matrix to A (in both Frobenius and operator norms).
Worked Example 3: Rank-1 Approximation
A = [3 0; 0 2]. sigma_1 = 3, u1 = (1,0), v1 = (1,0).
Best rank-1 approximation: A_1 = 3 * (1,0)^T * (1,0) = [3 0; 0 0].
Error: ||A - A_1||_F = 2 (the second singular value). The approximation captures 3/sqrt(13) is about 83% of the information.
Check Your Understanding
1. What are the three factors in the SVD and what are their properties?
2. How do you find the singular values of A?
3. A has singular values 5, 3, 0. What is the rank of A?
Key Takeaways
- SVD: A = U Sigma V^T exists for every matrix
- Singular values = sqrt(eigenvalues of A^T A)
- Geometric meaning: rotate, stretch, rotate
- rank(A) = number of nonzero singular values
- Truncated SVD gives the best low-rank approximation
- Foundation for data compression, noise reduction, and recommender systems