Lesson 4: Least Squares Problems
Estimated time: 45-55 minutes
Learning Objectives
- Understand why overdetermined systems (more equations than unknowns) typically have no exact solution
- Derive and solve the normal equations A^T A x-hat = A^T b
- Apply least squares to find the best-fit line through data points
- Connect least squares to projection: x-hat minimizes ||b - Ax||
The Problem: Inconsistent Systems
When a system Ax = b has more equations than unknowns (overdetermined), there is usually no exact solution. Instead, we find the vector x-hat that makes Ax-hat as close to b as possible.
Least Squares Solution: x-hat minimizes ||b - Ax|| over all x. Equivalently, Ax-hat = proj_{col(A)}(b) -- the projection of b onto the column space of A.
The Normal Equations
The residual b - Ax-hat must be orthogonal to the column space of A. This means A^T(b - Ax-hat) = 0, which gives:
Normal Equations: A^T A x-hat = A^T b.
If A has linearly independent columns, then A^T A is invertible and x-hat = (A^T A)^{-1} A^T b.
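This recipe is easy to check numerically. Below is a minimal NumPy sketch on a small made-up 3x2 system (the matrix and vector here are illustrative, not from the examples below); it solves the normal equations directly and confirms the answer against NumPy's built-in least squares routine:

```python
import numpy as np

# A small made-up overdetermined system: 3 equations, 2 unknowns.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([0.0, 1.0, 1.0])

# Normal equations: (A^T A) x_hat = A^T b.
x_hat = np.linalg.solve(A.T @ A, A.T @ b)

# NumPy's dedicated least squares solver gives the same answer.
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
```

Note that `np.linalg.lstsq` is preferred in practice: it avoids forming A^T A, which can be poorly conditioned.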
Worked Example 1
Solve the least squares problem for Ax = b where A = [1 1; 1 2; 1 3], b = (1, 1, 3).
A^T A = [1 1 1; 1 2 3][1 1; 1 2; 1 3] = [3 6; 6 14].
A^T b = [1 1 1; 1 2 3](1, 1, 3)^T = [5; 12].
Solve [3 6; 6 14][x1; x2] = [5; 12]. Row 2 - 2*Row 1: 2x2 = 2, so x2 = 1. Substituting into row 1: 3x1 + 6(1) = 5, so x1 = -1/3.
Least squares solution: x-hat = (-1/3, 1).
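The hand computation above can be verified numerically; here is a short NumPy check of Example 1:

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([1.0, 1.0, 3.0])

AtA = A.T @ A                    # [[3, 6], [6, 14]]
Atb = A.T @ b                    # [5, 12]
x_hat = np.linalg.solve(AtA, Atb)
print(x_hat)                     # approximately [-0.3333, 1.0]
```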
Best-Fit Line
Given data points (t1, y1), ..., (tn, yn), the best-fit line y = c0 + c1*t minimizes the sum of squared residuals.
Setup: Let A = [1 t1; 1 t2; ...; 1 tn] and b = (y1, ..., yn). Solve A^T A x-hat = A^T b for x-hat = (c0, c1).
Worked Example 2: Best-Fit Line
Data: (1, 1), (2, 1), (3, 3). Find the best-fit line y = c0 + c1*t.
A = [1 1; 1 2; 1 3], b = (1, 1, 3). This is exactly Example 1!
x-hat = (-1/3, 1). Best-fit line: y = -1/3 + t.
Check predictions: t=1: y=2/3, t=2: y=5/3, t=3: y=8/3. Residuals: 1/3, -2/3, 1/3.
Sum of squared residuals: 1/9 + 4/9 + 1/9 = 6/9 = 2/3. No other line gives a smaller total.
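The best-fit-line setup translates directly to code. A sketch with NumPy, using the data from Example 2 (`np.column_stack` builds the design matrix: a column of ones for the intercept next to the t-values):

```python
import numpy as np

t = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 1.0, 3.0])

# Design matrix: column of ones (intercept) next to the t-values.
A = np.column_stack([np.ones_like(t), t])
(c0, c1), *_ = np.linalg.lstsq(A, y, rcond=None)

residuals = y - (c0 + c1 * t)
ssr = np.sum(residuals**2)       # 2/3, matching the hand computation
```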
Geometric Interpretation
The least squares solution projects b onto col(A). The residual e = b - Ax-hat is perpendicular to col(A).
Worked Example 3
From Example 1: Ax-hat = [1 1; 1 2; 1 3](-1/3, 1)^T = (2/3, 5/3, 8/3).
Residual: e = b - Ax-hat = (1/3, -2/3, 1/3).
Check e ⊥ col(A): A^T * e = [1 1 1; 1 2 3](1/3, -2/3, 1/3)^T = [0; 0]. Confirmed!
The least squares error: ||e|| = sqrt(1/9 + 4/9 + 1/9) = sqrt(6/9) = sqrt(6)/3.
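The orthogonality check and the error computation in Example 3 can be reproduced numerically (a NumPy sketch, reusing the data from Example 1):

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([1.0, 1.0, 3.0])
x_hat = np.array([-1/3, 1.0])

e = b - A @ x_hat                # residual (1/3, -2/3, 1/3)
print(A.T @ e)                   # [0, 0] up to rounding: e is perpendicular to col(A)
print(np.linalg.norm(e))         # sqrt(6)/3, approximately 0.8165
```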
Fitting Other Models
Least squares is not limited to lines. Any model that is linear in the parameters can be fit this way.
Worked Example 4: Best-Fit Parabola
Data: (0, 1), (1, 0), (2, 1), (3, 4). Fit y = c0 + c1*t + c2*t^2.
A = [1 0 0; 1 1 1; 1 2 4; 1 3 9], b = (1, 0, 1, 4).
A^T A = [4 6 14; 6 14 36; 14 36 98]. A^T b = [6; 14; 40].
Solving gives c0 = 1, c1 = -2, c2 = 1. In this case the fit is exact: all four data points lie on the parabola, so the residual is zero.
Best-fit parabola: y = 1 - 2t + t^2.
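A quick numerical check of the parabola fit (a NumPy sketch; `np.vander` with `increasing=True` builds the 1, t, t^2 columns of the design matrix):

```python
import numpy as np

t = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 0.0, 1.0, 4.0])

# Design matrix with columns 1, t, t^2.
A = np.vander(t, 3, increasing=True)
c, *_ = np.linalg.lstsq(A, y, rcond=None)
print(c)   # [1, -2, 1]: this parabola passes through all four points
```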
Connection to Statistics
Linear Regression: The least squares method is the mathematical foundation of linear regression in statistics. The normal equations A^T A x-hat = A^T b give the ordinary least squares (OLS) estimator.
In statistics notation: X^T X beta-hat = X^T y, where X is the design matrix and beta-hat contains the estimated coefficients.
Check Your Understanding
1. Write the normal equations for the least squares problem Ax = b.
2. Data points: (0, 2), (1, 1), (2, 4). Set up the design matrix A and vector b for the best-fit line y = c0 + c1*t.
3. Why must the residual b - Ax-hat be orthogonal to col(A)?
Key Takeaways
- Normal equations: A^T A x-hat = A^T b gives the least squares solution
- x-hat minimizes ||b - Ax|| (the sum of squared residuals)
- Best-fit line: A = [1 t1; ...; 1 tn], solve for (intercept, slope)
- Geometric view: Ax-hat = proj_{col(A)}(b), and b - Ax-hat is in col(A)-perp
- Can fit any model linear in parameters (lines, polynomials, etc.)
- This is the mathematical foundation of linear regression