Learn Without Walls

Lesson 4: Least Squares Problems

Estimated time: 45-55 minutes

Learning Objectives

By the end of this lesson, you should be able to:

- Set up and solve the normal equations A^T A x-hat = A^T b.
- Find the best-fit line or curve for a set of data points using least squares.
- Interpret the least squares solution geometrically as the projection of b onto col(A).

The Problem: Inconsistent Systems

When a system Ax = b has more equations than unknowns (overdetermined), there is usually no exact solution. Instead, we find the vector x-hat that makes Ax-hat as close to b as possible.

Least Squares Solution: x-hat minimizes ||b - Ax|| over all x. Equivalently, Ax-hat = proj_{col(A)}(b) -- the projection of b onto the column space of A.

The Normal Equations

The residual b - Ax-hat must be orthogonal to the column space of A. This means A^T(b - Ax-hat) = 0, which gives:

Normal Equations: A^T A x-hat = A^T b.

If A has linearly independent columns, then A^T A is invertible and x-hat = (A^T A)^{-1} A^T b.
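The recipe above translates directly into code. Here is a minimal NumPy sketch (assuming NumPy is available; the function name `least_squares_normal` is ours, not a library routine) that forms and solves the normal equations:

```python
import numpy as np

def least_squares_normal(A, b):
    """Least squares via the normal equations: solve (A^T A) x = A^T b.

    Assumes A has linearly independent columns, so A^T A is invertible.
    """
    AtA = A.T @ A
    Atb = A.T @ b
    return np.linalg.solve(AtA, Atb)
```

In practice, `np.linalg.lstsq` (which uses an SVD-based method) is preferred for larger or ill-conditioned problems, since explicitly forming A^T A squares the condition number of A.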

Worked Example 1

Solve the least squares problem for Ax = b where A = [1 1; 1 2; 1 3], b = (1, 1, 3).

A^T A = [1 1 1; 1 2 3][1 1; 1 2; 1 3] = [3 6; 6 14].

A^T b = [1 1 1; 1 2 3](1, 1, 3)^T = [5; 12].

Solve [3 6; 6 14][x1; x2] = [5; 12]. From row 1: 3x1 + 6x2 = 5. Row 2 - 2*Row 1: 2x2 = 2, so x2 = 1. Then x1 = (5-6)/3 = -1/3.

Least squares solution: x-hat = (-1/3, 1).
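To double-check the hand computation, a short NumPy snippet (assuming NumPy is available) reproduces each step of Example 1:

```python
import numpy as np

# Data from Worked Example 1.
A = np.array([[1., 1.], [1., 2.], [1., 3.]])
b = np.array([1., 1., 3.])

AtA = A.T @ A                     # [[3, 6], [6, 14]]
Atb = A.T @ b                     # [5, 12]
x_hat = np.linalg.solve(AtA, Atb)
print(x_hat)                      # approximately [-0.3333, 1.0]
```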

Best-Fit Line

Given data points (t1, y1), ..., (tn, yn), the best-fit line y = c0 + c1*t minimizes the sum of squared residuals.

Setup: Let A = [1 t1; 1 t2; ...; 1 tn] and b = (y1, ..., yn). Solve A^T A x-hat = A^T b for x-hat = (c0, c1).

Worked Example 2: Best-Fit Line

Data: (1, 1), (2, 1), (3, 3). Find the best-fit line y = c0 + c1*t.

A = [1 1; 1 2; 1 3], b = (1, 1, 3). This is exactly Example 1!

x-hat = (-1/3, 1). Best-fit line: y = -1/3 + t.

Check predictions: t=1: y=2/3, t=2: y=5/3, t=3: y=8/3. Residuals: 1/3, -2/3, 1/3.

Sum of squared residuals: 1/9 + 4/9 + 1/9 = 6/9 = 2/3. No other line gives a smaller total.
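The same fit can be computed in one call with `np.linalg.lstsq` (assuming NumPy is available), which also returns the sum of squared residuals directly:

```python
import numpy as np

# Best-fit line for the data (1, 1), (2, 1), (3, 3).
t = np.array([1., 2., 3.])
y = np.array([1., 1., 3.])
A = np.column_stack([np.ones_like(t), t])   # design matrix with columns 1, t
coef, residuals, rank, sv = np.linalg.lstsq(A, y, rcond=None)
print(coef)       # approximately [-0.3333, 1.0] -> line y = -1/3 + t
print(residuals)  # approximately [0.6667], the sum of squared residuals 2/3
```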

Geometric Interpretation

The least squares solution projects b onto col(A). The residual e = b - Ax-hat is perpendicular to col(A).

Worked Example 3

From Example 1: Ax-hat = [1 1; 1 2; 1 3](-1/3, 1)^T = (2/3, 5/3, 8/3).

Residual: e = b - Ax-hat = (1/3, -2/3, 1/3).

Check e ⊥ col(A): A^T * e = [1 1 1; 1 2 3](1/3, -2/3, 1/3)^T = [0; 0]. Confirmed!

The least squares error: ||e|| = sqrt(1/9 + 4/9 + 1/9) = sqrt(6/9) = sqrt(6)/3.
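The orthogonality check and the error computation can also be verified numerically (a quick sketch, assuming NumPy is available):

```python
import numpy as np

A = np.array([[1., 1.], [1., 2.], [1., 3.]])
b = np.array([1., 1., 3.])
x_hat = np.array([-1/3, 1.])

e = b - A @ x_hat            # residual, approximately (1/3, -2/3, 1/3)
print(A.T @ e)               # ~[0, 0]: e is orthogonal to col(A)
print(np.linalg.norm(e))     # sqrt(6)/3, approximately 0.8165
```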

Fitting Other Models

Least squares is not limited to lines. Any model that is linear in the parameters can be fit this way.

Worked Example 4: Best-Fit Parabola

Data: (0, 1), (1, 0), (2, 1), (3, 4). Fit y = c0 + c1*t + c2*t^2.

A = [1 0 0; 1 1 1; 1 2 4; 1 3 9], b = (1, 0, 1, 4).

A^T A = [4 6 14; 6 14 36; 14 36 98]. A^T b = [6; 14; 40].

Solving gives c0 = 1, c1 = -2, c2 = 1.

Best-fit parabola: y = 1 - 2t + t^2. In this case the fit is exact: y = (t - 1)^2 passes through all four data points, so the residual is zero.
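Checking the arithmetic numerically (a sketch, assuming NumPy is available) confirms both the coefficients and the zero residual:

```python
import numpy as np

# Parabola fit for the data (0, 1), (1, 0), (2, 1), (3, 4).
t = np.array([0., 1., 2., 3.])
y = np.array([1., 0., 1., 4.])
A = np.column_stack([np.ones_like(t), t, t**2])   # columns 1, t, t^2
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
print(coef)                          # [1, -2, 1] -> y = 1 - 2t + t^2
print(np.linalg.norm(y - A @ coef))  # ~0: the parabola fits exactly
```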

Connection to Statistics

Linear Regression: The least squares method is the mathematical foundation of linear regression in statistics. The normal equations A^T A x-hat = A^T b give the ordinary least squares (OLS) estimator.

In statistics notation: X^T X beta-hat = X^T y, where X is the design matrix and beta-hat contains the estimated coefficients.

Check Your Understanding

1. Write the normal equations for the least squares problem Ax = b.

Answer: A^T A x-hat = A^T b.

2. Data points: (0, 2), (1, 1), (2, 4). Set up the design matrix A and vector b for the best-fit line y = c0 + c1*t.

Answer: A = [1 0; 1 1; 1 2], b = (2, 1, 4). Then solve A^T A x = A^T b.
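If you carry the solve through (a verification sketch, assuming NumPy is available), the normal equations for this setup give:

```python
import numpy as np

# Check Your Understanding, question 2: data (0, 2), (1, 1), (2, 4).
A = np.array([[1., 0.], [1., 1.], [1., 2.]])
b = np.array([2., 1., 4.])
x_hat = np.linalg.solve(A.T @ A, A.T @ b)
print(x_hat)   # [4/3, 1] -> best-fit line y = 4/3 + t
```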

3. Why must the residual b - Ax-hat be orthogonal to col(A)?

Answer: Because Ax-hat is the projection of b onto col(A), and by the orthogonal decomposition theorem, the residual must lie in col(A)-perp.

Key Takeaways

- The least squares solution x-hat minimizes ||b - Ax|| and satisfies the normal equations A^T A x-hat = A^T b.
- If A has linearly independent columns, A^T A is invertible and the least squares solution is unique.
- Geometrically, Ax-hat is the projection of b onto col(A), so the residual b - Ax-hat is orthogonal to col(A).
- Any model that is linear in its parameters (lines, parabolas, and beyond) can be fit by least squares.

Practice

Test your skills with 10 problems.
