15  Overdetermined Systems and Least Squares

Approximating the solution to an overdetermined system where \(m > n\) and no exact solution exists.

15.1 Least Squares Problem

NoteDefinition: Overdetermined System

\(\bA\bx = \bb\) with \(\bA \in \fR^{m \times n}, m > n\). Usually \(\bb \notin \mathcal{R}(\bA)\).

NoteDefinition: Least Squares Problem

Find \(\hat{\bx}\) that minimizes the Euclidean residual: \[ \begin{align} \hat{\bx} = \arg\min_{\bx \in \fR^n} \|\bA\bx - \bb\|_2^2. \end{align} \] Geometry: \(\bA\hat{\bx}\) is the orthogonal projection of \(\bb\) onto \(\mathcal{R}(\bA)\).

15.2 Normal Equations

NoteTheorem: Normal Equations

The minimizer satisfies \(\bA^T\bA\hat{\bx} = \bA^T\bb\).

  • Condition: \(\bA^T\br = \bzero\) (Residual must be normal to the column space).

  • Unique Solution: \(\hat{\bx} = (\bA^T\bA)^{-1}\bA^T\bb\) if \(\bA\) has full column rank.

Tip

Conditioning of the Normal Equations. The condition number of the normal equations matrix satisfies \(\kappa_2(\bA^T\bA) = \kappa_2(\bA)^2\). Squaring the condition number can lead to a significant loss of precision. For example, if \(\kappa_2(\bA) = 10^6\), the solution may lose up to 12 digits of accuracy.

WarningExercise
  1. Fit a line \(y = mx + b\) to points \((1,2), (2,3), (3,5), (4,4)\) via normal equations.

  2. Prove \(\kappa_2(\bA^T\bA) = \kappa_2(\bA)^2\) using singular values.

15.3 QR-Based Least Squares

The numerically superior approach.

NoteTheorem: Solving via QR

If \(\bA = \bQ\bR\) (thin QR), the least squares solution satisfies: \[ \begin{align} \bR\hat{\bx} = \bQ^T\bb. \end{align} \]

The goal is to minimize \(\|\bA\bx - \bb\|_2^2\). Let \(\bA = \bQ\bR\) be the thin QR factorization. Using the orthogonal projector \(\bP = \bQ\bQ^T\) and its complement \(\bI - \bP\): \[ \begin{align} \|\bA\bx - \bb\|_2^2 &= \|\bQ\bR\bx - \bb\|_2^2 \\ &= \|\bQ\bR\bx - (\bQ\bQ^T\bb + (\bI - \bQ\bQ^T)\bb)\|_2^2 \\ &= \|(\bQ\bR\bx - \bQ\bQ^T\bb) - (\bI - \bQ\bQ^T)\bb\|_2^2. \end{align} \] Since \(\bQ(\bR\bx - \bQ^T\bb) \in \mathcal{R}(\bQ)\) and \((\bI - \bQ\bQ^T)\bb \in \mathcal{R}(\bQ)^\perp\), the Pythagorean theorem gives: \[ \begin{align} \|\bA\bx - \bb\|_2^2 = \|\bQ(\bR\bx - \bQ^T\bb)\|_2^2 + \|(\bI - \bQ\bQ^T)\bb\|_2^2. \end{align} \] The first term is minimized (set to zero) when \(\bR\bx = \bQ^T\bb\), and the second term is independent of \(\bx\). Thus \(\hat{\bx} = \bR^{-1}\bQ^T\bb\) is the unique minimizer.

Tip

Performance: QR-based least squares costs \(O(mn^2)\). It is the default in np.linalg.lstsq.

WarningExercise
  1. Solve the line-fitting problem using QR. Compare coefficients with the normal equations result.

  2. Regularization: If \(\bA\) is rank-deficient, \(\|\hat{\bx}\|\) can explode. Add a penalty term \(\lambda \|\bx\|_2^2\) (Ridge Regression). How do the normal equations change?

15.4 Polynomial Regression

NoteDefinition: Vandermonde Matrix

For nodes \(t_1, ..., t_m\), the degree-\(n\) design matrix is: \[ \begin{align} V_{ij} = t_i^{j-1}. \end{align} \] Polynomial fitting \(p(t) = \sum c_j t^j\) is solved as \(\bV\bc \approx \mathbf{f}\).

WarningExercise
  1. Fit a degree-2 polynomial to 6 data points. Verify the residual and \(R^2\) score.

  2. Numerical Stability: Construct a Vandermonde matrix with \(n=10\). Compare the errors of lstsq(A, b) vs solve(A.T @ A, A.T @ b).