15 Overdetermined Systems and Least Squares
Approximating the solution to an overdetermined system where \(m > n\) and no exact solution exists.
15.1 Least Squares Problem
\(\bA\bx = \bb\) with \(\bA \in \fR^{m \times n}, m > n\). Usually \(\bb \notin \mathcal{R}(\bA)\).
Find \(\hat{\bx}\) that minimizes the Euclidean residual: \[ \begin{align} \hat{\bx} = \arg\min_{\bx \in \fR^n} \|\bA\bx - \bb\|_2^2. \end{align} \] Geometry: \(\bA\hat{\bx}\) is the orthogonal projection of \(\bb\) onto \(\mathcal{R}(\bA)\).
15.2 Normal Equations
The minimizer satisfies \(\bA^T\bA\hat{\bx} = \bA^T\bb\).
Condition: \(\bA^T\br = \bzero\) (Residual must be normal to the column space).
Unique Solution: \(\hat{\bx} = (\bA^T\bA)^{-1}\bA^T\bb\) if \(\bA\) has full column rank.
Conditioning of the Normal Equations. The condition number of the normal equations matrix satisfies \(\kappa_2(\bA^T\bA) = \kappa_2(\bA)^2\). Squaring the condition number can lead to a significant loss of precision. For example, if \(\kappa_2(\bA) = 10^6\), the solution may lose up to 12 digits of accuracy.
Fit a line \(y = mx + b\) to points \((1,2), (2,3), (3,5), (4,4)\) via normal equations.
Prove \(\kappa_2(\bA^T\bA) = \kappa_2(\bA)^2\) using singular values.
15.3 QR-Based Least Squares
The numerically superior approach.
If \(\bA = \bQ\bR\) (thin QR), the least squares solution satisfies: \[ \begin{align} \bR\hat{\bx} = \bQ^T\bb. \end{align} \]
The goal is to minimize \(\|\bA\bx - \bb\|_2^2\). Let \(\bA = \bQ\bR\) be the thin QR factorization. Using the orthogonal projector \(\bP = \bQ\bQ^T\) and its complement \(\bI - \bP\): \[ \begin{align} \|\bA\bx - \bb\|_2^2 &= \|\bQ\bR\bx - \bb\|_2^2 \\ &= \|\bQ\bR\bx - (\bQ\bQ^T\bb + (\bI - \bQ\bQ^T)\bb)\|_2^2 \\ &= \|(\bQ\bR\bx - \bQ\bQ^T\bb) - (\bI - \bQ\bQ^T)\bb\|_2^2. \end{align} \] Since \(\bQ(\bR\bx - \bQ^T\bb) \in \mathcal{R}(\bQ)\) and \((\bI - \bQ\bQ^T)\bb \in \mathcal{R}(\bQ)^\perp\), the Pythagorean theorem gives: \[ \begin{align} \|\bA\bx - \bb\|_2^2 = \|\bQ(\bR\bx - \bQ^T\bb)\|_2^2 + \|(\bI - \bQ\bQ^T)\bb\|_2^2. \end{align} \] The first term is minimized (set to zero) when \(\bR\bx = \bQ^T\bb\), and the second term is independent of \(\bx\). Thus \(\hat{\bx} = \bR^{-1}\bQ^T\bb\) is the unique minimizer.
Performance: QR-based least squares costs \(O(mn^2)\). It is the default in np.linalg.lstsq.
Solve the line-fitting problem using QR. Compare coefficients with the normal equations result.
Regularization: If \(\bA\) is rank-deficient, \(\|\hat{\bx}\|\) can explode. Add a penalty term \(\lambda \|\bx\|_2^2\) (Ridge Regression). How do the normal equations change?
15.4 Polynomial Regression
For nodes \(t_1, ..., t_m\), the degree-\(n\) design matrix is: \[ \begin{align} V_{ij} = t_i^{j-1}. \end{align} \] Polynomial fitting \(p(t) = \sum c_j t^j\) is solved as \(\bV\bc \approx \mathbf{f}\).
Fit a degree-2 polynomial to 6 data points. Verify the residual and \(R^2\) score.
Numerical Stability: Construct a Vandermonde matrix with \(n=10\). Compare the errors of
lstsq(A, b)vssolve(A.T @ A, A.T @ b).