# numerics
A matrix factorization decomposes A into a product of simpler matrices whose structure makes solving linear systems, least-squares problems, and computing determinants straightforward. This week we implement two fundamental ones:
| Factorization | Form | What it solves |
|---|---|---|
| LU (Gaussian elimination) | PA = LU | Square systems Ax = b |
| QR (Householder) | A = QR | Least-squares min ‖Ax−b‖ |
Both are O(n³) in FLOPs and are the workhorses for everything built later: eigenvalue solvers, Newton's method, ODE implicit steps, and FEM assembly.
Solving Ax = b by computing A⁻¹ costs an O(n³) inversion plus an O(n²) matrix-vector multiply, and is less accurate than a direct solve. Factorization costs one O(n³) pass; each subsequent solve (forward + backward substitution) costs only O(n²).
When you need to solve Ax = bᵢ for many right-hand sides b₁, b₂, … (e.g. at each time step of an ODE, or for computing the inverse column by column), the factorization is done once and amortised over all solves.
For k right-hand sides, factoring once costs O(n³ + kn²) versus O(kn³) for refactoring every time; for k = n the factor-once approach is roughly n× cheaper.
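This amortisation is easy to see with SciPy's `lu_factor`/`lu_solve` (SciPy's names, not this module's — a sketch of the pattern only):

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

rng = np.random.default_rng(0)
n = 200
A = rng.standard_normal((n, n))
bs = [rng.standard_normal(n) for _ in range(5)]

# Factor once: O(n^3).
lu, piv = lu_factor(A)

# Each solve reuses the factors: O(n^2) apiece.
xs = [lu_solve((lu, piv), b) for b in bs]

for b, x in zip(bs, xs):
    assert np.allclose(A @ x, b)
```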
Gaussian elimination zeros below-diagonal entries column by column. At step k it subtracts multiples of row k from rows below it. The multipliers l_{ik} = A(i,k)/A(k,k) are exactly the entries of the lower triangular factor L. After n−1 steps the matrix is upper triangular U.
\[PA = LU\]
where P is a permutation matrix recording the row swaps, L is unit lower triangular (ones on the diagonal), and U is upper triangular.
Without pivoting, if A(k,k) = 0 the algorithm divides by zero. If A(k,k) is small, the multipliers l_{ik} blow up and amplify round-off errors.
Partial pivoting: at step k, swap row k with the row that has the largest |A(i,k)| for i ≥ k. This guarantees |L(i,j)| ≤ 1 for all i > j, bounding the growth factor — the ratio of largest element after elimination to before — to at most 2^{n−1}. In practice, growth is typically O(n^{2/3}).
L and U are stored in a single n×n matrix: U occupies the diagonal and above, L occupies the strict lower triangle (the diagonal of L is 1 implicitly).
Given PA = LU and the system Ax = b, multiply both sides by P to get LUx = Pb, then:
1. Forward substitution: solve Ly = Pb.
2. Backward substitution: solve Ux = y.
Each substitution is O(n²); the factorisation itself is O(n³).
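A minimal NumPy sketch of the factorise-then-solve pair (hypothetical names `lu_factorise`/`lu_solve`; the module's actual signatures may differ):

```python
import numpy as np

def lu_factorise(A):
    """LU with partial pivoting: packed LU plus permutation p, so A[p] = L @ U."""
    A = A.astype(float).copy()
    n = A.shape[0]
    p = np.arange(n)
    for k in range(n - 1):
        # Partial pivoting: bring the largest |A[i, k]|, i >= k, to the diagonal.
        m = k + np.argmax(np.abs(A[k:, k]))
        if m != k:
            A[[k, m]] = A[[m, k]]
            p[[k, m]] = p[[m, k]]
        # Multipliers l_ik overwrite the strict lower triangle.
        A[k+1:, k] /= A[k, k]
        A[k+1:, k+1:] -= np.outer(A[k+1:, k], A[k, k+1:])
    return A, p

def lu_solve(LU, p, b):
    """Solve Ax = b given packed LU and permutation p."""
    n = LU.shape[0]
    y = b[p].astype(float).copy()
    for i in range(n):                 # forward substitution (unit-diagonal L)
        y[i] -= LU[i, :i] @ y[:i]
    x = y
    for i in range(n - 1, -1, -1):     # backward substitution
        x[i] = (x[i] - LU[i, i+1:] @ x[i+1:]) / LU[i, i]
    return x
```

Note the storage convention from above: L and U share one array, with L's unit diagonal implicit.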
The inverse is obtained by solving AX = I column by column — n solves of cost O(n²) each, total O(n³). Only compute the inverse when you genuinely need it; for specific right-hand sides, lu_solve is cheaper.
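The column-by-column idea, sketched with NumPy's generic solver for brevity (a real `lu_inv` would factor once and reuse the factors for each column, as described above):

```python
import numpy as np

A = np.array([[4., 2., 1.],
              [2., 5., 3.],
              [1., 3., 6.]])
n = A.shape[0]

# Solve A X = I column by column; with a cached factorisation each column
# is one O(n^2) forward/backward pair.
A_inv = np.column_stack([np.linalg.solve(A, e) for e in np.eye(n)])

assert np.allclose(A @ A_inv, np.eye(n))
```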
QR factorises A as A = QR where Q is orthogonal (Q^T Q = I) and R is upper triangular. For the least-squares problem min‖Ax − b‖:
\[\|Ax - b\|^2 = \|QRx - b\|^2 = \|Rx - Q^Tb\|^2\]
since Q is isometric (preserves norms). This splits into a triangular solve plus a fixed residual.
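The least-squares recipe in code, using NumPy's built-in reduced QR as a stand-in for the Householder routine developed below (a sketch, not this module's API):

```python
import numpy as np
from scipy.linalg import solve_triangular

rng = np.random.default_rng(1)
m, n = 20, 4
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

# Reduced QR: Q is m x n with orthonormal columns, R is n x n upper triangular.
Q, R = np.linalg.qr(A)
x = solve_triangular(R, Q.T @ b)   # triangular solve Rx = Q^T b

# Agrees with the reference least-squares solution.
x_ref = np.linalg.lstsq(A, b, rcond=None)[0]
assert np.allclose(x, x_ref)
```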
Classical Gram-Schmidt (CGS) orthogonalises columns sequentially. It is numerically fragile: round-off errors in early columns corrupt later ones, and for ill-conditioned matrices CGS can produce a Q that is far from orthogonal.
A Householder reflector H = I − 2vv^T is orthogonal by construction and acts on the entire working matrix at once. Householder QR is backward-stable: the computed factorisation satisfies (A + E) = Q R with ‖E‖/‖A‖ = O(u), where u is machine epsilon.
For a vector x, choose v so that Hx = −sign(x₀)‖x‖ e₁:
\[v = x + \text{sign}(x_0)\|x\| e_1, \qquad \hat{v} = v/\|v\|\]
The sign choice is critical: if x₀ > 0 and we used v₀ = x₀ − ‖x‖, then v₀ → 0 as x₀ → ‖x‖ (catastrophic cancellation). Taking the same sign as x₀ avoids this.
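The reflector construction and the sign choice, as a small sketch (`householder_vector` is a hypothetical helper name):

```python
import numpy as np

def householder_vector(x):
    """Normalised reflector direction v sending x to -sign(x0) * ||x|| * e1."""
    v = x.astype(float).copy()
    # Same sign as x[0]: v[0] = x[0] + sign(x0)*||x|| is a sum of like-signed
    # terms, so no catastrophic cancellation.
    v[0] += np.copysign(np.linalg.norm(x), x[0])
    return v / np.linalg.norm(v)

x = np.array([3., 4.])                 # ||x|| = 5
v = householder_vector(x)
Hx = x - 2 * v * (v @ x)               # H x = (I - 2 v v^T) x
# Hx is (-5, 0): first component -sign(x0)*||x||, the rest zero.
```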
After r = min(m−1, n) steps, R is upper triangular.
Q = H₀ · H₁ · … · H_{r−1}. Build by accumulating in reverse:
Why reverse? At step k (going backwards), Q[:,j] = eⱼ for j < k — those columns have not been touched yet. So the H_k update only needs j ≥ k.
The residual ‖(Q^T b)[n:]‖ is available for free — no need for an extra matrix-vector product.
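Putting the pieces together — elimination with reflectors, then reverse accumulation of Q — as a compact sketch (hypothetical name `householder_qr`; assumes A has no zero column so each reflector is well defined):

```python
import numpy as np

def householder_qr(A):
    """Householder QR: full Q (m x m) and R (m x n) with Q @ R = A."""
    m, n = A.shape
    R = A.astype(float).copy()
    vs = []
    for k in range(min(m - 1, n)):
        x = R[k:, k]
        v = x.copy()
        v[0] += np.copysign(np.linalg.norm(x), x[0])
        v /= np.linalg.norm(v)
        # Apply H_k = I - 2 v v^T to the trailing submatrix only.
        R[k:, k:] -= 2.0 * np.outer(v, v @ R[k:, k:])
        vs.append(v)
    # Accumulate Q = H_0 H_1 ... H_{r-1} in reverse: at step k the columns
    # j < k of Q are still e_j, so each update touches only columns j >= k.
    Q = np.eye(m)
    for k in range(len(vs) - 1, -1, -1):
        v = vs[k]
        Q[k:, k:] -= 2.0 * np.outer(v, v @ Q[k:, k:])
    return Q, R
```

Storing the unit vectors `vs` instead of forming each H explicitly keeps every update at O(size of the trailing block).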
Householder QR does not guarantee a positive diagonal in R. Each reflector has determinant −1, so some diagonal entries of R may come out negative. This is mathematically correct — both Q and R differ from the "canonical" form only by sign flips, and Q·R = A holds exactly regardless.
If positive-diagonal R is required (e.g., for uniqueness of the factorisation when A has full column rank), flip the sign of each negative R[k,k] and the corresponding column of Q. We omit this normalisation here.
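For reference, the normalisation is a one-liner pair of sign flips (sketched here against NumPy's QR; D² = I, so Q·R is unchanged):

```python
import numpy as np

A = np.array([[0., 1.], [1., 0.], [1., 1.]])
Q, R = np.linalg.qr(A)

# Flip the sign of each negative diagonal entry of R (rows of R) and the
# matching column of Q; (Q D)(D R) = Q R since D^2 = I.
s = np.sign(np.diag(R))
s[s == 0] = 1.0
Q, R = Q * s, (R.T * s).T

assert np.all(np.diag(R) > 0)
assert np.allclose(Q @ R, A)
```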
| Future module | Uses |
|---|---|
| Eigenvalue solvers | QR iteration (apply QR repeatedly to converge to Schur form) |
| Newton's method (optimisation) | LU solve at each step: H · Δx = −g |
| ODE implicit solvers | LU for local Jacobian solve at each time step |
| FEM | LU/CG for assembled stiffness matrix |
| Linear regression | QR least-squares |
| Pseudo-inverse / SVD | QR as intermediate step |
| Operation | FLOPs | Stability |
|---|---|---|
| LU factorisation | 2n³/3 | Backward-stable with partial pivoting |
| Forward sub (Ly=b) | n² | Backward-stable (multipliers bounded by 1 with pivoting) |
| Backward sub (Ux=y) | n² | Backward-stable |
| QR factorisation | 2mn² − 2n³/3 (m≥n) | Backward-stable |
| Q^T b multiply | 2mn | Backward-stable |
| Backward sub for QR | n² | Backward-stable |
LU growth factor worst case: 2^{n−1}. In practice for random matrices: O(n^{2/3}). For diagonally dominant or SPD matrices, growth is bounded by a small constant.
Exercises:
- Benchmark `lu_solve` and `cg` on a 100×100 SPD system. Which is faster? When would you prefer CG over LU for SPD systems?
- Use `lu_inv` to compute A⁻¹ for a 3×3 matrix and verify A · A⁻¹ = I. Why is computing the inverse wasteful if you only need one solve?
- Implement `lu_log_det`: compute log|det A| without overflow by summing log|U[i,i]| instead of multiplying. When is this needed?