# numerics
A matrix factorization decomposes A into a product of simpler matrices whose structure makes solving linear systems, least-squares problems, and computing determinants straightforward. This week we implement two fundamental ones:
| Factorization | Form | What it solves |
|---|---|---|
| LU (Gaussian elimination) | PA = LU | Square systems Ax = b |
| QR (Householder) | A = QR | Least-squares min ‖Ax−b‖ |
Both are O(n³) in FLOPs and are the workhorses for everything built later: eigenvalue solvers, Newton's method, ODE implicit steps, and FEM assembly.
Solving Ax = b by computing A⁻¹ costs an O(n³) inversion plus an O(n²) matrix-vector multiply, and is less accurate than a direct solve. Factorization costs one O(n³) pass; each subsequent solve (forward + backward substitution) costs only O(n²).
When you need to solve Ax = bᵢ for many right-hand sides b₁, b₂, … (e.g. at each time step of an ODE, or for computing the inverse column by column), the factorization is done once and amortised over all solves.
For k right-hand sides, factoring once costs O(n³ + kn²) versus O(kn³) for refactoring every time; for k = n the factor-once approach is roughly n× cheaper.
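This amortisation is easy to see with SciPy's `lu_factor`/`lu_solve` (SciPy's names, not this module's — a sketch of the pattern only):

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

rng = np.random.default_rng(0)
n = 200
A = rng.standard_normal((n, n))
bs = [rng.standard_normal(n) for _ in range(5)]

# Factor once: O(n^3).
lu, piv = lu_factor(A)

# Each solve reuses the factors: O(n^2) apiece.
xs = [lu_solve((lu, piv), b) for b in bs]

for b, x in zip(bs, xs):
    assert np.allclose(A @ x, b)
```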
Gaussian elimination zeros below-diagonal entries column by column. At step k it subtracts multiples of row k from rows below it. The multipliers l_{ik} = A(i,k)/A(k,k) are exactly the entries of the lower triangular factor L. After n−1 steps the matrix is upper triangular U.
\[PA = LU\]
where P is a permutation matrix recording the row swaps, L is unit lower triangular (ones on the diagonal), and U is upper triangular.
Without pivoting, if A(k,k) = 0 the algorithm divides by zero. If A(k,k) is small, the multipliers l_{ik} blow up and amplify round-off errors.
Partial pivoting: at step k, swap row k with the row that has the largest |A(i,k)| for i ≥ k. This guarantees |L(i,j)| ≤ 1 for all i > j, bounding the growth factor — the ratio of largest element after elimination to before — to at most 2^{n−1}. In practice, growth is typically O(n^{2/3}).
L and U are stored in a single n×n matrix: U occupies the diagonal and above, L occupies the strict lower triangle (the diagonal of L is 1 implicitly).
Given PA = LU and the system Ax = b, multiply both sides by P to get LUx = Pb, then:
1. Forward substitution: solve Ly = Pb.
2. Backward substitution: solve Ux = y.
Each substitution is O(n²); the factorisation itself is O(n³).
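A minimal NumPy sketch of the factorise-then-solve pair (hypothetical names `lu_factorise`/`lu_solve`; the module's actual signatures may differ):

```python
import numpy as np

def lu_factorise(A):
    """LU with partial pivoting: packed LU plus permutation p, so A[p] = L @ U."""
    A = A.astype(float).copy()
    n = A.shape[0]
    p = np.arange(n)
    for k in range(n - 1):
        # Partial pivoting: bring the largest |A[i, k]|, i >= k, to the diagonal.
        m = k + np.argmax(np.abs(A[k:, k]))
        if m != k:
            A[[k, m]] = A[[m, k]]
            p[[k, m]] = p[[m, k]]
        # Multipliers l_ik overwrite the strict lower triangle.
        A[k+1:, k] /= A[k, k]
        A[k+1:, k+1:] -= np.outer(A[k+1:, k], A[k, k+1:])
    return A, p

def lu_solve(LU, p, b):
    """Solve Ax = b given packed LU and permutation p."""
    n = LU.shape[0]
    y = b[p].astype(float).copy()
    for i in range(n):                 # forward substitution (unit-diagonal L)
        y[i] -= LU[i, :i] @ y[:i]
    x = y
    for i in range(n - 1, -1, -1):     # backward substitution
        x[i] = (x[i] - LU[i, i+1:] @ x[i+1:]) / LU[i, i]
    return x
```

Note the storage convention from above: L and U share one array, with L's unit diagonal implicit.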
The inverse is obtained by solving AX = I column by column — n solves of cost O(n²) each, total O(n³). Only compute the inverse when you genuinely need it; for specific right-hand sides, lu_solve is cheaper.
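The column-by-column idea, sketched with NumPy's generic solver for brevity (a real `lu_inv` would factor once and reuse the factors for each column, as described above):

```python
import numpy as np

A = np.array([[4., 2., 1.],
              [2., 5., 3.],
              [1., 3., 6.]])
n = A.shape[0]

# Solve A X = I column by column; with a cached factorisation each column
# is one O(n^2) forward/backward pair.
A_inv = np.column_stack([np.linalg.solve(A, e) for e in np.eye(n)])

assert np.allclose(A @ A_inv, np.eye(n))
```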
QR factorises A as A = QR where Q is orthogonal (Q^T Q = I) and R is upper triangular. For the least-squares problem min‖Ax − b‖:
\[\|Ax - b\|^2 = \|QRx - b\|^2 = \|Rx - Q^Tb\|^2\]
since Q is isometric (preserves norms). This splits into a triangular solve plus a fixed residual.
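The least-squares recipe in code, using NumPy's built-in reduced QR as a stand-in for the Householder routine developed below (a sketch, not this module's API):

```python
import numpy as np
from scipy.linalg import solve_triangular

rng = np.random.default_rng(1)
m, n = 20, 4
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

# Reduced QR: Q is m x n with orthonormal columns, R is n x n upper triangular.
Q, R = np.linalg.qr(A)
x = solve_triangular(R, Q.T @ b)   # triangular solve Rx = Q^T b

# Agrees with the reference least-squares solution.
x_ref = np.linalg.lstsq(A, b, rcond=None)[0]
assert np.allclose(x, x_ref)
```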
Classical Gram-Schmidt (CGS) orthogonalises columns sequentially. It is numerically fragile: round-off errors in early columns corrupt later ones, and for ill-conditioned matrices CGS can produce a Q that is far from orthogonal.
A Householder reflector H = I − 2vv^T is orthogonal by construction and acts on the entire working matrix at once. Householder QR is backward-stable: the computed factorisation satisfies (A + E) = Q R with ‖E‖/‖A‖ = O(u), where u is machine epsilon.
For a vector x, choose v so that Hx = −sign(x₀)‖x‖ e₁:
\[v = x + \text{sign}(x_0)\|x\| e_1, \qquad \hat{v} = v/\|v\|\]
The sign choice is critical: if x₀ > 0 and we used v₀ = x₀ − ‖x‖, then v₀ → 0 as x₀ → ‖x‖ (catastrophic cancellation). Taking the same sign as x₀ avoids this.
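The reflector construction and the sign choice, as a small sketch (`householder_vector` is a hypothetical helper name):

```python
import numpy as np

def householder_vector(x):
    """Normalised reflector direction v sending x to -sign(x0) * ||x|| * e1."""
    v = x.astype(float).copy()
    # Same sign as x[0]: v[0] = x[0] + sign(x0)*||x|| is a sum of like-signed
    # terms, so no catastrophic cancellation.
    v[0] += np.copysign(np.linalg.norm(x), x[0])
    return v / np.linalg.norm(v)

x = np.array([3., 4.])                 # ||x|| = 5
v = householder_vector(x)
Hx = x - 2 * v * (v @ x)               # H x = (I - 2 v v^T) x
# Hx is (-5, 0): first component -sign(x0)*||x||, the rest zero.
```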
After r = min(m−1, n) steps, R is upper triangular.
Q = H₀ · H₁ · … · H_{r−1}. Build by accumulating in reverse:
Why reverse? At step k (going backwards), Q[:,j] = eⱼ for j < k — those columns have not been touched yet. So the H_k update only needs j ≥ k.
The residual ‖(Q^T b)[n:]‖ is available for free — no need for an extra matrix-vector product.
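Putting the pieces together — elimination with reflectors, then reverse accumulation of Q — as a compact sketch (hypothetical name `householder_qr`; assumes A has no zero column so each reflector is well defined):

```python
import numpy as np

def householder_qr(A):
    """Householder QR: full Q (m x m) and R (m x n) with Q @ R = A."""
    m, n = A.shape
    R = A.astype(float).copy()
    vs = []
    for k in range(min(m - 1, n)):
        x = R[k:, k]
        v = x.copy()
        v[0] += np.copysign(np.linalg.norm(x), x[0])
        v /= np.linalg.norm(v)
        # Apply H_k = I - 2 v v^T to the trailing submatrix only.
        R[k:, k:] -= 2.0 * np.outer(v, v @ R[k:, k:])
        vs.append(v)
    # Accumulate Q = H_0 H_1 ... H_{r-1} in reverse: at step k the columns
    # j < k of Q are still e_j, so each update touches only columns j >= k.
    Q = np.eye(m)
    for k in range(len(vs) - 1, -1, -1):
        v = vs[k]
        Q[k:, k:] -= 2.0 * np.outer(v, v @ Q[k:, k:])
    return Q, R
```

Storing the unit vectors `vs` instead of forming each H explicitly keeps every update at O(size of the trailing block).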
Householder QR does not guarantee a positive diagonal in R. Each reflector has determinant −1, so some diagonal entries of R may come out negative. This is mathematically correct — both Q and R differ from the "canonical" form only by sign flips, and Q·R = A holds exactly regardless.
If positive-diagonal R is required (e.g., for uniqueness of the factorisation when A has full column rank), flip the sign of each negative R[k,k] and the corresponding column of Q. We omit this normalisation here.
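For reference, the normalisation is a one-liner pair of sign flips (sketched here against NumPy's QR; D² = I, so Q·R is unchanged):

```python
import numpy as np

A = np.array([[0., 1.], [1., 0.], [1., 1.]])
Q, R = np.linalg.qr(A)

# Flip the sign of each negative diagonal entry of R (rows of R) and the
# matching column of Q; (Q D)(D R) = Q R since D^2 = I.
s = np.sign(np.diag(R))
s[s == 0] = 1.0
Q, R = Q * s, (R.T * s).T

assert np.all(np.diag(R) > 0)
assert np.allclose(Q @ R, A)
```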
| Future module | Uses |
|---|---|
| Eigenvalue solvers | QR iteration (apply QR repeatedly to converge to Schur form) |
| Newton's method (optimisation) | LU solve at each step: H · Δx = −g |
| ODE implicit solvers | LU for local Jacobian solve at each time step |
| FEM | LU/CG for assembled stiffness matrix |
| Linear regression | QR least-squares |
| Pseudo-inverse / SVD | QR as intermediate step |
| Operation | FLOPs | Stability |
|---|---|---|
| LU factorisation | 2n³/3 | Backward-stable with partial pivoting |
| Forward sub (Ly=b) | n² | Backward-stable (multipliers bounded by 1 with pivoting) |
| Backward sub (Ux=y) | n² | Backward-stable |
| QR factorisation | 2mn² − 2n³/3 (m≥n) | Backward-stable |
| Q^T b multiply | 2mn | Backward-stable |
| Backward sub for QR | n² | Backward-stable |
LU growth factor worst case: 2^{n−1}. In practice for random matrices: O(n^{2/3}). For diagonally dominant or SPD matrices, growth is bounded by a small constant.
Exercises:
- Benchmark `lu_solve` and `cg` on a 100×100 SPD system. Which is faster? When would you prefer CG over LU for SPD systems?
- Use `lu_inv` to compute A⁻¹ for a 3×3 matrix and verify A · A⁻¹ = I. Why is computing the inverse wasteful if you only need one solve?
- Implement `lu_log_det`: compute log|det A| without overflow by summing log|U[i,i]| instead of multiplying. When is this needed?