numerics
A matrix factorization rewrites \(A\) as a product of simpler matrices whose structure reduces solving, least-squares, and determinant problems to cheap triangular operations.
Computing \(A^{-1}\) explicitly costs an \(O(n^3)\) inversion followed by an \(O(n^2)\) multiplication \(x = A^{-1}b\), and is needlessly inaccurate. Factorization costs one \(O(n^3)\) pass up front; each subsequent solve (forward + backward substitution) costs only \(O(n^2)\).
When the same coefficient matrix appears with many right-hand sides—time steps of an ODE, Newton iterations, column-by-column inversion—the factorization is amortized:
\[ \underbrace{O(n^3)}_{\text{factor once}} + k \cdot \underbrace{O(n^2)}_{\text{solve}} \quad \ll \quad k \cdot O(n^3) \]
For \(k = n\) solves the factorization approach is on the order of \(n\) times cheaper.
Every non-singular \(A \in \mathbb{R}^{n \times n}\) admits
\[ PA = LU \]
where \(P\) is a permutation matrix recording the row swaps, \(L\) is unit lower triangular (\(L_{ii} = 1\)), and \(U\) is upper triangular.
\(L\) and \(U\) are stored packed in a single \(n \times n\) working matrix: \(U\) occupies the diagonal and above; \(L\) occupies the strict lower triangle with the implicit unit diagonal.
Gaussian elimination proceeds column by column. At step \(k\), the multipliers
\[ l_{ik} = \frac{a_{ik}^{(k)}}{a_{kk}^{(k)}}, \quad i = k+1, \ldots, n-1 \]
are stored in the lower triangle, and the trailing submatrix receives a rank-1 (Schur complement) update:
\[ a_{ij}^{(k+1)} = a_{ij}^{(k)} - l_{ik}\, a_{kj}^{(k)}, \quad i,j > k \]
After \(n-1\) steps the working matrix holds \(L\) (below diagonal) and \(U\) (diagonal and above).
Without pivoting, a tiny pivot \(a_{kk}^{(k)}\) inflates the multipliers catastrophically. Example: for
\[ A = \begin{pmatrix} \varepsilon & 1 \\ 1 & 1 \end{pmatrix}, \quad \varepsilon = 10^{-16} \]
without pivoting \(l_{21} \approx 10^{16}\) and \(U_{22} \approx -10^{16}\). Swapping rows first gives \(l_{21} = \varepsilon\) and \(U_{22} \approx 1\).
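The \(2 \times 2\) system can be traced step by step in double precision; the right-hand side \(b = (1, 2)^T\) below is an assumed example (the exact solution is then \(x \approx (1, 1)^T\)):

```python
# Elimination on A = [[eps, 1], [1, 1]], b = [1, 2], with and without pivoting.
eps = 1e-16

# Without pivoting: the multiplier 1/eps swamps everything it touches.
l21 = 1.0 / eps                 # ~1e16
u22 = 1.0 - l21 * 1.0           # ~-1e16: the original entry 1 is lost
y2 = 2.0 - l21 * 1.0            # forward substitution
x2 = y2 / u22                   # ~1, but the last digits are garbage
x1 = (1.0 - 1.0 * x2) / eps     # catastrophically wrong (should be ~1)

# With rows swapped first: pivot 1, multiplier eps, everything benign.
l21s = eps
u22s = 1.0 - l21s * 1.0         # ~1
y2s = 1.0 - l21s * 2.0
x2s = y2s / u22s                # ~1
x1s = (2.0 - 1.0 * x2s) / 1.0   # ~1, correct
```

The swapped variant recovers both components to full accuracy; the unpivoted one destroys \(x_1\) entirely.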
Partial pivoting — swapping row \(k\) with the row achieving \(\max_{i \geq k}|a_{ik}^{(k)}|\) — guarantees \(|L_{ij}| \leq 1\) for all \(i > j\), which bounds the growth factor:
\[ \rho(n) = \frac{\max_{i,j,k}\bigl|a_{ij}^{(k)}\bigr|}{\max_{i,j}|a_{ij}|} \leq 2^{n-1} \]
The \(2^{n-1}\) bound is the Wilkinson worst case; random matrices grow as \(O(n^{1/2})\) in practice, and diagonally dominant or SPD matrices have a growth factor bounded by a small constant.
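The elimination with partial pivoting can be sketched in pure Python, using 0-based indexing as in the formulas above; `lu_factor` is an illustrative name, not a library routine:

```python
# Factor PA = LU in place with partial pivoting. piv[k] records the row
# swapped into position k at step k (piv[k] == k means no swap). On return,
# U occupies the diagonal and above, L the strict lower triangle (unit
# diagonal implicit), exactly the packed storage described above.
def lu_factor(A):
    n = len(A)
    piv = list(range(n))
    for k in range(n - 1):
        # partial pivoting: bring the largest |a_ik|, i >= k, to the diagonal
        p = max(range(k, n), key=lambda i: abs(A[i][k]))
        if p != k:
            A[k], A[p] = A[p], A[k]
            piv[k] = p
        for i in range(k + 1, n):
            A[i][k] /= A[k][k]          # multiplier l_ik, stored in place
            # rank-1 Schur-complement update of the trailing submatrix
            for j in range(k + 1, n):
                A[i][j] -= A[i][k] * A[k][j]
    return A, piv

LU, piv = lu_factor([[2.0, 1.0], [4.0, 3.0]])
```

For this example the rows are swapped first, then a single multiplier \(0.5\) is stored below the diagonal.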
Given \(PA = LU\) and a right-hand side \(b\), solve \(Ax = b\) in three steps. First apply the recorded row swaps, \(y \leftarrow Pb\). Then forward substitution with the unit lower triangle:
\[ y_i = y_i - \sum_{j=0}^{i-1} L_{ij}\, y_j, \quad i = 1, \ldots, n-1 \]
Finally backward substitution with \(U\):
\[ x_i = \frac{1}{U_{ii}}\!\left(y_i - \sum_{j=i+1}^{n-1} U_{ij}\, x_j\right), \quad i = n-1, \ldots, 0 \]
Each triangular solve costs \(n^2\) flops; the factorization itself costs \(\frac{2}{3}n^3\) flops.
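A sketch of the solve phase under the same storage conventions (pure Python, 0-based; `lu_solve` is an illustrative name). The hard-coded packed factors belong to the assumed example \(A = [[2, 1], [4, 3]]\):

```python
# Solve Ax = b from packed LU factors and the pivot record, in O(n^2).
def lu_solve(LU, piv, b):
    n = len(LU)
    x = list(b)
    # step 1: apply the row swaps recorded during factorization (y = Pb)
    for k in range(n - 1):
        if piv[k] != k:
            x[k], x[piv[k]] = x[piv[k]], x[k]
    # step 2: forward substitution with the implicit-unit lower triangle
    for i in range(1, n):
        x[i] -= sum(LU[i][j] * x[j] for j in range(i))
    # step 3: backward substitution with U
    for i in range(n - 1, -1, -1):
        x[i] = (x[i] - sum(LU[i][j] * x[j] for j in range(i + 1, n))) / LU[i][i]
    return x

# packed factors of A = [[2, 1], [4, 3]], rows 0 and 1 swapped
x = lu_solve([[4.0, 3.0], [0.5, -0.5]], [1, 1], [1.0, 3.0])
```

Both substitution loops touch each stored entry once, which is where the \(n^2\) per-solve cost comes from.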
\[ \det(A) = (-1)^{\text{swaps}} \prod_{i=0}^{n-1} U_{ii} \]
where \(\text{swaps}\) counts the non-trivial row interchanges (\(\text{piv}[k] \neq k\)).
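The formula reads off directly from the packed factors. A sketch under the same conventions (`lu_det` is an illustrative name); the hard-coded factors below belong to the assumed example \(A = [[2, 1], [4, 3]]\), whose determinant is \(2\):

```python
# Sign-adjusted product of U's diagonal from the packed LU factors.
# piv[k] records the row swapped into position k (piv[k] == k: no swap).
def lu_det(LU, piv):
    d = 1.0
    for i in range(len(LU)):
        d *= LU[i][i]              # U's diagonal sits on the packed diagonal
    swaps = sum(1 for k, p in enumerate(piv) if p != k)
    return (-1) ** swaps * d

# packed factors of [[2, 1], [4, 3]]: one row swap, U diagonal (4, -0.5)
det = lu_det([[4.0, 3.0], [0.5, -0.5]], [1, 1])
```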
The inverse is computed by solving \(AX = I\) column by column: \(n\) calls to the triangular solve, each \(O(n^2)\), total \(O(n^3)\). Never compute \(A^{-1}\) explicitly when only \(A^{-1}b\) is needed; solve \(Ax = b\) directly.
| Operation | FLOPs | Notes |
|---|---|---|
| LU factorization | \(\frac{2}{3}n^3\) | Dominant one-time cost |
| Forward substitution | \(n^2\) | Per right-hand side |
| Backward substitution | \(n^2\) | Per right-hand side |
LU with partial pivoting is backward stable: the computed solution \(\hat{x}\) satisfies
\[ (A + \delta A)\,\hat{x} = b, \qquad \frac{\|\delta A\|}{\|A\|} \leq c(n)\,\rho(n)\,\varepsilon_{\mathrm{mach}} \]
where \(\rho(n) \leq 2^{n-1}\) is the growth factor and \(c(n)\) is a low-degree polynomial in \(n\). The condition number \(\kappa(A) = \|A\|\,\|A^{-1}\|\) enters only the forward error: \(\|\hat{x} - x\| / \|x\| \lesssim \kappa(A)\, c(n)\, \rho(n)\, \varepsilon_{\mathrm{mach}}\).
Every \(A \in \mathbb{R}^{m \times n}\) with \(m \geq n\) admits
\[ A = QR \]
where \(Q \in \mathbb{R}^{m \times m}\) is orthogonal ( \(Q^TQ = I\)) and \(R \in \mathbb{R}^{m \times n}\) is upper triangular. The economy (thin) form uses \(\hat{Q} \in \mathbb{R}^{m \times n}\) and \(\hat{R} \in \mathbb{R}^{n \times n}\), discarding the \(m - n\) trailing columns of \(Q\) that contribute only to the residual.
A Householder reflector is the symmetric orthogonal matrix formed as a rank-1 update of the identity:
\[ H = I - 2\hat{v}\hat{v}^T, \qquad H^T = H, \quad H^2 = I, \quad \det(H) = -1 \]
For a vector \(x \in \mathbb{R}^m\), choose \(\hat{v}\) so that \(Hx\) is a scalar multiple of \(e_1\):
\[ Hx = -\operatorname{sign}(x_0)\|x\|_2\, e_1 \]
Construction: form the unnormalized vector
\[ v = x + \operatorname{sign}(x_0)\|x\|_2\, e_1, \qquad \hat{v} = v / \|v\| \]
Sign convention: choose the sign matching \(x_0\), so that \(v_0 = x_0 + \operatorname{sign}(x_0)\|x\|\) is a sum of same-sign terms. The alternative \(v_0 = x_0 - \|x\|\) suffers catastrophic cancellation when \(x_0 \approx \|x\|\). Concisely:
\[ v_0 \mathrel{+}= \operatorname{sign}(x_0)\|x\| \]
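The construction and application of a single reflector can be sketched in pure Python (`householder_vector` is an illustrative name). \(H\) is never formed as a matrix; it is applied as \(Hx = x - 2(\hat{v}^Tx)\hat{v}\):

```python
import math

# Build the unit reflector vector v-hat with the cancellation-free sign choice.
def householder_vector(x):
    norm = math.sqrt(sum(t * t for t in x))
    v = list(x)
    sign = 1.0 if x[0] >= 0 else -1.0
    v[0] += sign * norm            # v0 = x0 + sign(x0)||x||: same-sign sum
    vnorm = math.sqrt(sum(t * t for t in v))
    return [t / vnorm for t in v]  # normalized v-hat

x = [3.0, 4.0]                     # ||x|| = 5
vhat = householder_vector(x)
s = sum(vi * xi for vi, xi in zip(vhat, x))
Hx = [xi - 2.0 * s * vi for xi, vi in zip(x, vhat)]  # ~ (-5, 0)
```

The result is \(-\operatorname{sign}(x_0)\|x\|_2\, e_1 = (-5, 0)^T\), as the formula above predicts.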
Apply \(r = \min(m-1, n)\) reflections. Step \(k\) zeros all entries of column \(k\) strictly below the diagonal by applying \(H_k = I - 2\hat{v}_k\hat{v}_k^T\) to the trailing submatrix \(R[k:m,\, k:n]\):
\[ R[k:m,\, j] \;\leftarrow\; R[k:m,\, j] - 2\hat{v}_k\!\left(\hat{v}_k^T R[k:m,\, j]\right), \quad j = k, \ldots, n-1 \]
After \(r\) steps, \(R\) is upper triangular. If the full \(Q\) is needed, it is recovered by accumulating
\[ Q = H_0 H_1 \cdots H_{r-1} \]
starting from \(Q = I_m\) and applying the reflectors in reverse order, \(H_{r-1}\) first and \(H_0\) last (backward accumulation), which is cheaper because the later reflectors act only on trailing rows and columns.
If only \(Q^Tb\) is required for a least-squares solve, apply each \(H_k\) directly to \(b\) in \(O(mn)\) total — never form \(Q\) explicitly.
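The factorization loop and the implicit application of \(Q^T\) can be sketched together (pure Python; function names are illustrative, and the sketch omits handling of an already-zero subcolumn):

```python
import math

# Householder QR on a list-of-lists matrix: each step builds v-hat for the
# subcolumn and reflects the trailing block; the v-hat vectors are kept so
# that Q^T b can be applied reflector by reflector without forming Q.
def householder_qr(A):
    m, n = len(A), len(A[0])
    R = [row[:] for row in A]
    vs = []                                   # reflector vectors v-hat_k
    for k in range(min(m - 1, n)):
        x = [R[i][k] for i in range(k, m)]
        norm = math.sqrt(sum(t * t for t in x))
        x[0] += (1.0 if x[0] >= 0 else -1.0) * norm   # safe sign choice
        vn = math.sqrt(sum(t * t for t in x))
        v = [t / vn for t in x]
        vs.append(v)
        for j in range(k, n):                 # reflect the trailing block
            s = sum(v[i] * R[k + i][j] for i in range(m - k))
            for i in range(m - k):
                R[k + i][j] -= 2.0 * s * v[i]
    return vs, R

def apply_qt(vs, b):
    # Q^T = H_{r-1} ... H_0, so H_0 hits b first; O(mn) total
    y = list(b)
    for k, v in enumerate(vs):
        s = sum(v[i] * y[k + i] for i in range(len(v)))
        for i in range(len(v)):
            y[k + i] -= 2.0 * s * v[i]
    return y
```

`apply_qt` is all a least-squares solve needs; the explicit \(Q\) is never built.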
Classical Gram-Schmidt orthogonalizes columns sequentially and is not backward stable: round-off in early columns corrupts later ones, and for ill-conditioned matrices the computed \(Q\) can be far from orthogonal.
Householder is backward stable regardless of \(\kappa(A)\):
\[ (A + \delta A) = \hat{Q}\hat{R}, \qquad \frac{\|\delta A\|_F}{\|A\|_F} = O(\varepsilon_{\mathrm{mach}}) \]
The normal equations \(A^TAx = A^Tb\) square the condition number (\(\kappa(A^TA) = \kappa(A)^2\)), so the computed solution carries a relative error of \(O(\varepsilon_{\mathrm{mach}}\,\kappa(A)^2)\); for \(\kappa(A) > 10^8\) in double precision they lose all significant digits while QR still gives correct residuals.
For the overdetermined system \(\min_x \|Ax - b\|_2\) with \(m > n\), use the isometry of \(Q\):
\[ \|Ax - b\|_2^2 = \|QRx - b\|_2^2 = \|Rx - Q^Tb\|_2^2 = \|\hat{R}x - \tilde{c}\|_2^2 + \|\bar{c}\|_2^2 \]
where \(c = Q^Tb\), \(\tilde{c} = c[0:n]\) (the matched component), and \(\bar{c} = c[n:m]\) (the unavoidable residual). The minimum is attained by solving the \(n \times n\) upper triangular system
\[ \hat{R}\,x = \tilde{c} \]
via backward substitution. The residual norm \(\|\bar{c}\|_2\) is available as a by-product at no extra cost.
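With assumed toy numbers (a \(3 \times 2\) problem where \(\hat{R}\) and \(c = Q^Tb\) are taken as given rather than computed), the final triangular solve and the free residual read:

```python
# Hypothetical post-factorization data: R-hat is 2x2 upper triangular,
# c = Q^T b splits into the matched part c[:2] and the leftover c[2:].
Rhat = [[2.0, 1.0], [0.0, 1.0]]
c = [3.0, 1.0, 2.0]
n = len(Rhat)

x = [0.0] * n
for i in range(n - 1, -1, -1):          # backward substitution: R-hat x = c[:n]
    x[i] = (c[i] - sum(Rhat[i][j] * x[j] for j in range(i + 1, n))) / Rhat[i][i]

residual = sum(t * t for t in c[n:]) ** 0.5   # ||c-bar||_2, free by-product
```

Here \(x = (1, 1)^T\) and the residual norm is \(|c_2| = 2\), with no extra work beyond the \(n^2\)-flop substitution.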
| Operation | FLOPs | Notes |
|---|---|---|
| Factorization | \(2mn^2 - \frac{2}{3}n^3\) | Building \(R\) and storing reflectors |
| Building \(Q\) explicitly | \(4m^2n - 2mn^2 + \frac{2}{3}n^3\) | Avoid when possible |
| Applying \(Q^T\) to a vector | \(2mn\) | Sufficient for least-squares |
| Backward substitution | \(n^2\) | Per right-hand side |