10 LU Factorization

Most direct methods for solving linear systems rely on matrix factorizations that reduce a complex system into multiple simpler ones. The LU factorization is the most fundamental of these, decomposing a square matrix \(\bA\) into a unit lower triangular matrix \(\bL\) and an upper triangular matrix \(\bU\). This process is essentially the matrix form of Gaussian elimination.

Gaussian elimination transforms \(\bA\bx=\bb\) into an equivalent upper triangular system. LU factorization stores the elimination steps, so the same work can be reused for many right-hand sides.

10.1 LU and Gaussian Elimination

Triangular systems are easy because the unknowns can be recovered one at a time. If \(\bU\) is upper triangular, the last equation contains only \(x_n\), the next contains only \(x_{n-1}\) and \(x_n\), and so on. This is backward substitution. Lower triangular systems are solved similarly by forward substitution.
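The two substitution loops can be sketched in a few lines of NumPy. This is a minimal illustration, not a library routine: the function names `forward_substitution` and `backward_substitution` and the small test matrix are assumptions made for this example, and the code assumes nonzero diagonal entries.

```python
import numpy as np

def forward_substitution(L, b):
    """Solve L y = b for a lower triangular L with nonzero diagonal."""
    n = L.shape[0]
    y = np.zeros(n)
    for i in range(n):  # first equation first
        # y[0], ..., y[i-1] are already known at this point
        y[i] = (b[i] - L[i, :i] @ y[:i]) / L[i, i]
    return y

def backward_substitution(U, b):
    """Solve U x = b for an upper triangular U with nonzero diagonal."""
    n = U.shape[0]
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):  # last equation first
        # x[i+1], ..., x[n-1] are already known at this point
        x[i] = (b[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
    return x

# Hypothetical test data: build right-hand sides from a known solution.
U = np.array([[2.0, 1.0, 1.0],
              [0.0, 3.0, 1.0],
              [0.0, 0.0, 4.0]])
L = U.T
x_true = np.array([1.0, 2.0, 3.0])

x = backward_substitution(U, U @ x_true)
y = forward_substitution(L, L @ x_true)
print(x, y)  # both should reproduce x_true up to rounding
```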

Definition: LU Factorization

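An LU factorization of a square matrix \(\bA\) writes \(\bA = \bL\bU\), where \(\bL\) is unit lower triangular and \(\bU\) is upper triangular: \[ \begin{align} \bL = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ \ell_{21} & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ \ell_{n1} & \ell_{n2} & \cdots & 1 \end{pmatrix}, \quad \bU = \begin{pmatrix} u_{11} & u_{12} & \cdots & u_{1n} \\ 0 & u_{22} & \cdots & u_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & u_{nn} \end{pmatrix}. \end{align} \]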

The matrix \(\bU\) is the upper triangular (row-echelon) form produced by elimination. The matrix \(\bL\) stores the multipliers used to eliminate entries below each pivot: if row \(i\) is updated by subtracting \(\ell_{ik}\) times row \(k\), then \(\ell_{ik}\) is the entry of \(\bL\) in row \(i\), column \(k\).

Example

(One elimination) Let \[ \begin{align} \bA = \begin{pmatrix} 2 & 1 \\ 6 & 5 \end{pmatrix}. \end{align} \] To eliminate the entry \(6\) below the pivot \(2\), subtract \(3\) times row 1 from row 2: \[ \begin{align} \begin{pmatrix} 2 & 1 \\ 6 & 5 \end{pmatrix} \longrightarrow \begin{pmatrix} 2 & 1 \\ 0 & 2 \end{pmatrix}. \end{align} \] The multiplier is \(\ell_{21}=3\), so \[ \begin{align} \bL = \begin{pmatrix} 1 & 0 \\ 3 & 1 \end{pmatrix}, \qquad \bU = \begin{pmatrix} 2 & 1 \\ 0 & 2 \end{pmatrix}, \qquad \bA=\bL\bU. \end{align} \]
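The elimination step in this example generalizes to a short loop. The following is a minimal sketch of LU factorization without pivoting; the function name `lu_no_pivoting` is an assumption for illustration, and the code simply stops when it meets a zero pivot.

```python
import numpy as np

def lu_no_pivoting(A):
    """Return (L, U) with A = L @ U, L unit lower triangular.

    Sketch only: no row swaps, so a zero pivot raises an error.
    """
    U = np.array(A, dtype=float)  # working copy that becomes U
    n = U.shape[0]
    L = np.eye(n)
    for k in range(n - 1):          # pivot column
        if U[k, k] == 0.0:
            raise ZeroDivisionError(f"zero pivot at step {k}")
        for i in range(k + 1, n):   # rows below the pivot
            L[i, k] = U[i, k] / U[k, k]        # multiplier l_ik
            U[i, k:] -= L[i, k] * U[k, k:]     # row_i -= l_ik * row_k
    return L, U

L, U = lu_no_pivoting([[2, 1], [6, 5]])
print(L)  # multiplier 3 sits in position (2, 1)
print(U)
```

Applied to the matrix above, the sketch reproduces the factors found by hand.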

Exercise

Compare free parameters: how many in \(\bL\) vs. \(\bU\)? Do they sum to \(n^2\)?
If \(\bA = \bL\bU\) is invertible, prove \(\bL\) and \(\bU\) are also invertible.

Remark

(Existence and Uniqueness) \(\bA\) admits a unique LU factorization iff all leading principal submatrices \(\bA_k\) are nonsingular (\(k=1, \dots, n-1\)). The algorithm fails if a zero pivot is encountered.

Remark

(Elimination without pivoting is fragile) A zero pivot causes the algebraic algorithm to stop. A tiny pivot may be worse numerically: dividing by a tiny number creates huge multipliers, which amplify rounding errors in later rows.

Remark

(Flop Cost) Computing \(\bL\) and \(\bU\) via Gaussian elimination costs approximately \(\frac{2}{3}n^3\) flops, which is \(O(n^3)\).

Remark

(The Solve Pattern) To solve \(\bA\bx = \bb\) given \(\bA = \bL\bU\):

Solve \(\bL\by = \bb\) via forward substitution (\(O(n^2)\)).
Solve \(\bU\bx = \by\) via backward substitution (\(O(n^2)\)).

Advantage: Once \(\bA\) is factored, each new \(\bb\) costs only \(O(n^2)\).
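As a sketch of this two-step pattern, the snippet below uses `scipy.linalg.solve_triangular` on a pair of factors. The matrices `L` and `U` and the vector `x_true` are made-up illustration data, not taken from the text.

```python
import numpy as np
from scipy.linalg import solve_triangular

# Hypothetical factors: L unit lower triangular, U upper triangular.
L = np.array([[ 1.0, 0.0, 0.0],
              [ 2.0, 1.0, 0.0],
              [-1.0, 3.0, 1.0]])
U = np.array([[2.0, 1.0, 1.0],
              [0.0, 1.0, 2.0],
              [0.0, 0.0, 3.0]])
A = L @ U

x_true = np.array([1.0, -1.0, 2.0])
b = A @ x_true

# Step 1: forward substitution, L y = b.
y = solve_triangular(L, b, lower=True, unit_diagonal=True)
# Step 2: backward substitution, U x = y.
x = solve_triangular(U, y, lower=False)

print(x)  # should match x_true up to rounding
```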

Example

(Solving with the factors) For the previous example, solve \(\bA\bx=\bb\) with \(\bb=(3,11)^T\). First solve \(\bL\by=\bb\): \[ \begin{align} y_1=3, \qquad 3y_1+y_2=11 \Rightarrow y_2=2. \end{align} \] Then solve \(\bU\bx=\by\): \[ \begin{align} 2x_2=2 \Rightarrow x_2=1, \qquad 2x_1+x_2=3 \Rightarrow x_1=1. \end{align} \] The solution is \(\bx=(1,1)^T\).

Exercise

Factor \(\bA = \begin{pmatrix} 2 & 3 & 1 \\ 4 & 7 & 3 \\ -2 & -3 & 1 \end{pmatrix}\) by hand.
Solve \(\bA\bx = (5, 15, 4)^T\) using your factors.
SciPy: Use scipy.linalg.lu and explain why it always returns a permutation \(\bP\).

10.2 Pivoting and Stability

While mathematically elegant, LU factorization without pivoting is numerically unstable when a pivot (the diagonal entry used to eliminate the entries below it) is small relative to the entries it eliminates. The resulting large multipliers can cause catastrophic rounding errors. Partial pivoting mitigates this: at each step, rows are swapped so that the entry of largest magnitude in the current column, on or below the diagonal, becomes the pivot. Every multiplier then satisfies \(|\ell_{ik}| \le 1\).

Example

(Small pivot) Consider \[ \begin{align} \bA = \begin{pmatrix} 10^{-20} & 1 \\ 1 & 1 \end{pmatrix}. \end{align} \] Without pivoting, the first multiplier is \(10^{20}\). The elimination step creates huge intermediate entries even though the original matrix has modest entries. Partial pivoting swaps the two rows first, using the pivot \(1\) instead of \(10^{-20}\).
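The effect can be reproduced in double precision. The sketch below carries out the unpivoted elimination by hand for this matrix, with an assumed right-hand side \(\bb=(1,2)^T\) chosen for illustration (the exact solution is then very close to \((1,1)^T\)), and compares the result with `numpy.linalg.solve`, which relies on LAPACK's partially pivoted LU.

```python
import numpy as np

eps = 1e-20
A = np.array([[eps, 1.0],
              [1.0, 1.0]])
b = np.array([1.0, 2.0])  # assumed right-hand side for this demo

# Elimination WITHOUT pivoting, written out step by step.
m = A[1, 0] / A[0, 0]          # multiplier, about 1e20
u22 = A[1, 1] - m * A[0, 1]    # the "1" is lost when subtracting 1e20
y2 = b[1] - m * b[0]           # the "2" is lost in the same way
x2 = y2 / u22
x1 = (b[0] - A[0, 1] * x2) / A[0, 0]
x_no_pivot = np.array([x1, x2])

# Partial pivoting (via LAPACK) uses the pivot 1 instead of 1e-20.
x_pivot = np.linalg.solve(A, b)

print("no pivoting:  ", x_no_pivot)
print("with pivoting:", x_pivot)
```

The unpivoted computation loses the first component of the solution entirely, while the pivoted solve returns a result close to \((1,1)^T\).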

Definition: LUP Decomposition

The result of Gaussian elimination with partial pivoting is recorded as \(\bP\bA = \bL\bU\), where \(\bP\) is a permutation matrix that tracks the row swaps.
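A minimal sketch of elimination with partial pivoting follows. The function name `lu_partial_pivoting` and the test matrix are assumptions for illustration; the row swaps are recorded as an index vector `perm` so that `A[perm]` plays the role of \(\bP\bA\). The last lines compare conventions with `scipy.linalg.lu`, which returns factors satisfying \(\bA = \bP\bL\bU\), so its permutation matrix is the transpose of the one in the definition above.

```python
import numpy as np
from scipy.linalg import lu

def lu_partial_pivoting(A):
    """Return (perm, L, U) with A[perm] = L @ U (sketch, dense, O(n^3))."""
    U = np.array(A, dtype=float)
    n = U.shape[0]
    L = np.eye(n)
    perm = np.arange(n)
    for k in range(n - 1):
        # Largest-magnitude entry in column k, on or below the diagonal.
        p = k + np.argmax(np.abs(U[k:, k]))
        if p != k:
            U[[k, p], :] = U[[p, k], :]      # swap rows of U
            L[[k, p], :k] = L[[p, k], :k]    # swap stored multipliers
            perm[[k, p]] = perm[[p, k]]      # record the swap
        if U[k, k] == 0.0:
            continue  # column already zero below the diagonal
        L[k + 1:, k] = U[k + 1:, k] / U[k, k]
        U[k + 1:, k:] -= np.outer(L[k + 1:, k], U[k, k:])
    return perm, L, U

A = np.array([[0.0, 2.0, 1.0],
              [1.0, 1.0, 0.0],
              [2.0, 1.0, 1.0]])
perm, L, U = lu_partial_pivoting(A)
print(np.allclose(A[perm], L @ U))       # P A = L U, with P A = A[perm]

P_s, L_s, U_s = lu(A)                    # SciPy convention: A = P L U
print(np.allclose(P_s.T @ A, L_s @ U_s))
```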

Remark

(Permutation matrices) Multiplying by \(\bP\) reorders rows. Permutation matrices are orthogonal: \(\bP^{-1}=\bP^T\). Applying a permutation is cheap because no arithmetic is required; the code only reorders indices.
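In code, a permutation is usually stored as an index vector rather than a matrix. The sketch below, using a made-up permutation `perm`, checks that indexing with the vector matches multiplication by \(\bP\), and that the inverse permutation corresponds to \(\bP^T\).

```python
import numpy as np

perm = np.array([2, 0, 1])        # hypothetical row order
P = np.eye(3)[perm]               # row i of P is the unit vector e_perm[i]
A = np.arange(9.0).reshape(3, 3)

# Reordering rows by indexing needs no floating-point arithmetic.
same_rows = np.array_equal(P @ A, A[perm])

# Orthogonality: P^T P = I, so P^{-1} = P^T.
is_orthogonal = np.array_equal(P.T @ P, np.eye(3))

# The inverse permutation as an index vector.
inv_perm = np.argsort(perm)
same_inverse = np.array_equal(P.T, np.eye(3)[inv_perm])

print(same_rows, is_orthogonal, same_inverse)
```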

Remark

(Stability and Growth Factor) The numerical stability of LUP is characterized by the growth factor \(\rho = \max|u_{ij}| / \max|a_{ij}|\). If \(\rho\) is large, the computed solution may be inaccurate. Partial pivoting keeps \(\rho\) small in practice (typically \(O(n)\) or less), so the algorithm behaves as a backward stable method on the matrices that arise in applications. The theoretical worst-case growth is \(2^{n-1}\), but matrices that attain it are rarely encountered in engineering applications.
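The growth factor can be measured directly from the computed \(\bU\). The sketch below uses `scipy.linalg.lu` and compares a random matrix with a classical worst-case construction (ones on the diagonal and in the last column, \(-1\) below the diagonal). The helper names `growth_factor` and `worst_case_matrix`, the size `n = 10`, and the random seed are assumptions for illustration.

```python
import numpy as np
from scipy.linalg import lu

def growth_factor(A):
    """rho = max|u_ij| / max|a_ij| for LU with partial pivoting."""
    _, _, U = lu(A)
    return np.abs(U).max() / np.abs(A).max()

def worst_case_matrix(n):
    """1 on the diagonal and in the last column, -1 below the diagonal."""
    A = np.eye(n) - np.tril(np.ones((n, n)), k=-1)
    A[:, -1] = 1.0
    return A

n = 10
rho_worst = growth_factor(worst_case_matrix(n))

rng = np.random.default_rng(0)
rho_random = growth_factor(rng.standard_normal((n, n)))

print("worst case:", rho_worst, "vs 2^(n-1) =", 2 ** (n - 1))
print("random    :", rho_random)
```

For the worst-case matrix no row swaps are needed, and the last column doubles at every elimination step, which is where the factor \(2^{n-1}\) comes from.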

Exercise

Give a \(2 \times 2\) matrix where LU fails (due to a zero or small pivot) but LUP succeeds.
Why is \(\bP^{-1}\) trivial to compute for any permutation matrix? (Hint: consider \(\bP^T\)).
Complete pivoting (swapping both rows and columns) provides even stronger stability guarantees but is rarely used. Why? (Consider the trade-off between search cost and marginal stability gain).

10.3 Algorithm Cost Analysis

Understanding the computational cost is vital for choosing the right solver for large-scale engineering problems. The bulk of the work is in the factorization step.

There are two separate costs:

Factorization cost: Build \(\bL\) and \(\bU\) once. This costs \(O(n^3)\).
Solve cost: Use the factors for a specific \(\bb\). This costs \(O(n^2)\).

This distinction is the reason factorizations are valuable.

Remark

(Total Cost for \(k\) Right-Hand Sides) For a system with \(n\) unknowns and \(k\) different right-hand side vectors \(\bb\), the total cost is: \[ \begin{align} \text{Cost} = \frac{2}{3}n^3 + 2kn^2 \text{ flops}. \end{align} \] When \(k\) is large, the \(O(n^3)\) cost of factorization is amortized over many \(O(n^2)\) solves. For example, with \(n=1000\) and \(k=500\), one factorization followed by 500 solves costs about \(\frac{2}{3}\cdot 10^9 + 10^9 \approx 1.7\times 10^9\) flops, whereas re-factorizing for each new \(\bb\) costs about \(500\cdot\frac{2}{3}\cdot 10^9 \approx 3.3\times 10^{11}\) flops, roughly 200 times more.

Proof

Forward substitution solves \(\bL\by = \bb\) where \(\ell_{ii} = 1\). The \(i\)-th entry \(y_i\) is given by: \[ \begin{align} y_i = b_i - \sum_{j=1}^{i-1} \ell_{ij} y_j. \end{align} \] Counting the floating-point operations:

For a fixed \(i\), the summation requires \(i-1\) multiplications and \(i-1\) additions/subtractions.
Total flops: \(\sum_{i=1}^n 2(i-1) = 2 \sum_{k=0}^{n-1} k = 2 \frac{(n-1)n}{2} = n^2 - n\).

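For large \(n\), each triangular solve costs approximately \(n^2\) flops. Backward substitution has the same count plus \(n\) divisions by the diagonal entries of \(\bU\), so the two solves together cost about \(2n^2\) flops per right-hand side. This is the source of the \(2kn^2\) term.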
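The count can also be checked empirically by instrumenting the loop. In this sketch the function name `forward_substitution_counted` and the random test data are assumptions for illustration; each inner iteration performs one multiplication and one subtraction.

```python
import numpy as np

def forward_substitution_counted(L, b):
    """Solve L y = b (unit diagonal assumed) and count flops."""
    n = len(b)
    y = np.zeros(n)
    flops = 0
    for i in range(n):
        s = b[i]
        for j in range(i):
            s -= L[i, j] * y[j]   # one multiply, one subtract
            flops += 2
        y[i] = s                  # l_ii = 1, so no division
    return y, flops

rng = np.random.default_rng(0)
for n in (1, 5, 50):
    # Random unit lower triangular matrix of size n.
    L = np.tril(rng.standard_normal((n, n)), k=-1) + np.eye(n)
    b = rng.standard_normal(n)
    y, flops = forward_substitution_counted(L, b)
    print(n, flops, n * n - n)    # the last two columns should agree
```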

Remark

(Implementation rule) In NumPy/SciPy, use solve for one right-hand side or lu_factor/lu_solve when reusing a factorization. Do not form \(\bA^{-1}\) to solve a linear system.
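One situation where reuse matters is when each right-hand side depends on the previous solution, so the systems cannot be batched. The sketch below factors once with `scipy.linalg.lu_factor` and then calls `lu_solve` at every step. The tridiagonal test matrix, the number of steps, and the starting vector are made-up illustration data.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

n = 5
# Hypothetical well-conditioned matrix (diagonally dominant tridiagonal).
A = 3.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

lu_piv = lu_factor(A)              # O(n^3) work, done once

x = np.ones(n)
x_ref = np.ones(n)
max_residual = 0.0
for step in range(4):
    b = x.copy()                   # next right-hand side = previous solution
    x = lu_solve(lu_piv, b)        # O(n^2) work per step
    x_ref = np.linalg.solve(A, x_ref)   # reference: refactors A every step
    max_residual = max(max_residual, np.linalg.norm(A @ x - b))

print(np.allclose(x, x_ref), max_residual)
```

Both approaches give the same answer; the difference is that the reference line repeats the \(O(n^3)\) factorization at every step.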

Exercise

Sum the operations for forward substitution and show the cost is exactly \(n^2 - n\).
Use scipy.linalg.lu_factor and lu_solve for a system with 3 right-hand sides. Verify residuals.