23  Conjugate Gradient Method

The Conjugate Gradient (CG) method is a standard iterative solver for large, sparse, symmetric positive definite (SPD) linear systems. It belongs to the class of Krylov subspace methods, which seek an approximate solution within a space spanned by powers of the matrix applied to the initial residual.

23.1 Krylov Subspaces

NoteDefinition: Krylov Subspace

The \(k\)-th Krylov subspace \(\mathcal{K}_k(\bA, \bb)\) is defined as the span of the first \(k\) vectors in the sequence \(\{\bb, \bA\bb, \bA^2\bb, ... \}\): \[ \begin{align} \mathcal{K}_k(\bA, \bb) = \text{span}\{ \bb, \bA\bb, \bA^2\bb, ..., \bA^{k-1}\bb \}. \end{align} \] Krylov methods are efficient because they only require the ability to compute matrix-vector products \(\bA\bv\), making them ideal for sparse matrices where \(\bA\) is never explicitly stored.

23.2 CG as a Projection Method

Tip

Error Minimization Properties: For an SPD system \(\bA\bx = \bb\), the CG method identifies the unique vector \(\bx_k \in \mathcal{K}_k(\bA, \bb)\) that minimizes the \(\bA\)-norm of the error: \[ \begin{align} \bx_k = \arg\min_{\by \in \mathcal{K}_k} \|\bx^* - \by\|_\bA, \end{align} \] where \(\bx^*\) is the exact solution and \(\|\be\|_\bA = \sqrt{\be^T\bA\be}\). This is equivalent to minimizing the quadratic energy functional \(\phi(\by) = \frac{1}{2} \by^T \bA \by - \by^T \bb\).

23.3 Conjugate Directions and A-Orthogonality

Tip

The Basis of Directions: To find this minimum efficiently, CG constructs a basis of search directions \(\{\bp_0, \bp_1, ..., \bp_{k-1}\}\) that are \(\bA\)-orthogonal (or conjugate), meaning \(\bp_i^T \bA \bp_j = 0\) for \(i \neq j\).

  • Global Optimality: Because directions are \(\bA\)-orthogonal, the \(k\)-th update \(\bx_k = \bx_{k-1} + \alpha_{k-1}\bp_{k-1}\) is guaranteed to be the best possible update in the entire subspace \(\mathcal{K}_k\). There is no need to revisit previous directions.

  • Short Recurrence: Due to the symmetry of \(\bA\), the new conjugate direction \(\bp_k\) can be computed using only the current residual and the single previous direction. This leads to the \(O(n)\) storage requirement.

23.4 The CG Algorithm

NoteDefinition: Conjugate Gradient Algorithm

Initialize: \(\bx^{(0)} = \bzero, \br^{(0)} = \bb, \bp^{(0)} = \br^{(0)}\).\ Iteration:

  1. (Step length)

  2. (Update iterate)

  3. (Update residual)

  4. (Gram-Schmidt weight)

  5. (Next direction)

Tip

Efficiency:

  • Matvecs: Only 1 matrix-vector product per iteration.

  • Memory: \(O(n)\) storage (requires only a few vectors). No need to store the basis.

23.5 Convergence Rate

NoteTheorem: Convergence Bound

\[ \begin{align} \frac{\|\be^{(k)}\|_\bA}{\|\be^{(0)}\|_\bA} \leq 2 \left( \frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1} \right)^k. \end{align} \]

Tip

While stationary methods depend on \((\kappa-1)/(\kappa+1)\), CG depends on the square root of the condition number. For \(\kappa=100\), CG reduces error \(5\times\) faster per step.

Tip

Superlinear Convergence: If eigenvalues are clustered, CG converges much faster than the worst-case bound suggests (e.g., \(k \ll n\) for clustered spectra).

23.6 Preconditioning

The convergence of CG depends on the condition number \(\kappa(\bA)\). Preconditioning transforms the system into \(\bM^{-1}\bA\bx = \bM^{-1}\bb\), where \(\bM\) is a symmetric positive definite matrix that approximates \(\bA\) but is much easier to invert.

Tip

Spectral Transformation: The goal of the preconditioner is to cluster the eigenvalues of \(\bM^{-1}\bA\) near 1. A perfectly preconditioned system (\(\bM = \bA\)) has all eigenvalues equal to 1 and converges in a single step. In practice, a good preconditioner reduces the effective condition number, significantly accelerating convergence.

NoteDefinition: Common Preconditioners

  1. Jacobi: \(\bM = \text{diag}(\bA)\). (Cheap, moderate speedup).

  2. Incomplete Cholesky (IC): \(\bM \approx \bL\bL^T\) with sparse factors. (Robust).

  3. Multigrid: Near \(O(n)\) total complexity for PDE systems.

WarningExercise
  1. Convergence Check: Solve a \(100 \times 100\) Poisson system via CG. Compare your error plot to the theoretical bound.

  2. Spectral Clustering: Construct \(\bA\) with eigenvalues \(\{1, ..., 1, 100, 200\}\). Observe superlinear convergence in the first few steps.

  3. Preconditioning: Compare iteration counts for a sparse system with and without an IC preconditioner.