The Conjugate Gradient (CG) method is a standard iterative solver for large, sparse, symmetric positive definite (SPD) linear systems. It belongs to the class of Krylov subspace methods, which seek an approximate solution within a space spanned by powers of the matrix applied to the initial residual.
Krylov Subspaces
The \(k\)-th Krylov subspace \(\mathcal{K}_k(\bA, \bb)\) is defined as the span of the first \(k\) vectors in the sequence \(\{\bb, \bA\bb, \bA^2\bb, ... \}\): \[
\begin{align}
\mathcal{K}_k(\bA, \bb) = \text{span}\{ \bb, \bA\bb, \bA^2\bb, ..., \bA^{k-1}\bb \}.
\end{align}
\] Krylov methods are efficient because they only require the ability to compute matrix-vector products \(\bA\bv\), making them ideal for sparse matrices where \(\bA\) is never explicitly stored.
CG as a Projection Method
Error Minimization Properties: For an SPD system \(\bA\bx = \bb\), the CG method identifies the unique vector \(\bx_k \in \mathcal{K}_k(\bA, \bb)\) that minimizes the \(\bA\)-norm of the error: \[
\begin{align}
\bx_k = \arg\min_{\by \in \mathcal{K}_k} \|\bx^* - \by\|_\bA,
\end{align}
\] where \(\bx^*\) is the exact solution and \(\|\be\|_\bA = \sqrt{\be^T\bA\be}\). This is equivalent to minimizing the quadratic energy functional \(\phi(\by) = \frac{1}{2} \by^T \bA \by - \by^T \bb\).
Conjugate Directions and A-Orthogonality
The Basis of Directions: To find this minimum efficiently, CG constructs a basis of search directions \(\{\bp_0, \bp_1, ..., \bp_{k-1}\}\) that are \(\bA\)-orthogonal (or conjugate), meaning \(\bp_i^T \bA \bp_j = 0\) for \(i \neq j\).
Global Optimality: Because directions are \(\bA\)-orthogonal, the \(k\)-th update \(\bx_k = \bx_{k-1} + \alpha_{k-1}\bp_{k-1}\) is guaranteed to be the best possible update in the entire subspace \(\mathcal{K}_k\). There is no need to revisit previous directions.
Short Recurrence: Due to the symmetry of \(\bA\), the new conjugate direction \(\bp_k\) can be computed using only the current residual and the single previous direction. This leads to the \(O(n)\) storage requirement.
The CG Algorithm
Initialize: \(\bx^{(0)} = \bzero, \br^{(0)} = \bb, \bp^{(0)} = \br^{(0)}\).\ Iteration:
(Step length)
(Update iterate)
(Update residual)
(Gram-Schmidt weight)
(Next direction)
Convergence Rate
\[
\begin{align}
\frac{\|\be^{(k)}\|_\bA}{\|\be^{(0)}\|_\bA} \leq 2 \left( \frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1} \right)^k.
\end{align}
\]
While stationary methods depend on \((\kappa-1)/(\kappa+1)\), CG depends on the square root of the condition number. For \(\kappa=100\), CG reduces error \(5\times\) faster per step.
Superlinear Convergence: If eigenvalues are clustered, CG converges much faster than the worst-case bound suggests (e.g., \(k \ll n\) for clustered spectra).
Preconditioning
The convergence of CG depends on the condition number \(\kappa(\bA)\). Preconditioning transforms the system into \(\bM^{-1}\bA\bx = \bM^{-1}\bb\), where \(\bM\) is a symmetric positive definite matrix that approximates \(\bA\) but is much easier to invert.
Spectral Transformation: The goal of the preconditioner is to cluster the eigenvalues of \(\bM^{-1}\bA\) near 1. A perfectly preconditioned system (\(\bM = \bA\)) has all eigenvalues equal to 1 and converges in a single step. In practice, a good preconditioner reduces the effective condition number, significantly accelerating convergence.
…
Jacobi: \(\bM = \text{diag}(\bA)\). (Cheap, moderate speedup).
Incomplete Cholesky (IC): \(\bM \approx \bL\bL^T\) with sparse factors. (Robust).
Multigrid: Near \(O(n)\) total complexity for PDE systems.
Convergence Check: Solve a \(100 \times 100\) Poisson system via CG. Compare your error plot to the theoretical bound.
Spectral Clustering: Construct \(\bA\) with eigenvalues \(\{1, ..., 1, 100, 200\}\). Observe superlinear convergence in the first few steps.
Preconditioning: Compare iteration counts for a sparse system with and without an IC preconditioner.