22  Jacobi Iteration

22.1 Jacobi Iteration

NoteDefinition: Splitting of \(\bA\)

To derive a stationary iteration from \(\bA\bx = \bb\), we decompose \(\bA\) as \[\bA = \bD + \bR, \qquad \bD = \mathrm{diag}(a_{11},...,a_{nn}), \quad \bR = \bA - \bD.\] Assuming all diagonal entries are nonzero (\(a_{ii} \neq 0\) for all \(i\)), we can rearrange \((\bD + \bR)\bx = \bb\) to obtain the fixed-point equation \(\bx = -\bD^{-1}\bR\bx + \bD^{-1}\bb\).
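In code the splitting is two lines; a minimal sketch assuming NumPy, with a small illustrative matrix:

```python
import numpy as np

# Illustrative 3x3 system; any matrix with nonzero diagonal entries works.
A = np.array([[5.0, 1.0, 1.0],
              [1.0, 5.0, 1.0],
              [1.0, 1.0, 5.0]])

D = np.diag(np.diag(A))  # diagonal part D = diag(a_11, ..., a_nn)
R = A - D                # off-diagonal remainder
```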

NoteDefinition: Jacobi Iteration

The Jacobi iteration for \(\bA\bx = \bb\) is the stationary scheme \[\bx^{(k+1)} = \bT_J\bx^{(k)} + \bD^{-1}\bb, \qquad \bT_J = -\bD^{-1}\bR,\] or in component form, \[x_i^{(k+1)} = \frac{1}{a_{ii}}\Bigl(b_i - \sum_{j \neq i} a_{ij}x_j^{(k)}\Bigr), \quad i = 1,...,n.\] Each component \(x_i^{(k+1)}\) depends only on values from the previous iterate \(\bx^{(k)}\), so all \(n\) components may be updated simultaneously (or in any order).
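The component formula maps directly onto one line of NumPy; a sketch of a single Jacobi sweep (not the full solver the exercises ask for):

```python
import numpy as np

def jacobi_sweep(A, b, x):
    """One Jacobi update: x_i <- (b_i - sum_{j != i} a_ij * x_j) / a_ii."""
    d = np.diag(A)
    R = A - np.diag(d)       # off-diagonal part
    return (b - R @ x) / d   # every component uses only the old iterate x

# Illustrative run on a strictly diagonally dominant system (solution (1,1,1)).
A = np.array([[5.0, 1, 1], [1, 5, 1], [1, 1, 5]])
b = np.array([7.0, 7.0, 7.0])
x = np.zeros(3)
for _ in range(50):
    x = jacobi_sweep(A, b, x)
```

Because each component depends only on the previous iterate, the sweep is a single matrix–vector product and vectorizes (or parallelizes) trivially.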

22.2 Convergence of Jacobi Iteration

NoteTheorem: Convergence of Jacobi Iteration

The Jacobi iteration converges to the solution of \(\bA\bx = \bb\) for every initial vector \(\bx^{(0)}\) if and only if \(\rho(\bT_J) < 1\).

This is a direct application of the general convergence theorem for stationary iterations: convergence for all \(\bx^{(0)}\) is equivalent to \(\rho(\bT) < 1\) for the iteration matrix. Here \(\bT_J = -\bD^{-1}\bR = \bI - \bD^{-1}\bA\), so the fixed-point equation \(\bx^* = \bT_J\bx^* + \bD^{-1}\bb\) simplifies to \(\bD^{-1}\bA\bx^* = \bD^{-1}\bb\), i.e., \(\bA\bx^* = \bb\).

NoteTheorem: Jacobi Convergence for Strictly Diagonally Dominant Matrices

If \(\bA\) is strictly diagonally dominant, meaning \[|a_{ii}| > \sum_{j \neq i} |a_{ij}|, \quad \text{for all } i = 1, ..., n,\] then \(\rho(\bT_J) < 1\) and the Jacobi iteration converges for every \(\bx^{(0)}\).

Let \(\lambda\) be any eigenvalue of \(\bT_J = -\bD^{-1}\bR\) with eigenvector \(\bv \neq \bzero\). Let index \(i^*\) satisfy \(|v_{i^*}| = \|\bv\|_\infty > 0\). The eigenvalue equation \(\bT_J\bv = \lambda\bv\) gives, for row \(i^*\): \[\lambda v_{i^*} = -\frac{1}{a_{i^*i^*}}\sum_{j \neq i^*} a_{i^*j}\,v_j.\] Taking absolute values and dividing by \(|v_{i^*}|\): \[|\lambda| \leq \frac{1}{|a_{i^*i^*}|} \sum_{j \neq i^*} |a_{i^*j}| \cdot \frac{|v_j|}{|v_{i^*}|} \leq \frac{\sum_{j \neq i^*} |a_{i^*j}|}{|a_{i^*i^*}|} < 1,\] where the last inequality is strict diagonal dominance. Since \(\lambda\) was arbitrary, \(\rho(\bT_J) < 1\).

NoteExample

Effect of matrix structure on convergence.

Case 1 — Strictly diagonally dominant: \(\bA_1 = \begin{pmatrix} 5 & 1 & 1 \\ 1 & 5 & 1 \\ 1 & 1 & 5 \end{pmatrix}\). Each row satisfies \(|a_{ii}| = 5 > 2 = \sum_{j\neq i}|a_{ij}|\), so the theorem guarantees convergence. One can compute \(\rho(\bT_J) = 2/5 = 0.4\).

Case 2 — Diagonal dominance violated in one row: \(\bA_2 = \begin{pmatrix} 2 & 1 & 3 \\ 1 & 3 & 1 \\ 2 & 2 & 2 \end{pmatrix}\). Row 3 violates strict diagonal dominance. The theorem gives no guarantee; \(\rho(\bT_J)\) must be checked numerically.

Case 3 — Divergent case: \(\bA_3 = \begin{pmatrix} 1 & 3 & 1 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{pmatrix}\). Here \(\rho(\bT_J) > 1\) and the Jacobi iteration diverges for every \(\bx^{(0)} \neq \bx^*\).
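The three spectral radii can be checked numerically; a sketch assuming NumPy, with \(\bA_1, \bA_2, \bA_3\) as above:

```python
import numpy as np

def jacobi_T(A):
    """Jacobi iteration matrix T_J = -D^{-1} R = I - D^{-1} A."""
    return np.eye(len(A)) - A / np.diag(A)[:, None]

def spectral_radius(M):
    return max(abs(np.linalg.eigvals(M)))

A1 = np.array([[5.0, 1, 1], [1, 5, 1], [1, 1, 5]])
A2 = np.array([[2.0, 1, 3], [1, 3, 1], [2, 2, 2]])
A3 = np.array([[1.0, 3, 1], [1, 2, 1], [1, 1, 2]])

rho1, rho2, rho3 = [spectral_radius(jacobi_T(A)) for A in (A1, A2, A3)]
```

For \(\bA_1\) this returns 0.4 (to rounding); for \(\bA_3\) it confirms \(\rho(\bT_J) > 1\).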

22.3 Stopping Criteria

NoteDefinition: Residual-Based Stopping Criterion

Stop when the norm of the residual falls below a tolerance: \[\|\bA\bx^{(k)} - \bb\| < \epsilon_{\mathrm{res}}.\] This directly measures how well \(\bx^{(k)}\) satisfies the linear system.

NoteDefinition: Increment-Based Stopping Criterion

Stop when the change between successive iterates is small: \[\|\bx^{(k+1)} - \bx^{(k)}\| < \epsilon_{\mathrm{inc}}.\] This is cheap (no matrix–vector product needed), but measures the rate of change of the iteration rather than the accuracy of the solution directly.

NoteDefinition: Relative Residual Criterion

Stop when the residual is small relative to the right-hand side: \[\frac{\|\bA\bx^{(k)} - \bb\|}{\|\bb\|} < \epsilon_{\mathrm{rel}}.\] This is scale-invariant: multiplying \(\bA\) and \(\bb\) by a constant does not alter the stopping condition.

Tip

A maximum iteration count is always imposed as a safeguard against slow convergence or divergence. Typical practice is to set \(k_{\max}\) based on the problem size and report non-convergence if the tolerance is not met within the budget.
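Putting a stopping test and the iteration cap together gives a standard driver loop; a sketch assuming NumPy, using the relative-residual criterion (the function name and return convention are illustrative, not the interface the exercises prescribe):

```python
import numpy as np

def jacobi_solve(A, b, x0, tol=1e-8, max_iter=500):
    """Jacobi iteration with a relative-residual stop and an iteration budget."""
    d = np.diag(A)
    R = A - np.diag(d)
    x = np.asarray(x0, dtype=float)
    for k in range(1, max_iter + 1):
        x = (b - R @ x) / d
        if np.linalg.norm(A @ x - b) < tol * np.linalg.norm(b):
            return x, k, True          # converged within budget
    return x, max_iter, False          # budget exhausted: report non-convergence

A = np.array([[5.0, 1, 1], [1, 5, 1], [1, 1, 5]])
b = np.array([7.0, 7.0, 7.0])
x, iters, converged = jacobi_solve(A, b, np.zeros(3))
```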

NoteTheorem: A Priori Error Bound for Jacobi Iteration

Let \(\rho = \rho(\bT_J) < 1\). For all \(k \geq 1\), \[\|\bx^{(k)} - \bx^*\| \leq \frac{\rho^k}{1-\rho}\,\|\bx^{(1)} - \bx^{(0)}\|.\]

The basic contraction estimate gives \(\|\bx^{(k)} - \bx^*\| \leq \rho^k\|\bx^{(0)} - \bx^*\|\). We bound \(\|\bx^{(0)} - \bx^*\|\) via the telescoping sum: \[\|\bx^{(0)} - \bx^*\| \leq \sum_{j=0}^{\infty}\|\bx^{(j+1)} - \bx^{(j)}\|.\] Since \(\|\bx^{(j+1)} - \bx^{(j)}\| = \|\bT_J^j(\bx^{(1)}-\bx^{(0)})\| \leq \rho^j\|\bx^{(1)}-\bx^{(0)}\|\), the geometric series gives \(\|\bx^{(0)} - \bx^*\| \leq \|\bx^{(1)}-\bx^{(0)}\|/(1-\rho)\). Combining yields the stated bound.
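The bound can be checked numerically on the strictly diagonally dominant example, where \(\|\bT_J\|_\infty = \rho(\bT_J) = 0.4\), so the estimate is rigorous in the infinity norm (a sketch assuming NumPy):

```python
import numpy as np

# A priori bound check: ||x^(k) - x*||_inf <= rho^k/(1-rho) * ||x^(1) - x^(0)||_inf.
A = np.array([[5.0, 1, 1], [1, 5, 1], [1, 1, 5]])
b = np.array([7.0, 7.0, 7.0])
x_star = np.linalg.solve(A, b)

d = np.diag(A)
R = A - np.diag(d)
rho = 0.4                                  # equals ||T_J||_inf for this matrix

x0 = np.zeros(3)
x1 = (b - R @ x0) / d
c = np.linalg.norm(x1 - x0, np.inf)

bound_holds = True
xk = x1
for k in range(2, 21):
    xk = (b - R @ xk) / d
    bound = rho**k / (1 - rho) * c
    bound_holds = bound_holds and np.linalg.norm(xk - x_star, np.inf) <= bound
```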

WarningExercise

Implement the Jacobi method in Python.

  1. Write jacobi(A, b, x0, tol, max_iter) that returns the iterate at convergence (relative residual below tol) and the number of iterations used.

  2. Test on \(\bA_1\) from the example with \(\bb = (7,7,7)^T\). Compute the exact solution by hand. How many iterations are needed for \(\epsilon_{\mathrm{rel}} = 10^{-8}\)?

  3. Plot \(\|\bA\bx^{(k)} - \bb\|\) versus \(k\) on a semi-log scale. What is the observed slope? Compare to \(\log_{10}\rho(\bT_J)\).

WarningExercise

(A priori vs. observed convergence) For the system with \(\bA_1\) and \(\bb = (7,7,7)^T\):

  1. Evaluate the a priori bound for \(k = 20\) iterations. How many times larger is the bound than the actual error \(\|\bx^{(20)} - \bx^*\|\)?

  2. Consider the \(5\times5\) tridiagonal matrix with \(4\) on the diagonal and \(-1\) on the off-diagonals. Verify that it is strictly diagonally dominant, compute \(\rho(\bT_J)\) numerically, and determine how many iterations are needed to achieve \(\|\bx^{(k)}-\bx^*\| < 10^{-10}\).

22.4 Gauss-Seidel Iteration

NoteDefinition: Splitting of \(\bA\) for Gauss-Seidel

Decompose \(\bA = \bL_* + \bU_*\), where \(\bL_*\) is the lower triangular part (including the diagonal) and \(\bU_*\) is the strictly upper triangular part: \[\bA = \bL_* + \bU_*, \qquad (\bL_*)_{ij} = \begin{cases} a_{ij} & j \leq i \\ 0 & j > i \end{cases}, \quad (\bU_*)_{ij} = \begin{cases} a_{ij} & j > i \\ 0 & j \leq i \end{cases}.\] Rearranging \((\bL_* + \bU_*)\bx = \bb\) as \(\bL_*\bx = \bb - \bU_*\bx\) yields the Gauss-Seidel fixed-point equation.

NoteDefinition: Gauss-Seidel Iteration

The Gauss-Seidel iteration for \(\bA\bx = \bb\) is \[\bx^{(k+1)} = \bT_{GS}\bx^{(k)} + \bL_*^{-1}\bb, \qquad \bT_{GS} = -\bL_*^{-1}\bU_*.\] In component form, updating in order \(i = 1, ..., n\): \[x_i^{(k+1)} = \frac{1}{a_{ii}}\Bigl(b_i - \sum_{j < i} a_{ij}\,x_j^{(k+1)} - \sum_{j > i} a_{ij}\,x_j^{(k)}\Bigr).\] The key distinction from Jacobi: components \(x_1^{(k+1)}, ..., x_{i-1}^{(k+1)}\) already computed in the current sweep are used immediately.
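In code, the only change from a Jacobi sweep is that each component is overwritten as soon as it is computed; a sketch of one Gauss-Seidel sweep, assuming NumPy:

```python
import numpy as np

def gauss_seidel_sweep(A, b, x):
    """One Gauss-Seidel sweep: x[i] is overwritten as soon as it is computed."""
    x = x.copy()
    for i in range(len(b)):
        # j < i uses values already updated this sweep; j > i uses old values.
        s = A[i, :i] @ x[:i] + A[i, i+1:] @ x[i+1:]
        x[i] = (b[i] - s) / A[i, i]
    return x

# Illustrative sweep from x = 0 on a small tridiagonal system.
A = np.array([[4.0, -1, 0], [-1, 4, -1], [0, -1, 4]])
b = np.array([3.0, 2.0, 3.0])
x = gauss_seidel_sweep(A, b, np.zeros(3))
```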

Tip

Because updated values are used as soon as they are available, Gauss-Seidel typically converges roughly twice as fast as Jacobi in practice. However, the sequential dependence among components means updates cannot be parallelized directly, unlike Jacobi.

NoteTheorem: Gauss-Seidel Convergence for SPD Matrices

If \(\bA\) is symmetric positive definite, then the Gauss-Seidel iteration converges for every initial vector \(\bx^{(0)}\).

We show \(\rho(\bT_{GS}) < 1\). Write \(\bA = \bL_* + \bU_*\). Since \(\bA\) is symmetric, \(\bU_* = \bL_*^T - \bD\). Suppose \(\bT_{GS}\bv = \lambda\bv\) with \(\bv \neq \bzero\); since \(\lambda\) and \(\bv\) may be complex, we use the Hermitian inner product. The eigenvalue equation is \(-\bU_*\bv = \lambda\bL_*\bv\), which rearranges to \(\bA\bv = (1-\lambda)\bL_*\bv\). Here \(\lambda \neq 1\), since \(\lambda = 1\) would force \(\bA\bv = \bzero\), contradicting \(\bA \succ 0\). Set \(\sigma = \bv^*\bL_*\bv\), \(d = \bv^*\bD\bv > 0\), and \(a = \bv^*\bA\bv > 0\). From \(\bU_* = \bL_*^T - \bD\) we get \(\bv^*\bU_*\bv = \bar\sigma - d\), so \(a = \sigma + \bar\sigma - d\). Taking inner products with \(\bv\) in \(\bA\bv = (1-\lambda)\bL_*\bv\) gives \(\sigma = a/(1-\lambda)\), and in \(-\bU_*\bv = \lambda\bL_*\bv\) gives \(d - \bar\sigma = \lambda\sigma\). Substituting \(\sigma = a/(1-\lambda)\) and \(\bar\sigma = a/(1-\bar\lambda)\): \[d = \bar\sigma + \lambda\sigma = \frac{a}{1-\bar\lambda} + \frac{\lambda a}{1-\lambda} = a\,\frac{(1-\lambda) + \lambda(1-\bar\lambda)}{|1-\lambda|^2} = a\,\frac{1-|\lambda|^2}{|1-\lambda|^2}.\] Since \(a > 0\) and \(d > 0\), we conclude \(1 - |\lambda|^2 > 0\), i.e., \(|\lambda| < 1\). As \(\lambda\) was arbitrary, \(\rho(\bT_{GS}) < 1\).

NoteTheorem: Gauss-Seidel Convergence for Strictly Diagonally Dominant Matrices

If \(\bA\) is strictly diagonally dominant, the Gauss-Seidel iteration converges for every \(\bx^{(0)}\).

The proof parallels the Jacobi case. Let \(\bT_{GS}\bv = \lambda\bv\), \(\bv \neq \bzero\). Pick \(i^*\) with \(|v_{i^*}| = \|\bv\|_\infty\). The \(i^*\)-th component of \(-\bU_*\bv = \lambda\bL_*\bv\) gives \[\lambda a_{i^*i^*} v_{i^*} = -\sum_{j < i^*} a_{i^*j}(\lambda v_j) - \sum_{j > i^*} a_{i^*j} v_j.\] Taking absolute values and using \(|v_j| \leq |v_{i^*}|\): \[|\lambda|\,|a_{i^*i^*}| \leq |\lambda| \sum_{j < i^*}|a_{i^*j}| + \sum_{j > i^*}|a_{i^*j}|.\] If \(|\lambda| \geq 1\), then \(|\lambda|\,|a_{i^*i^*}| \leq |\lambda|\sum_{j \neq i^*}|a_{i^*j}|\), contradicting strict diagonal dominance. Hence \(|\lambda| < 1\).

NoteExample

Jacobi vs. Gauss-Seidel on a tridiagonal system.

Consider the \(3 \times 3\) tridiagonal system with \(\bA = \begin{pmatrix}4&-1&0\\-1&4&-1\\0&-1&4\end{pmatrix}\), \(\bb = (3, 2, 3)^T\) (exact solution \(\bx^* = (1,1,1)^T\)).

Starting from \(\bx^{(0)} = \bzero\), the first Jacobi sweep gives \[x_1^{(1)} = \tfrac{3}{4} = 0.75, \quad x_2^{(1)} = \tfrac{2}{4} = 0.5, \quad x_3^{(1)} = \tfrac{3}{4} = 0.75,\] while the first Gauss-Seidel sweep, reusing each new value immediately, gives \[x_1^{(1)} = \tfrac{3}{4} = 0.75, \quad x_2^{(1)} = \tfrac{2 + 0.75}{4} = \tfrac{2.75}{4} = 0.6875, \quad x_3^{(1)} = \tfrac{3 + 0.6875}{4} = \tfrac{3.6875}{4} \approx 0.922.\] Gauss-Seidel is already significantly closer to \(\bx^* = (1,1,1)^T\) after a single sweep.

22.5 Successive Over-Relaxation (SOR)

Both Jacobi and Gauss-Seidel can converge slowly when \(\rho(\bT)\) is close to 1. The idea behind SOR is simple: take the Gauss-Seidel update as a direction and overshoot by a factor \(\omega > 1\) to accelerate convergence. Choosing the optimal \(\omega\) can reduce the spectral radius dramatically, turning \(O(n^2)\) iterations into \(O(n)\) for certain model problems.

NoteDefinition: SOR Iteration

The Successive Over-Relaxation (SOR) method introduces a relaxation parameter \(\omega > 0\). Let \(\tilde{x}_i\) denote the Gauss-Seidel update for component \(i\). SOR blends the new and old values: \[x_i^{(k+1)} = (1-\omega)\,x_i^{(k)} + \omega\,\tilde{x}_i^{(k+1)},\] where \[\tilde{x}_i^{(k+1)} = \frac{1}{a_{ii}}\Bigl(b_i - \sum_{j<i}a_{ij}\,x_j^{(k+1)} - \sum_{j>i}a_{ij}\,x_j^{(k)}\Bigr).\] Substituting, the full SOR update is \[x_i^{(k+1)} = x_i^{(k)} + \frac{\omega}{a_{ii}}\Bigl(b_i - \sum_{j<i}a_{ij}\,x_j^{(k+1)} - a_{ii}\,x_i^{(k)} - \sum_{j>i}a_{ij}\,x_j^{(k)}\Bigr).\] Special cases: \(\omega = 1\) recovers Gauss-Seidel; \(\omega \in (1,2)\) is over-relaxation; \(\omega \in (0,1)\) is under-relaxation (useful for stabilizing slowly divergent iterations).
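A sketch of one SOR sweep, assuming NumPy; with omega = 1.0 it reproduces a Gauss-Seidel sweep exactly:

```python
import numpy as np

def sor_sweep(A, b, x, omega):
    """One SOR sweep: blend the old value with the Gauss-Seidel candidate."""
    x = x.copy()
    for i in range(len(b)):
        s = A[i, :i] @ x[:i] + A[i, i+1:] @ x[i+1:]
        gs = (b[i] - s) / A[i, i]               # Gauss-Seidel update for x[i]
        x[i] = (1 - omega) * x[i] + omega * gs  # relax toward (or past) it
    return x

A = np.array([[4.0, -1, 0], [-1, 4, -1], [0, -1, 4]])
b = np.array([3.0, 2.0, 3.0])
x_gs = sor_sweep(A, b, np.zeros(3), omega=1.0)  # identical to Gauss-Seidel
```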

NoteTheorem: Kahan’s Necessary Condition for SOR

The SOR iteration can converge only if \(0 < \omega < 2\).

The iteration matrix for SOR is \(\bT_\omega = (\bD + \omega\bL)^{-1}((1-\omega)\bD - \omega\bU)\) where \(\bL, \bU\) are the strictly lower/upper triangular parts of \(\bA\). Its determinant equals \((1-\omega)^n\) (product of the eigenvalues). Since \(\rho(\bT_\omega) \geq |\det(\bT_\omega)|^{1/n} = |1-\omega|\), convergence requires \(\rho(\bT_\omega) < 1\), which forces \(|1-\omega| < 1\), i.e., \(0 < \omega < 2\).
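The determinant identity behind Kahan's argument is easy to verify numerically (a sketch assuming NumPy; the matrix is illustrative):

```python
import numpy as np

# Verify det(T_omega) = (1 - omega)^n for several omega on a 3x3 matrix.
A = np.array([[4.0, -1, 0], [-1, 4, -1], [0, -1, 4]])
D = np.diag(np.diag(A))
L = np.tril(A, -1)     # strictly lower triangular part
U = np.triu(A, 1)      # strictly upper triangular part

def T_sor(omega):
    return np.linalg.solve(D + omega * L, (1 - omega) * D - omega * U)

omegas = (0.5, 1.0, 1.5, 1.9)
dets = [np.linalg.det(T_sor(w)) for w in omegas]
```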

NoteTheorem: Optimal SOR for Consistently Ordered Matrices

For a consistently ordered matrix (e.g., tridiagonal systems arising from discretizing PDEs) with Jacobi spectral radius \(\rho_J = \rho(\bT_J)\), the optimal relaxation parameter and corresponding spectral radius are \[\omega^* = \frac{2}{1 + \sqrt{1 - \rho_J^2}}, \qquad \rho(\bT_{\omega^*}) = \omega^* - 1 = \frac{1 - \sqrt{1-\rho_J^2}}{1 + \sqrt{1-\rho_J^2}}.\] When \(\rho_J \approx 1 - \varepsilon\) (slow Jacobi convergence), \(1 - \rho_J^2 \approx 2\varepsilon\), so \(\omega^* \approx 2 - 2\sqrt{2\varepsilon}\) and \(\rho(\bT_{\omega^*}) \approx 1 - 2\sqrt{2\varepsilon} \ll \rho_J\).

For consistently ordered matrices, the eigenvalues \(\mu\) of \(\bT_\omega\) relate to the eigenvalues \(\lambda\) of \(\bT_J\) via \((\mu + \omega - 1)^2 = \lambda^2\omega^2\mu\). Minimizing \(\max|\mu|\) over \(\omega\) gives the stated formulas. Since the iteration count scales like \(1/\log(1/\rho) \approx 1/(1-\rho)\), comparing \(\rho(\bT_{\omega^*})\) with \(\rho_J \approx 1 - \varepsilon\) shows that optimal SOR reduces the number of iterations by a factor on the order of \(1/\sqrt{\varepsilon}\) — a square-root speedup.
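Both formulas can be confirmed numerically on a consistently ordered example (a sketch assuming NumPy; the \(9 \times 9\) tridiagonal 1D Poisson matrix):

```python
import numpy as np

# Optimal SOR on tridiag(-1, 2, -1): rho_J = cos(pi/10) and rho(T_omega*) = omega* - 1.
n = 9
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
D = np.diag(np.diag(A))
L = np.tril(A, -1)
U = np.triu(A, 1)

rho_J = max(abs(np.linalg.eigvals(np.eye(n) - np.linalg.solve(D, A))))
omega_star = 2 / (1 + np.sqrt(1 - rho_J**2))
T_opt = np.linalg.solve(D + omega_star * L, (1 - omega_star) * D - omega_star * U)
rho_opt = max(abs(np.linalg.eigvals(T_opt)))
```

At \(\omega^*\) the iteration matrix is defective, so the computed eigenvalue moduli match \(\omega^* - 1\) only to roughly square-root machine precision.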

Tip

For the \(n\)-point 1D Poisson problem (tridiagonal, \(h = 1/(n+1)\)), Jacobi has \(\rho_J = \cos(\pi h) \approx 1 - \frac{\pi^2 h^2}{2}\), requiring \(O(n^2)\) iterations. With optimal SOR, \(\rho(\bT_{\omega^*}) \approx 1 - 2\pi h\), requiring only \(O(n)\) iterations — a dramatic improvement. The 2D Poisson problem shows similar gains.

NoteExample

Effect of \(\omega\) on convergence rate.

For the \(n = 9\) tridiagonal system (1D Poisson), \(h = 0.1\), so \(\rho_J = \cos(\pi/10) \approx 0.951\). The optimal parameter is \[\omega^* = \frac{2}{1 + \sqrt{1 - 0.951^2}} \approx \frac{2}{1 + 0.309} \approx 1.528,\] giving \(\rho(\bT_{\omega^*}) \approx 0.528\). With Gauss-Seidel (\(\omega=1\)), \(\rho \approx 0.905\). The number of iterations needed to reduce the error by a factor of \(10^{-8}\) is approximately \(8/\log_{10}(1/\rho)\): about 183 for Gauss-Seidel but only about 29 for SOR — roughly a sixfold speedup.
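The predicted gap can be confirmed by running the sweeps directly; a sketch (assuming NumPy) that counts sweeps until the error drops by eight orders of magnitude:

```python
import numpy as np

n = 9
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # 1D Poisson matrix
b = np.ones(n)
x_star = np.linalg.solve(A, b)

def sweeps_to_tol(omega, tol=1e-8, cap=2000):
    """Count SOR sweeps until the error shrinks by the factor tol."""
    x = np.zeros(n)
    e0 = np.linalg.norm(x - x_star)
    for k in range(1, cap + 1):
        for i in range(n):
            s = A[i, :i] @ x[:i] + A[i, i+1:] @ x[i+1:]
            x[i] = (1 - omega) * x[i] + omega * (b[i] - s) / A[i, i]
        if np.linalg.norm(x - x_star) < tol * e0:
            return k
    return cap

iters_gs = sweeps_to_tol(1.0)                                         # Gauss-Seidel
iters_sor = sweeps_to_tol(2 / (1 + np.sqrt(1 - np.cos(np.pi/10)**2))) # omega*
```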

WarningExercise

(Gauss-Seidel implementation and comparison)

  1. Implement gauss_seidel(A, b, x0, tol, max_iter) in Python. The key difference from Jacobi is that x[i] is updated in place at each step.

  2. Run both Jacobi and Gauss-Seidel on the \(10\times10\) tridiagonal system with \(4\) on the diagonal and \(-1\) on the off-diagonals, \(\bb = \mathbf{1}\). Plot the residual norm versus iteration count on a semi-log scale for both methods. Estimate the ratio of iterations needed to reach \(10^{-8}\) residual.

  3. Compute \(\rho(\bT_J)\) and \(\rho(\bT_{GS})\) numerically. Is the observed convergence ratio consistent with \(\rho(\bT_{GS}) \approx \rho(\bT_J)^2\)? (This relation holds exactly for consistently ordered matrices.)

WarningExercise

(SOR parameter tuning)

  1. Implement sor(A, b, x0, omega, tol, max_iter).

  2. For the same \(10\times10\) tridiagonal system, compute the theoretical optimal \(\omega^*\) using the formula in the Optimal SOR theorem. Run SOR with \(\omega \in \{1.0, 1.2, 1.4, \omega^*, 1.8, 1.9\}\).

  3. Plot iterations to convergence (residual \(< 10^{-8}\)) versus \(\omega\). Verify that \(\omega^*\) is indeed optimal and that convergence fails for \(\omega \geq 2\).

  4. At the optimal \(\omega^*\), compare the convergence rate to Jacobi and Gauss-Seidel from the previous exercise.