19  Lanczos Algorithm

Given a symmetric matrix \(\bA \in \fR^{n \times n}\), find the \(k\) dominant or extreme eigenpairs efficiently when \(k \ll n\). For large, sparse matrices, explicitly computing all \(n\) eigenvalues via the QR algorithm costs \(O(n^3)\) and is impractical. The Lanczos algorithm builds a \(k\)-dimensional Krylov subspace that captures the extreme eigenvalues using only \(k\) matrix-vector products, i.e. \(O(k \cdot \mathrm{nnz})\) work for a sparse matrix.

19.1 Krylov Subspaces

NoteDefinition: Krylov Subspace

Given a matrix \(\bA \in \fR^{n \times n}\) and a starting vector \(\bv \in \fR^n\), the \(k\)-th Krylov subspace is \[\mathcal{K}_k(\bA, \bv) = \mathrm{span}\{\bv,\, \bA\bv,\, \bA^2\bv,\, ...,\, \bA^{k-1}\bv\}.\]

The intuition: if \(\bv = \sum_i \alpha_i \bu_i\) in the eigenbasis of \(\bA\), then \(\bA^j\bv = \sum_i \alpha_i \lambda_i^j \bu_i\). For large \(j\), the component along the dominant eigenvector dominates. The Krylov subspace concentrates information about the extreme eigenvalues.
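This amplification is easy to observe numerically; a minimal sketch, where the matrix, seed, and iteration count are arbitrary illustrative choices:

```python
# Illustrative sketch: repeated application of A aligns a generic vector
# with the dominant eigenvector, which is why Krylov subspaces capture
# extreme eigenpairs. The matrix and seed here are arbitrary.
import numpy as np

rng = np.random.default_rng(0)
A = np.diag([10.0, 3.0, 2.0, 1.0])   # eigenvalues on the diagonal
u1 = np.eye(4)[:, 0]                 # dominant eigenvector (for lambda = 10)
v = rng.standard_normal(4)
v /= np.linalg.norm(v)

for _ in range(10):                  # v <- A^10 v, normalized each step
    v = A @ v
    v /= np.linalg.norm(v)

print(abs(u1 @ v))                   # nearly 1: v is almost parallel to u1
```

Every vector in \(\mathcal{K}_k(\bA, \bv)\) is a polynomial in \(\bA\) applied to \(\bv\), so the subspace contains far better approximations than the single power iterate above.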

19.2 The Lanczos Recurrence

NoteTheorem: Lanczos Relation

Starting from a unit vector \(\bv_1\), the \(k\)-step Lanczos process builds an orthonormal basis \(V_k = [\bv_1, ..., \bv_k]\) for \(\mathcal{K}_k(\bA, \bv_1)\) satisfying \[\bA V_k = V_k T_k + \beta_k \bv_{k+1} \be_k^T,\] where \(T_k \in \fR^{k \times k}\) is the symmetric tridiagonal matrix \[T_k = \begin{pmatrix} \alpha_1 & \beta_1 & & \\ \beta_1 & \alpha_2 & \beta_2 & \\ & \ddots & \ddots & \ddots \\ & & \beta_{k-1} & \alpha_k \end{pmatrix}.\] The diagonal entries \(\alpha_j = \bv_j^T \bA \bv_j\) are Rayleigh quotients and the off-diagonal entries \(\beta_j = \|\bw_j\|_2\) are residual norms.

Each new basis vector is generated by the three-term recurrence: set \(\bw = \bA\bv_j\), subtract the projections onto the two previous basis vectors, \[\bw \leftarrow \bw - \alpha_j \bv_j - \beta_{j-1}\bv_{j-1}, \qquad \alpha_j = \bv_j^T(\bA\bv_j),\] then normalize: \(\beta_j = \|\bw\|\), \(\bv_{j+1} = \bw/\beta_j\). Assembling these column equations gives \(\bA V_k = V_k T_k + \beta_k \bv_{k+1}\be_k^T\). Since each \(\bv_{j+1}\) is obtained by orthogonalizing against \(\bv_j\) and \(\bv_{j-1}\) only, the columns of \(V_k\) are orthonormal by induction in exact arithmetic.
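The relation can be checked numerically; a minimal sketch (variable names and the test matrix are local to this example):

```python
# Sketch: run k plain Lanczos steps on a small random symmetric matrix and
# verify A V_k = V_k T_k + beta_k v_{k+1} e_k^T to machine precision.
import numpy as np

rng = np.random.default_rng(1)
n, k = 8, 4
M = rng.standard_normal((n, n))
A = (M + M.T) / 2                          # symmetric test matrix

V = np.zeros((n, k + 1))
alpha, beta = np.zeros(k), np.zeros(k)
V[:, 0] = rng.standard_normal(n)
V[:, 0] /= np.linalg.norm(V[:, 0])

for j in range(k):                         # three-term recurrence
    w = A @ V[:, j]
    alpha[j] = V[:, j] @ w
    w -= alpha[j] * V[:, j]
    if j > 0:
        w -= beta[j - 1] * V[:, j - 1]
    beta[j] = np.linalg.norm(w)
    V[:, j + 1] = w / beta[j]

T = np.diag(alpha) + np.diag(beta[:k - 1], 1) + np.diag(beta[:k - 1], -1)
lhs = A @ V[:, :k]
rhs = V[:, :k] @ T + beta[k - 1] * np.outer(V[:, k], np.eye(k)[:, k - 1])
print(np.max(np.abs(lhs - rhs)))           # ~ machine precision
```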

19.3 Ritz Values and the Galerkin Projection

NoteDefinition: Ritz Values and Ritz Vectors

The eigenvalues \(\theta_1, ..., \theta_k\) of \(T_k\) are called the Ritz values of \(\bA\) with respect to \(\mathcal{K}_k\). If \(T_k \bs_i = \theta_i \bs_i\), the corresponding Ritz vector is \(\tilde{\bu}_i = V_k \bs_i\).

NoteProposition: Galerkin Projection and Cheap Residual Estimate

\(T_k = V_k^T \bA V_k\) is the Galerkin projection of \(\bA\) onto \(\mathcal{K}_k\). The residual of a Ritz pair satisfies \[\|\bA\tilde{\bu}_i - \theta_i \tilde{\bu}_i\|_2 = \beta_k |s_i(k)|,\] where \(s_i(k)\) is the \(k\)-th component of the eigenvector \(\bs_i\) of \(T_k\). Given the eigendecomposition of \(T_k\), this costs \(O(1)\) per Ritz pair, without forming \(\tilde{\bu}_i\) explicitly.

Using the Lanczos relation \(\bA V_k = V_k T_k + \beta_k \bv_{k+1} \be_k^T\): \[\bA\tilde{\bu}_i - \theta_i\tilde{\bu}_i = \bA V_k \bs_i - \theta_i V_k \bs_i = V_k(T_k - \theta_i \bI)\bs_i + \beta_k \bv_{k+1} s_i(k).\] Since \(T_k \bs_i = \theta_i \bs_i\), the first term vanishes, leaving \(\beta_k \bv_{k+1} s_i(k)\), whose norm is \(\beta_k |s_i(k)|\) because \(\|\bv_{k+1}\|_2 = 1\). (Orthonormality of \(V_k\) also guarantees the Ritz vector is unit: \(\|\tilde{\bu}_i\|_2 = \|\bs_i\|_2 = 1\).)
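A numerical check of the estimate (a sketch; the test matrix and step count are arbitrary):

```python
# Sketch: compare the cheap estimate beta_k * |s_i(k)| against the true
# residual norm ||A u_i - theta_i u_i|| for every Ritz pair.
import numpy as np

rng = np.random.default_rng(2)
n, k = 10, 5
M = rng.standard_normal((n, n))
A = (M + M.T) / 2

V = np.zeros((n, k + 1))
alpha, beta = np.zeros(k), np.zeros(k)
V[:, 0] = rng.standard_normal(n)
V[:, 0] /= np.linalg.norm(V[:, 0])
for j in range(k):
    w = A @ V[:, j]
    alpha[j] = V[:, j] @ w
    w -= alpha[j] * V[:, j]
    if j > 0:
        w -= beta[j - 1] * V[:, j - 1]
    beta[j] = np.linalg.norm(w)
    V[:, j + 1] = w / beta[j]

T = np.diag(alpha) + np.diag(beta[:k - 1], 1) + np.diag(beta[:k - 1], -1)
theta, S = np.linalg.eigh(T)               # Ritz values / coefficient vectors

U = V[:, :k] @ S                           # Ritz vectors as columns
true_res = np.linalg.norm(A @ U - U * theta, axis=0)
cheap = beta[k - 1] * np.abs(S[k - 1, :])  # no Ritz vectors needed
print(np.max(np.abs(true_res - cheap)))    # ~ machine precision
```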

19.4 Convergence

NoteTheorem: Convergence of Ritz Values

The extreme Ritz values converge to the extreme eigenvalues first. For the largest eigenvalue \(\lambda_1\) with relative spectral gap \(\delta = (\lambda_1 - \lambda_2)/(\lambda_2 - \lambda_n)\), the Kaniel-Paige bound after \(k\) Lanczos steps is \[|\lambda_1 - \theta_1^{(k)}| \leq (\lambda_1 - \lambda_n) \left(\frac{\tan\angle(\bv_1, \bu_1)}{T_{k-1}(1 + 2\delta)}\right)^2,\] where \(T_{k-1}\) is the Chebyshev polynomial of degree \(k-1\) and \(\bu_1\) is the true eigenvector. Since \(T_{k-1}(1 + 2\delta)\) grows exponentially in \(k\) at a rate that increases with \(\delta\), a large spectral gap gives rapid convergence.
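The exponential convergence is easy to observe on a matrix with a pronounced gap. A sketch (the spectrum below is an arbitrary choice; full reorthogonalization is used so finite-precision effects do not cloud the picture):

```python
# Sketch: track the error lambda_1 - theta_1^(k) as k grows for a matrix
# with eigenvalues in [1, 2] plus an outlier at 3 (a large spectral gap).
import numpy as np

n, steps = 200, 15
A = np.diag(np.append(np.linspace(1.0, 2.0, n - 1), 3.0))   # lambda_1 = 3
rng = np.random.default_rng(3)

V = np.zeros((n, steps + 1))
V[:, 0] = rng.standard_normal(n)
V[:, 0] /= np.linalg.norm(V[:, 0])
alpha, beta = np.zeros(steps), np.zeros(steps)

errs = []
for j in range(steps):
    w = A @ V[:, j]
    alpha[j] = V[:, j] @ w
    w -= alpha[j] * V[:, j]
    if j > 0:
        w -= beta[j - 1] * V[:, j - 1]
    w -= V[:, :j + 1] @ (V[:, :j + 1].T @ w)   # full reorthogonalization
    beta[j] = np.linalg.norm(w)
    V[:, j + 1] = w / beta[j]
    Tj = np.diag(alpha[:j + 1]) + np.diag(beta[:j], 1) + np.diag(beta[:j], -1)
    errs.append(3.0 - np.linalg.eigvalsh(Tj)[-1])

print(errs[0], errs[4], errs[14])   # drops by many orders of magnitude
```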

Tip

Interior eigenvalues converge last. To target eigenvalues near an interior shift \(\sigma\), use shift-and-invert Lanczos with \((\bA - \sigma\bI)^{-1}\): its eigenvalues are \(1/(\lambda_i - \sigma)\), so the eigenvalues of \(\bA\) closest to \(\sigma\) become the dominant ones.
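A sketch of the idea on a small dense example (a production code would factor \(\bA - \sigma\bI\) once and reuse the factorization for every solve; the shift and matrix below are illustrative):

```python
# Sketch: shift-and-invert Lanczos. Each "matvec" is a linear solve with
# A - sigma*I; the dominant Ritz value theta of (A - sigma I)^{-1} is
# mapped back via lambda = sigma + 1/theta.
import numpy as np

rng = np.random.default_rng(4)
n, k = 100, 8
A = np.diag(np.arange(1.0, n + 1))          # eigenvalues 1, 2, ..., 100
sigma = 41.3                                # target: eigenvalue nearest 41.3

B = A - sigma * np.eye(n)                   # factor this once in practice
V = np.zeros((n, k + 1))
alpha, beta = np.zeros(k), np.zeros(k)
V[:, 0] = rng.standard_normal(n)
V[:, 0] /= np.linalg.norm(V[:, 0])
for j in range(k):
    w = np.linalg.solve(B, V[:, j])         # apply (A - sigma I)^{-1}
    alpha[j] = V[:, j] @ w
    w -= alpha[j] * V[:, j]
    if j > 0:
        w -= beta[j - 1] * V[:, j - 1]
    w -= V[:, :j + 1] @ (V[:, :j + 1].T @ w)   # reorthogonalize
    beta[j] = np.linalg.norm(w)
    V[:, j + 1] = w / beta[j]

T = np.diag(alpha) + np.diag(beta[:k - 1], 1) + np.diag(beta[:k - 1], -1)
theta = np.linalg.eigvalsh(T)
mu = theta[np.argmax(np.abs(theta))]        # dominant eigenvalue of the inverse
print(sigma + 1.0 / mu)                     # ~ 41, the eigenvalue closest to sigma
```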

19.5 Algorithm

Tip

In floating-point arithmetic, the three-term recurrence alone loses orthogonality after many steps, causing ghost eigenvalues (converged Ritz values reappear spuriously). Full reorthogonalization prevents this at cost \(O(k^2 n)\).


function lanczos(A, k, tol):
    v[0] <- random unit vector

    for j = 0 to k-1:
        w <- A * v[j]                    // matvec
        alpha[j] <- v[j]^T * w          // Rayleigh quotient

        // Three-term recurrence
        w <- w - alpha[j]*v[j]
        if j > 0: w <- w - beta[j-1]*v[j-1]

        // Full reorthogonalization (Modified Gram-Schmidt)
        for l = 0 to j:
            w <- w - (v[l]^T * w) * v[l]

        beta[j] <- ||w||
        if beta[j] < tol: break         // exact invariant subspace

        v[j+1] <- w / beta[j]

    T <- tridiag(alpha[0..j], beta[0..j-1])
    (theta, S) <- eig_sym(T)
    // Residuals: ||A*u_i - theta_i*u_i|| = beta[j] * |S[j,i]|
    return theta, S, v[0..j], beta[j]

Total cost: \(O(k \cdot \mathrm{nnz})\) for matvecs (sparse \(\bA\)) plus \(O(k^2 n)\) for reorthogonalization plus \(O(k^3)\) for the inner eigensolver on \(T_k\).
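The pseudocode translates almost line for line into NumPy. A minimal sketch, not a production solver; the optional `v0` argument and the diagonal test matrix are additions for this example:

```python
# Sketch: runnable NumPy version of the pseudocode above, with full
# reorthogonalization and the cheap residual estimates.
import numpy as np

def lanczos(A, k, tol=1e-12, v0=None):
    n = A.shape[0]
    V = np.zeros((n, k + 1))
    alpha, beta = np.zeros(k), np.zeros(k)
    v0 = np.random.default_rng(0).standard_normal(n) if v0 is None else v0
    V[:, 0] = v0 / np.linalg.norm(v0)

    m = k
    for j in range(k):
        w = A @ V[:, j]                      # matvec
        alpha[j] = V[:, j] @ w               # Rayleigh quotient
        w -= alpha[j] * V[:, j]              # three-term recurrence
        if j > 0:
            w -= beta[j - 1] * V[:, j - 1]
        for l in range(j + 1):               # full reorthogonalization (MGS)
            w -= (V[:, l] @ w) * V[:, l]
        beta[j] = np.linalg.norm(w)
        if beta[j] < tol:                    # exact invariant subspace
            m = j + 1
            break
        V[:, j + 1] = w / beta[j]

    T = np.diag(alpha[:m]) + np.diag(beta[:m - 1], 1) + np.diag(beta[:m - 1], -1)
    theta, S = np.linalg.eigh(T)
    resid = beta[m - 1] * np.abs(S[m - 1, :])   # cheap residual estimates
    return theta, S, V[:, :m], resid

# Usage: extreme eigenvalues of diag(1, ..., 100) from 30 steps
A = np.diag(np.arange(1.0, 101.0))
theta, S, V, resid = lanczos(A, k=30, v0=np.ones(100))
print(theta[0], theta[-1])    # near 1 and 100, the extreme eigenvalues
```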

19.6 Exercises

WarningExercise
  1. Let \(\bA = \mathrm{diag}(5, 3, 1)\) and \(\bv_1 = \frac{1}{\sqrt{3}}(1,1,1)^T\). Carry out 2 steps of Lanczos by hand: compute \(\alpha_1\), \(\bv_2\), \(\beta_1\), \(\alpha_2\). Write down \(T_2\) and find its Ritz values. How close are they to eigenvalues \(5\) and \(1\)?

  2. Prove that in exact arithmetic, the Lanczos algorithm terminates in at most \(n\) steps, producing an exact eigendecomposition of \(\bA\).

  3. Show that \(T_k = V_k^T \bA V_k\) directly from the Lanczos relation.

  4. The residual estimate \(\|\bA\tilde{\bu}_i - \theta_i\tilde{\bu}_i\| = \beta_k|s_i(k)|\) is cheap. Why is this computationally important? How does it avoid forming the full Ritz vector?

  5. (Ghost eigenvalues) Implement Lanczos on \(\bA = \mathrm{diag}(n, n-1, ..., 1)\) for \(n = 50\), running 100 steps without reorthogonalization. Plot the Ritz values at each step. How many spurious copies of \(\lambda_1 = 50\) appear? Repeat with full reorthogonalization.