# numerics
Every real matrix \(A \in \mathbb{R}^{m \times n}\) admits the singular value decomposition
\[ A = U \Sigma V^T, \]
where \(U \in \mathbb{R}^{m \times r}\) and \(V \in \mathbb{R}^{n \times r}\) have orthonormal columns with \(r = \min(m,n)\), and \(\Sigma = \operatorname{diag}(\sigma_1, \ldots, \sigma_r)\) collects the singular values \(\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_r \geq 0\). The economy (thin) form stores only \(r\) columns of \(U\) and \(V\); the full form pads \(U\) or \(V\) to square. This library returns the economy form.
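A minimal NumPy sketch (illustrative, not this library's API) showing the shapes of the economy factors for a \(5 \times 3\) matrix:

```python
import numpy as np

# Economy SVD of a 5x3 matrix stores only r = min(m, n) = 3 columns
# of U and V; NumPy's full_matrices=False requests exactly this form.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))

U, s, Vt = np.linalg.svd(A, full_matrices=False)
assert U.shape == (5, 3) and s.shape == (3,) and Vt.shape == (3, 3)

# Singular values come sorted in non-increasing order, and A is
# recovered (up to rounding) from the thin factors alone.
assert np.all(np.diff(s) <= 0)
assert np.allclose(A, U @ np.diag(s) @ Vt)
```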
Two complementary algorithms are provided:
The one-sided Jacobi method applies Givens rotations from the right to the columns of \(A\) until they are mutually orthogonal. At convergence, column norms equal the singular values and normalized columns equal the left singular vectors.
The singular values of \(A\) are the square roots of the eigenvalues of \(A^T A\). Explicitly forming \(A^T A\) squares the condition number: \(\kappa(A^T A) = \kappa(A)^2\). For ill-conditioned \(A\) this causes severe loss of accuracy in the smallest singular values. The one-sided method works directly on \(A\) and avoids this entirely.
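The squaring of the condition number is easy to observe numerically. A hedged sketch (matrix and tolerances chosen purely for illustration):

```python
import numpy as np

# Build a 40x40 matrix with known spectrum: kappa(A) = 1e7.
# Forming A^T A squares this to roughly 1e14, so the smallest
# singular values of A are poorly resolved through A^T A.
rng = np.random.default_rng(0)
U, _ = np.linalg.qr(rng.standard_normal((40, 40)))
V, _ = np.linalg.qr(rng.standard_normal((40, 40)))
sigma = np.logspace(0, -7, 40)
A = U @ np.diag(sigma) @ V.T

kA = np.linalg.cond(A)          # ~1e7
kAtA = np.linalg.cond(A.T @ A)  # ~1e14: the square
assert kAtA > 1e6 * kA
```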
For column pair \((p, q)\), define the inner products
\[ \alpha = \|a_p\|^2, \qquad \beta = \|a_q\|^2, \qquad \gamma = a_p \cdot a_q. \]
The relative cosine \(|\gamma|/\sqrt{\alpha\beta}\) measures deviation from orthogonality. A rotation is skipped whenever this quantity is below the convergence tolerance.
To zero \([A^T A]_{pq} = \gamma\) via a right rotation \(A \leftarrow AG\), define
\[ \zeta = \frac{\beta - \alpha}{2\gamma}. \]
Then \(t = \tan\theta\) is the smaller-magnitude root of \(t^2 + 2\zeta t - 1 = 0\), which keeps the rotation angle at most \(\pi/4\):
\[ t = \frac{\operatorname{sign}(\zeta)}{|\zeta| + \sqrt{1 + \zeta^2}}, \qquad c = \frac{1}{\sqrt{1+t^2}}, \qquad s = ct. \]
With \(a_p' = ca_p - sa_q\) and \(a_q' = sa_p + ca_q\), one can verify that \(a_p'^T a_q' = 0\) by construction.
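The full sweep can be sketched in a few lines of NumPy. This is an illustrative reference implementation of the formulas above, not this library's `num::svd`:

```python
import numpy as np

def jacobi_svd(A, tol=1e-12, max_sweeps=30):
    """One-sided Jacobi SVD sketch: rotate column pairs of A until
    mutually orthogonal. Returns economy U, singular values, V^T."""
    A = A.astype(float).copy()
    m, n = A.shape
    V = np.eye(n)
    for _ in range(max_sweeps):
        converged = True
        for p in range(n - 1):
            for q in range(p + 1, n):
                ap, aq = A[:, p], A[:, q]
                alpha, beta, gamma = ap @ ap, aq @ aq, ap @ aq
                # Skip pairs whose relative cosine is below tolerance.
                if abs(gamma) <= tol * np.sqrt(alpha * beta):
                    continue
                converged = False
                zeta = (beta - alpha) / (2.0 * gamma)
                t = (1.0 if zeta >= 0 else -1.0) / (abs(zeta) + np.hypot(1.0, zeta))
                c = 1.0 / np.hypot(1.0, t)
                s = c * t
                # Right rotation on columns p, q of A; accumulate into V.
                A[:, p], A[:, q] = c * ap - s * aq, s * ap + c * aq
                V[:, p], V[:, q] = c * V[:, p] - s * V[:, q], s * V[:, p] + c * V[:, q]
        if converged:
            break
    # Column norms are the singular values; normalized columns are U.
    sigma = np.linalg.norm(A, axis=0)
    order = np.argsort(sigma)[::-1]
    sigma = sigma[order]
    U = A[:, order] / np.where(sigma > 0, sigma, 1.0)
    return U, sigma, V[:, order].T
```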
Define the off-orthogonality functional
\[ \psi(A) = \sum_{p < q} \gamma_{pq}^2. \]
Each rotation decreases \(\psi\) by exactly \(\gamma_{pq}^2\):
\[ \psi(A') = \psi(A) - \gamma_{pq}^2. \]
After a full sweep over all \(\binom{r}{2}\) pairs:
\[ \psi(A_{\mathrm{sweep}+1}) \leq \left(1 - \frac{2}{r(r-1)}\right)\psi(A_{\mathrm{sweep}}) \qquad \text{(linear phase)}. \]
Once \(\max_{p \neq q}|[A^T A]_{pq}|\) is small the method enters a quadratic phase, identical to the analysis of symmetric Jacobi applied to \(A^T A\). In practice 5–10 sweeps suffice for double precision.
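The exact per-rotation decrease of \(\psi\) is easy to verify numerically. A minimal sketch using the rotation formulas above (illustrative, not this library's API):

```python
import numpy as np

def psi(A):
    """Off-orthogonality: sum of squared off-diagonal entries of
    A^T A over pairs p < q."""
    return np.sum(np.triu(A.T @ A, 1) ** 2)

rng = np.random.default_rng(4)
A = rng.standard_normal((8, 5))
p, q = 1, 3
ap, aq = A[:, p].copy(), A[:, q].copy()
alpha, beta, gamma = ap @ ap, aq @ aq, ap @ aq

# Rotation zeroing [A^T A]_{pq}, as in the derivation above.
zeta = (beta - alpha) / (2.0 * gamma)
t = (1.0 if zeta >= 0 else -1.0) / (abs(zeta) + np.hypot(1.0, zeta))
c = 1.0 / np.hypot(1.0, t)
s = c * t
B = A.copy()
B[:, p], B[:, q] = c * ap - s * aq, s * ap + c * aq

# One rotation removes exactly gamma_pq^2 from psi.
assert np.isclose(psi(B), psi(A) - gamma**2)
```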
One-sided Jacobi is among the most accurate SVD algorithms. Whereas standard bidiagonalization (LAPACK dgesdd) computes all singular values to absolute accuracy \(O(\varepsilon_{\mathrm{mach}}\|A\|)\), the Jacobi method achieves high relative accuracy for small singular values: if \(A\) is column-equilibrated, \(\sigma_k\) is computed with relative error \(O(\varepsilon_{\mathrm{mach}})\) even when \(\sigma_k / \sigma_1 = O(\varepsilon_{\mathrm{mach}})\).
Complexity per sweep: \(O(r^2)\) rotations \(\times\) \(O(m + n)\) flops per rotation \(= O(r^2(m+n))\).
| Algorithm | Cost | Accuracy | Use when |
|---|---|---|---|
| One-sided Jacobi (this library) | \(O(r^2(m+n)) \times\) sweeps | High relative | Small \(n\), need small \(\sigma_k\) accurately |
| LAPACK dgesvd (bidiag + QR) | \(O(mn^2)\) | Absolute | Standard dense SVD |
| LAPACK dgesdd (divide & conquer) | \(O(mn^2)\) | Absolute | Fast, most common |
| Randomized SVD | \(O(mn\ell)\) | Approximate | \(k \ll \min(m,n)\) |
For tall-skinny \(A\) (\(m \gg n\)): compute a thin QR first (\(A = \hat{Q}\hat{R}\)), run Jacobi on \(\hat{R} \in \mathbb{R}^{n \times n}\), then recover \(U = \hat{Q}\hat{U}\).
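This preprocessing is a one-liner to sketch. The example below substitutes NumPy's dense SVD for the Jacobi step on \(\hat{R}\); it is an illustration of the reduction, not this library's code path:

```python
import numpy as np

def tall_skinny_svd(A):
    """Reduce an m x n SVD (m >> n) to an n x n one via thin QR."""
    Q, R = np.linalg.qr(A, mode='reduced')  # A = Q R, R is n x n
    Ur, s, Vt = np.linalg.svd(R)            # stand-in for Jacobi on R
    return Q @ Ur, s, Vt                    # recover U = Q @ U_R

rng = np.random.default_rng(5)
A = rng.standard_normal((1000, 8))
U, s, Vt = tall_skinny_svd(A)
assert np.allclose(A, U @ np.diag(s) @ Vt)
```

The \(O(mn^2)\) QR dominates, but every subsequent Jacobi sweep then touches only an \(n \times n\) matrix instead of the full \(m \times n\) one.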
API: num::svd, num::SVDResult
When only the \(k\) largest singular values are needed and \(k \ll \min(m,n)\), a full SVD wastes almost all its work. The randomized algorithm exploits random sketching to capture the dominant singular subspace cheaply.
The best rank-\(k\) approximation to \(A\) in both the Frobenius and spectral norms is the truncated SVD keeping the \(k\) largest singular values (Eckart–Young):
\[ \min_{\operatorname{rank}(B) \leq k} \|A - B\|_F = \left(\sum_{j > k} \sigma_j^2\right)^{1/2}, \qquad \min_{\operatorname{rank}(B) \leq k} \|A - B\|_2 = \sigma_{k+1}. \]
The randomized algorithm approximates this optimum with a controllable error that vanishes as oversampling increases.
Given target rank \(k\) and oversampling \(p\) (typically \(p = 5\)–\(10\)), set \(\ell = k + p\), capped at \(\min(m, n)\) to avoid rank overflow. The algorithm then:
1. Draw a Gaussian sketch matrix \(\Omega \in \mathbb{R}^{n \times \ell}\) and form \(Y = A\Omega\).
2. Compute the thin QR factorization \(Y = QR\), so \(Q \in \mathbb{R}^{m \times \ell}\) has orthonormal columns.
3. Form \(B = Q^T A \in \mathbb{R}^{\ell \times n}\) and compute its exact SVD \(B = \hat{U}\Sigma V^T\).
4. Set \(U = Q\hat{U}\) and return \((U_{[:,0:k]},\,\Sigma_{0:k},\,V^T_{0:k,:})\).
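A minimal NumPy sketch of this sketch–QR–project–SVD pipeline (function and parameter names are illustrative, not this library's `num::svd_truncated` API):

```python
import numpy as np

def randomized_svd(A, k, p=10, seed=None):
    """Randomized truncated SVD sketch: Gaussian sketch, thin QR,
    exact SVD of the small projected matrix."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    ell = min(k + p, m, n)                 # clamp to avoid rank overflow
    Omega = rng.standard_normal((n, ell))  # Gaussian sketch matrix
    Q, _ = np.linalg.qr(A @ Omega)         # thin QR of Y = A @ Omega
    B = Q.T @ A                            # small ell x n problem
    Uh, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ Uh)[:, :k], s[:k], Vt[:k, :]

# On an exactly rank-5 matrix, k = 5 recovers A to rounding error.
rng = np.random.default_rng(6)
A = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 100))
U, s, Vt = randomized_svd(A, k=5, seed=0)
assert np.allclose(A, U @ np.diag(s) @ Vt, atol=1e-8)
```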
For Gaussian \(\Omega\) with \(\ell = k + p\), \(p \geq 2\):
\[ \mathbb{E}\|A - QQ^T A\|_F \leq \left(1 + \frac{k}{p-1}\right)^{1/2} \left(\sum_{j > k} \sigma_j^2\right)^{1/2}. \]
The factor in front of the optimal tail error approaches 1 as \(p \to \infty\). In practice \(p = 10\) is sufficient for well-separated spectra.
If \(\ell > \min(m,n)\), the sketch \(Y\) has rank at most \(\min(m,n)\) and the thin QR produces at most \(m\) orthonormal columns. Requesting more columns causes out-of-bounds access, so the implementation clamps \(\ell \leftarrow \min(k+p,\,\min(m,n))\) before proceeding.
| Parameter | Effect on accuracy | Effect on cost |
|---|---|---|
| Oversampling \(p\) | Increases (error factor \(\to 1\) as \(p \to \infty\)) | Linear in \(p\) |
| Power iterations \(q\) | Dramatic improvement for slow spectral decay | \(2q\) extra DGEMM passes |
| Gaussian vs SRHT sketch | Similar guarantees | SRHT: \(O(mn\log\ell)\) vs Gaussian: \(O(mn\ell)\) |
Optional power iterations replace \(Y = A\Omega\) with \(Y = (AA^T)^q A\Omega\), so the sketch sees the effective spectrum \(\sigma_j^{2q+1}\). This dramatically improves approximation quality for slowly decaying spectra such as text or recommendation-system matrices; \(q = 1\)–\(3\) typically suffices, with a QR re-orthonormalization between products to keep the small directions from being lost to rounding.
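A hedged sketch of the power-iteration variant with re-orthonormalization (illustrative names, not this library's API):

```python
import numpy as np

def sketch_with_power_iterations(A, ell, q=2, seed=None):
    """Build an orthonormal basis Q for the sketch (A A^T)^q A Omega,
    re-orthonormalizing after each product for numerical stability."""
    rng = np.random.default_rng(seed)
    Y = A @ rng.standard_normal((A.shape[1], ell))
    Q, _ = np.linalg.qr(Y)
    for _ in range(q):                 # 2q extra matrix products total
        Q, _ = np.linalg.qr(A.T @ Q)
        Q, _ = np.linalg.qr(A @ Q)
    return Q

# With q = 2 the sketch sees sigma_j^5, so even the slow decay
# sigma_j = 1/j is captured nearly optimally by ell = 10 columns.
rng = np.random.default_rng(7)
U, _ = np.linalg.qr(rng.standard_normal((100, 30)))
V, _ = np.linalg.qr(rng.standard_normal((30, 30)))
A = U @ np.diag(1.0 / np.arange(1.0, 31.0)) @ V.T
Q = sketch_with_power_iterations(A, ell=10, q=2, seed=0)
err = np.linalg.norm(A - Q @ (Q.T @ A))
tail = np.sqrt(np.sum(np.linalg.svd(A, compute_uv=False)[10:] ** 2))
assert err < 2 * tail  # close to the optimal rank-10 tail error
```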
Total cost: \(O(mn\ell)\) (two DGEMMs) \(+ O(m\ell^2)\) (thin QR) \(+ O(\ell^2 n)\) (small SVD). For \(\ell \ll \min(m,n)\) this is dramatically cheaper than the exact \(O(mn\min(m,n))\) SVD.
API: num::svd_truncated