numerics
Singular Value Decomposition

Every real matrix \(A \in \mathbb{R}^{m \times n}\) admits the singular value decomposition

\[ A = U \Sigma V^T, \]

where \(U \in \mathbb{R}^{m \times r}\) and \(V \in \mathbb{R}^{n \times r}\) have orthonormal columns, \(r = \min(m,n)\), and \(\Sigma = \operatorname{diag}(\sigma_1, \ldots, \sigma_r)\) holds the singular values \(\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_r \geq 0\). The economy (thin) form stores only \(r\) columns of \(U\) and \(V\); the full form pads \(U\) or \(V\) to square. This library returns the economy form.
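The economy/full distinction is easy to see with NumPy (used here purely for illustration; this library's own API is num::svd):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((7, 3))              # m = 7, n = 3, r = min(m, n) = 3

# Economy (thin) SVD: U is m x r, Sigma has r entries, V^T is r x n
U, sigma, Vt = np.linalg.svd(A, full_matrices=False)
print(U.shape, sigma.shape, Vt.shape)        # (7, 3) (3,) (3, 3)

# Full SVD pads U to square (m x m)
U_full, _, _ = np.linalg.svd(A, full_matrices=True)
print(U_full.shape)                          # (7, 7)
```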

Two complementary algorithms are provided:

  • One-sided Jacobi SVD (num::svd) — dense, high relative accuracy for all singular values, including those close to zero.
  • Randomized truncated SVD (num::svd_truncated) — finds the \(k\) largest singular values cheaply when \(k \ll \min(m,n)\) and fast spectral decay holds.

One-Sided Jacobi SVD

The one-sided Jacobi method applies Givens rotations from the right to the columns of \(A\) until they are mutually orthogonal. At convergence, column norms equal the singular values and normalized columns equal the left singular vectors.

Why Not Form \(A^T A\)

The singular values of \(A\) are the square roots of the eigenvalues of \(A^T A\). Explicitly forming \(A^T A\) squares the condition number: \(\kappa(A^T A) = \kappa(A)^2\). For ill-conditioned \(A\) this causes severe loss of accuracy in the smallest singular values. The one-sided method works directly on \(A\) and avoids this entirely.

Column Orthogonality Condition

For column pair \((p, q)\), define the inner products

\[ \alpha = \|a_p\|^2, \qquad \beta = \|a_q\|^2, \qquad \gamma = a_p \cdot a_q. \]

The relative cosine \(|\gamma|/\sqrt{\alpha\beta}\) measures deviation from orthogonality. A rotation is skipped whenever this quantity is below the convergence tolerance.

Rotation Angle

To zero \([A^T A]_{pq} = \gamma\) via a right rotation \(A \leftarrow AG\), define

\[ \zeta = \frac{\beta - \alpha}{2\gamma}. \]

Then \(t\) is the smaller-magnitude root of \(t^2 + 2\zeta t - 1 = 0\):

\[ t = \frac{\operatorname{sign}(\zeta)}{|\zeta| + \sqrt{1 + \zeta^2}}, \qquad c = \frac{1}{\sqrt{1+t^2}}, \qquad s = ct. \]

With \(a_p' = ca_p - sa_q\) and \(a_q' = sa_p + ca_q\), one can verify that \(a_p'^T a_q' = 0\) by construction.
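The rotation formulas can be checked directly. A minimal NumPy sketch (illustrative only, independent of the library's implementation) computes \(\alpha\), \(\beta\), \(\gamma\), the angle, and the rotated pair:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 2))
a_p, a_q = A[:, 0], A[:, 1]

# Inner products from the text: alpha = ||a_p||^2, beta = ||a_q||^2, gamma = a_p . a_q
alpha, beta, gamma = a_p @ a_p, a_q @ a_q, a_p @ a_q

# Rotation angle: t is the smaller-magnitude root of t^2 + 2*zeta*t - 1 = 0
zeta = (beta - alpha) / (2.0 * gamma)
t = np.copysign(1.0, zeta) / (abs(zeta) + np.hypot(1.0, zeta))  # copysign guards zeta == 0
c = 1.0 / np.hypot(1.0, t)
s = c * t

# Apply the right rotation to the column pair
a_p_new = c * a_p - s * a_q
a_q_new = s * a_p + c * a_q

# The rotated columns are orthogonal to roundoff
print(abs(a_p_new @ a_q_new))
```

Note that \(\alpha + \beta\), the combined squared column norm, is invariant under the rotation.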

Convergence

Define the off-orthogonality functional

\[ \psi(A) = \sum_{p < q} \gamma_{pq}^2. \]

Each rotation decreases \(\psi\) by exactly \(\gamma_{pq}^2\):

\[ \psi(A') = \psi(A) - \gamma_{pq}^2. \]

After a full sweep over all \(\binom{r}{2}\) pairs:

\[ \psi(A_{\mathrm{sweep}+1}) \leq \left(1 - \frac{2}{r(r-1)}\right)\psi(A_{\mathrm{sweep}}) \qquad \text{(linear phase)}. \]

Once \(\max_{p \neq q}|[A^T A]_{pq}|\) is small the method enters a quadratic phase, identical to the analysis of symmetric Jacobi applied to \(A^T A\). In practice 5–10 sweeps suffice for double precision.
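Putting the sweep loop together, the whole method fits in a short NumPy sketch (illustrative only; the library's actual routine is num::svd). Cyclic-by-row ordering of the pairs is used, as in the sweep bound above:

```python
import numpy as np

def jacobi_svd(A, tol=1e-12, max_sweeps=30):
    """One-sided Jacobi SVD, economy form. Illustrative sketch of the method
    described above; not the library's num::svd implementation."""
    A = np.array(A, dtype=float)              # work on a copy; columns -> U * diag(sigma)
    n = A.shape[1]
    V = np.eye(n)
    for _ in range(max_sweeps):
        converged = True
        for p in range(n - 1):
            for q in range(p + 1, n):
                alpha = A[:, p] @ A[:, p]
                beta = A[:, q] @ A[:, q]
                gamma = A[:, p] @ A[:, q]
                if abs(gamma) <= tol * np.sqrt(alpha * beta):
                    continue                  # pair already orthogonal to tolerance
                converged = False
                zeta = (beta - alpha) / (2.0 * gamma)
                t = np.copysign(1.0, zeta) / (abs(zeta) + np.hypot(1.0, zeta))
                c = 1.0 / np.hypot(1.0, t)
                s = c * t
                G = np.array([[c, s], [-s, c]])   # right rotation on columns (p, q)
                A[:, [p, q]] = A[:, [p, q]] @ G
                V[:, [p, q]] = V[:, [p, q]] @ G
        if converged:
            break
    sigma = np.linalg.norm(A, axis=0)         # column norms are the singular values
    order = np.argsort(sigma)[::-1]           # sort descending
    sigma = sigma[order]
    U = A[:, order] / np.where(sigma > 0.0, sigma, 1.0)
    return U, sigma, V[:, order]

rng = np.random.default_rng(2)
M = rng.standard_normal((5, 3))
U, s, V = jacobi_svd(M)
print(np.allclose(U * s @ V.T, M))            # reconstruction check
```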

High Relative Accuracy

One-sided Jacobi is among the most accurate SVD algorithms. Whereas standard bidiagonalization (LAPACK dgesdd) computes all singular values to absolute accuracy \(O(\varepsilon_{\mathrm{mach}}\|A\|)\), the Jacobi method achieves high relative accuracy for small singular values: if \(A\) is column-equilibrated, \(\sigma_k\) is computed with relative error \(O(\varepsilon_{\mathrm{mach}})\) even when \(\sigma_k / \sigma_1 = O(\varepsilon_{\mathrm{mach}})\).

Complexity and Comparison

Complexity per sweep: \(O(r^2)\) rotations \(\times\) \(O(m + n)\) flops per rotation \(= O(r^2(m+n))\).

  • One-sided Jacobi (this library): cost \(O(r^2(m+n))\) per sweep; high relative accuracy; use for small \(n\) or when small \(\sigma_k\) must be accurate.
  • LAPACK dgesvd (bidiagonalization + QR): cost \(O(mn^2)\); absolute accuracy; the standard dense SVD.
  • LAPACK dgesdd (divide and conquer): cost \(O(mn^2)\); absolute accuracy; fast and most commonly used.
  • Randomized SVD: cost \(O(mn\ell)\); approximate; use when \(k \ll \min(m,n)\).

For tall-skinny \(A\) (\(m \gg n\)): compute a thin QR first (\(A = \hat{Q}\hat{R}\)), run Jacobi on \(\hat{R} \in \mathbb{R}^{n \times n}\) to obtain \(\hat{R} = \hat{U}\Sigma V^T\), then recover \(U = \hat{Q}\hat{U}\).
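The tall-skinny preprocessing looks like this in NumPy (np.linalg.svd stands in for the Jacobi step on \(\hat{R}\); illustrative only):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((1000, 8))           # tall-skinny: m >> n

# Thin QR first, then an SVD of the small n x n factor R_hat
Q_hat, R_hat = np.linalg.qr(A)               # Q_hat: 1000 x 8, R_hat: 8 x 8
U_hat, sigma, Vt = np.linalg.svd(R_hat)      # stand-in for the Jacobi step
U = Q_hat @ U_hat                            # lift: A = U diag(sigma) Vt
print(np.allclose(U * sigma @ Vt, A))
```

The singular values of \(\hat{R}\) equal those of \(A\), so nothing is lost in the QR step.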

API: num::svd, num::SVDResult


Randomized Truncated SVD

When only the \(k\) largest singular values are needed and \(k \ll \min(m,n)\), a full SVD wastes almost all its work. The randomized algorithm exploits random sketching to capture the dominant singular subspace cheaply.

Eckart–Young Theorem

The best rank- \(k\) approximation to \(A\) in both the Frobenius and spectral norms is the truncated SVD using the \(k\) largest singular values:

\[ \min_{\operatorname{rank}(B) \leq k} \|A - B\|_F = \left(\sum_{j > k} \sigma_j^2\right)^{1/2}. \]

The randomized algorithm approximates this optimum with a controllable error that vanishes as oversampling increases.
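The theorem can be verified numerically: truncating an exact SVD achieves precisely the tail error. An illustrative NumPy check:

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((20, 12))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 4
A_k = U[:, :k] * s[:k] @ Vt[:k, :]           # truncated SVD: best rank-k approximation
err = np.linalg.norm(A - A_k, "fro")
tail = np.sqrt(np.sum(s[k:] ** 2))           # the theorem's optimal error
print(err, tail)                             # the two agree to roundoff
```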

Algorithm (Halko–Martinsson–Tropp 2011)

Given target rank \(k\) and oversampling \(p\) (typically \(p = 5\)–\(10\)), set \(\ell = k + p\), capped at \(\min(m, n)\) so the sketch never requests more orthonormal columns than exist.

  1. Draw a Gaussian sketch \(\Omega \in \mathbb{R}^{n \times \ell}\) with \(\Omega_{ij} \sim \mathcal{N}(0,1)\).
  2. Form \(Y = A\Omega \in \mathbb{R}^{m \times \ell}\) (one DGEMM).
  3. Thin QR of \(Y\): compute orthonormal \(Q \in \mathbb{R}^{m \times \ell}\) with \(\operatorname{range}(Q) = \operatorname{range}(Y) \approx \operatorname{range}(A)\).
  4. Project: \(B = Q^T A \in \mathbb{R}^{\ell \times n}\) (one DGEMM).
  5. Compute the full SVD of the small matrix \(B\): \(B = \tilde{U}\Sigma V^T\).
  6. Lift to the original space: \(U = Q\tilde{U}\).

Return the leading \(k\) components: \((U_{[:,0:k]},\,\Sigma_{0:k},\,V^T_{0:k,:})\).
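Steps 1–6 translate almost line for line into NumPy. A hedged sketch (the helper name svd_truncated_sketch is hypothetical, not the library's num::svd_truncated):

```python
import numpy as np

def svd_truncated_sketch(A, k, p=10, seed=None):
    """Randomized truncated SVD, steps 1-6 above. Illustrative sketch,
    not the library's num::svd_truncated."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    ell = min(k + p, m, n)                    # cap ell at min(m, n)
    Omega = rng.standard_normal((n, ell))     # 1. Gaussian sketch
    Y = A @ Omega                             # 2. sample range(A): one DGEMM
    Q, _ = np.linalg.qr(Y)                    # 3. thin QR: orthonormal basis for range(Y)
    B = Q.T @ A                               # 4. project: ell x n, one DGEMM
    U_t, sigma, Vt = np.linalg.svd(B, full_matrices=False)  # 5. small SVD
    U = Q @ U_t                               # 6. lift back to R^m
    return U[:, :k], sigma[:k], Vt[:k, :]

# Exact rank-5 test matrix: k = 5 recovers it to roundoff
rng = np.random.default_rng(5)
M = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 80))
U, s, Vt = svd_truncated_sketch(M, k=5, seed=0)
print(np.allclose(U * s @ Vt, M))
```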

HMT Error Bound

For Gaussian \(\Omega\) with \(\ell = k + p\), \(p \geq 2\):

\[ \mathbb{E}\|A - QQ^T A\|_F \leq \left(1 + \frac{k}{p-1}\right)^{1/2} \left(\sum_{j > k} \sigma_j^2\right)^{1/2}. \]

The factor in front of the optimal tail error approaches 1 as \(p \to \infty\). In practice \(p = 10\) is sufficient for well-separated spectra.
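A single-draw check of the bound (it holds in expectation, so any one sample can exceed it slightly; here the test spectrum decays geometrically as \(2^{-j}\), an assumption of this sketch):

```python
import numpy as np

rng = np.random.default_rng(6)
m, n, k, p = 60, 30, 5, 5

# Test matrix with a known, geometrically decaying spectrum
s_true = 2.0 ** -np.arange(n)
Qu, _ = np.linalg.qr(rng.standard_normal((m, n)))
Qv, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = Qu * s_true @ Qv.T

# Range finder with ell = k + p columns
Omega = rng.standard_normal((n, k + p))
Q, _ = np.linalg.qr(A @ Omega)

err = np.linalg.norm(A - Q @ (Q.T @ A), "fro")
tail = np.sqrt(np.sum(s_true[k:] ** 2))        # optimal rank-k error
bound = np.sqrt(1.0 + k / (p - 1.0)) * tail    # HMT expectation bound
print(err, tail, bound)
```

With \(k = p = 5\) the prefactor is \(\sqrt{1 + 5/4} = 1.5\), so the sketch loses at most half again the optimal error in expectation.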

Oversampling Cap \(\ell \leq \min(m,n)\)

If \(\ell > \min(m,n)\), the sketch \(Y\) has rank at most \(\min(m,n)\) and the thin QR produces at most \(m\) orthonormal columns. Requesting more columns causes out-of-bounds access, so the implementation clamps \(\ell \leftarrow \min(k+p,\,\min(m,n))\) before proceeding.

Accuracy vs Cost Trade-offs

  • Oversampling \(p\): error factor \(\to 1\) as \(p \to \infty\); cost grows linearly in \(p\).
  • Power iterations \(q\): dramatic accuracy improvement for slow spectral decay; costs \(2q\) extra DGEMM passes.
  • Sketch type (Gaussian vs. SRHT): similar guarantees; SRHT costs \(O(mn\log\ell)\) versus \(O(mn\ell)\) for Gaussian.

Optional power iterations replace \(Y = A\Omega\) with \(Y = (AA^T)^q A\Omega\), raising each singular value to the \((2q+1)\)-st power and so sharpening spectral decay. This dramatically improves approximation quality for slowly decaying spectra such as text or recommendation-system matrices; \(q = 1\)–\(3\) typically suffices.
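The power-iteration variant, with the customary re-orthonormalization between passes so roundoff does not collapse the sketch onto the top singular direction (illustrative NumPy sketch; the \(\sigma_j = 1/j\) spectrum below is an assumption chosen to show slow decay):

```python
import numpy as np

def range_finder(A, ell, q=0, seed=0):
    """Randomized range finder with q power iterations. Illustrative sketch."""
    rng = np.random.default_rng(seed)
    Y = A @ rng.standard_normal((A.shape[1], ell))
    Q, _ = np.linalg.qr(Y)
    for _ in range(q):
        # Re-orthonormalize after each application to keep the basis well-conditioned
        Q, _ = np.linalg.qr(A.T @ Q)
        Q, _ = np.linalg.qr(A @ Q)
    return Q

# Slowly decaying spectrum (sigma_j = 1/j): power iterations help here
rng = np.random.default_rng(7)
s = 1.0 / np.arange(1, 41)
Qu, _ = np.linalg.qr(rng.standard_normal((80, 40)))
Qv, _ = np.linalg.qr(rng.standard_normal((40, 40)))
A = Qu * s @ Qv.T

errs = {}
for q in (0, 2):
    Q = range_finder(A, ell=10, q=q)
    errs[q] = np.linalg.norm(A - Q @ (Q.T @ A), "fro")
print(errs)   # the q = 2 error sits much closer to the optimal rank-10 tail
```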

Total cost: \(O(mn\ell)\) (two DGEMMs) \(+ O(m\ell^2)\) (thin QR) \(+ O(\ell^2 n)\) (small SVD). For \(\ell \ll \min(m,n)\) this is dramatically cheaper than the exact \(O(mn\min(m,n))\) SVD.

API: num::svd_truncated