11 Symmetric Positive Definite (SPD) Matrices

Symmetric Positive Definite (SPD) matrices form one of the most important classes of matrices in scientific computing. Their structure allows for direct algorithms that are roughly twice as fast as those for general matrices and that remain numerically stable without pivoting.

An SPD matrix is the matrix version of positive curvature or positive energy. The quadratic form \(\bx^T\bA\bx\) is always positive away from the origin, so it defines a geometry, a convex energy landscape, and a stable class of linear systems.

11.1 Definitions and Properties

SPD matrices arise naturally in a wide variety of engineering contexts, including optimization (as Hessian matrices of convex functions), statistics (as covariance matrices), and physical simulations (as stiffness matrices in structural mechanics).

The expression \(\bx^T\bA\bx\) is called a quadratic form. It converts a vector into a scalar. If \(\bA\) is symmetric, then this scalar measures the energy of \(\bx\) in the geometry defined by \(\bA\).

Definition: Positive Definite

A symmetric matrix \(\bA \in \fR^{n \times n}\) is positive definite (PD) if the quadratic form \(\bx^T\bA\bx\) is strictly positive for every nonzero vector \(\bx \in \fR^n\): \[ \begin{align} \bx^T\bA\bx > 0 \quad \text{for every nonzero } \bx \in \fR^n. \end{align} \] If \(\bA\) is both symmetric and PD, it is SPD.

Definition: Positive Semidefinite

A symmetric matrix \(\bA\) is positive semidefinite (PSD) if \[ \begin{align} \bx^T\bA\bx \geq 0 \quad \text{for every } \bx \in \fR^n. \end{align} \] Positive definite means strictly positive for every nonzero vector. Positive semidefinite allows flat directions where \(\bx^T\bA\bx=0\).

Example

(Definite vs. semidefinite) \[ \begin{align} \begin{pmatrix}2&0\\0&1\end{pmatrix} \quad \text{is SPD, while} \quad \begin{pmatrix}1&0\\0&0\end{pmatrix} \quad \text{is PSD but not PD.} \end{align} \] The second matrix has a flat direction: \((0,1)^T\) has zero quadratic energy.
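The two matrices in this example can be probed numerically. The following is a minimal sketch, assuming NumPy is available; the helper name `quadratic_form` is introduced here only for illustration.

```python
import numpy as np

def quadratic_form(A, x):
    """Return the scalar x^T A x (illustrative helper, not from the text)."""
    return float(x @ A @ x)

A_pd = np.array([[2.0, 0.0], [0.0, 1.0]])   # the SPD matrix from the example
A_psd = np.array([[1.0, 0.0], [0.0, 0.0]])  # PSD but not PD

e2 = np.array([0.0, 1.0])  # the flat direction of A_psd

q_pd = quadratic_form(A_pd, e2)    # strictly positive
q_psd = quadratic_form(A_psd, e2)  # exactly zero: no energy along e2
print(q_pd, q_psd)
```

A single direction with zero energy is enough to rule out positive definiteness, which is why the second matrix is only PSD.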

Exercise

Test \(\bA = \begin{pmatrix}2 & 1 \\ 1 & 2\end{pmatrix}\) for the PD property by expanding \(\bx^T\bA\bx\).
Prove a diagonal matrix \(\bD\) is PD iff all diagonal entries \(d_i > 0\).

Remark

(Spectral Characterization) A symmetric matrix is PD iff all its eigenvalues are strictly positive. Every SPD matrix is invertible (\(\det \bA = \prod \lambda_i > 0\)).

Remark

(Equivalent tests for SPD) For a symmetric matrix \(\bA\), the following are equivalent:

\(\bx^T\bA\bx>0\) for every nonzero \(\bx\).
Every eigenvalue of \(\bA\) is positive.
Cholesky factorization succeeds with positive pivots.
All leading principal minors are positive.

The eigenvalue and Cholesky tests are the most useful computationally.
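The eigenvalue and Cholesky tests might be coded as follows. This is a sketch rather than a definitive implementation: the function names `is_spd_eig` and `is_spd_chol` and the tolerance parameter `tol` are assumptions made for illustration, and the symmetry check uses `np.allclose` with its default tolerances.

```python
import numpy as np

def is_spd_eig(A, tol=0.0):
    """Eigenvalue test: A is symmetric and its smallest eigenvalue exceeds tol."""
    A = np.asarray(A, dtype=float)
    if not np.allclose(A, A.T):
        return False
    # eigvalsh assumes a symmetric input and returns real eigenvalues
    return bool(np.linalg.eigvalsh(A).min() > tol)

def is_spd_chol(A):
    """Cholesky test: A is symmetric and the factorization succeeds."""
    A = np.asarray(A, dtype=float)
    if not np.allclose(A, A.T):
        return False
    try:
        np.linalg.cholesky(A)
        return True
    except np.linalg.LinAlgError:
        return False

A_good = np.array([[4.0, 2.0], [2.0, 3.0]])  # SPD
A_bad = np.array([[1.0, 2.0], [2.0, 1.0]])   # symmetric, eigenvalues 3 and -1

print(is_spd_eig(A_good), is_spd_chol(A_good))
print(is_spd_eig(A_bad), is_spd_chol(A_bad))
```

For matrices that are nearly singular, the two tests can disagree because of rounding, so a small positive `tol` may be a sensible choice in practice.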

Exercise

If \(\bA\) is SPD, show \(\bB = \bA + c\bI\) for \(c > 0\) is also SPD.
Use np.linalg.eigvalsh to verify SPD property for several test matrices.

11.2 Geometry and Metric

An SPD matrix \(\bA\) can be thought of as defining a new geometry or “metric” on \(\fR^n\), where the distance and angle between vectors are weighted by the entries of \(\bA\). This perspective is fundamental to understanding the behavior of iterative solvers like the Conjugate Gradient method.

If \(\bA=\bQ\boldsymbol{\Lambda}\bQ^T\), then \[ \begin{align} \bx^T\bA\bx = \sum_{i=1}^n \lambda_i z_i^2, \qquad \bz=\bQ^T\bx. \end{align} \] The eigenvectors give the principal directions of the geometry, and the eigenvalues determine how expensive motion is in each direction.
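The identity above can be checked numerically. The sketch below assumes NumPy and uses an arbitrary test matrix and vector chosen only for illustration.

```python
import numpy as np

A = np.array([[4.0, 2.0], [2.0, 3.0]])  # an SPD test matrix (illustrative choice)
x = np.array([1.0, -2.0])               # an arbitrary vector

# eigh returns eigenvalues lam and orthonormal eigenvectors as the columns of Q
lam, Q = np.linalg.eigh(A)

z = Q.T @ x                       # coordinates of x in the eigenvector basis
direct = x @ A @ x                # x^T A x computed directly
spectral = np.sum(lam * z**2)     # sum_i lambda_i z_i^2

print(direct, spectral)
```

The two values agree up to rounding, and since every eigenvalue is positive, each term of the sum is nonnegative.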

Definition: \(\bA\)-Inner Product

For SPD \(\bA\), the mapping \(\langle \bx, \by \rangle_\bA = \bx^T\bA\by\) satisfies the axioms of an inner product.

\(\bA\)-Norm: The induced norm is \(\|\bx\|_\bA = \sqrt{\bx^T\bA\bx}\).
Ellipsoid: The unit sphere in this norm, \(\{\bx : \|\bx\|_\bA = 1\}\), is an ellipsoid in \(\fR^n\) whose principal axes are aligned with the eigenvectors of \(\bA\) and whose semi-axis lengths are \(1/\sqrt{\lambda_i}\).

Remark

Physical Meaning: Directions with large eigenvalues are “stiff” or “expensive” (short axes in the unit ellipsoid); directions with small eigenvalues are “sensitive” (long axes).

Example

(Ellipse) For \[ \begin{align} \bA=\begin{pmatrix}4&0\\0&1\end{pmatrix}, \end{align} \] the unit \(\bA\)-norm curve is \[ \begin{align} 4x_1^2+x_2^2=1. \end{align} \] The axis in the \(x_1\) direction is shorter because that direction has larger eigenvalue. Moving in that direction costs more energy.
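A short numerical sketch of this example, assuming NumPy: the helpers `a_inner` and `a_norm` are hypothetical names for the \(\bA\)-inner product and \(\bA\)-norm, and the two test points are chosen to lie on the curve \(4x_1^2+x_2^2=1\).

```python
import numpy as np

A = np.array([[4.0, 0.0], [0.0, 1.0]])

def a_inner(x, y, A):
    """A-inner product <x, y>_A = x^T A y (illustrative helper)."""
    return float(x @ A @ y)

def a_norm(x, A):
    """Induced A-norm ||x||_A = sqrt(x^T A x) (illustrative helper)."""
    return float(np.sqrt(a_inner(x, x, A)))

p1 = np.array([0.5, 0.0])  # endpoint of the short axis (eigenvalue 4)
p2 = np.array([0.0, 1.0])  # endpoint of the long axis (eigenvalue 1)

print(a_norm(p1, A), a_norm(p2, A))            # both points have unit A-norm
print(np.linalg.norm(p1), np.linalg.norm(p2))  # Euclidean lengths differ
print(a_inner(p1, p2, A))                      # the axes are A-orthogonal
```

Both points have unit \(\bA\)-norm even though their Euclidean lengths are \(1/2\) and \(1\), matching the semi-axis lengths \(1/\sqrt{4}\) and \(1/\sqrt{1}\).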

Exercise

For \(\bA = \begin{pmatrix} 4 & 0 \\ 0 & 1 \end{pmatrix}\), compute the distance between \((1,0)^T\) and the origin in both the Euclidean metric and the \(\bA\)-metric.
Where does the \(\bA\)-metric appear in statistics? (Mahalanobis distance).

11.3 Cholesky Decomposition

The symmetry and positive definiteness of \(\bA\) allow for a specialized version of the LU factorization known as the Cholesky decomposition. The decomposition can be interpreted as a generalized square root for SPD matrices, where \(\bA = \bL\bL^T\).

Cholesky is the natural direct solver for SPD systems. It stores only one triangular factor instead of two, avoids pivoting, and preserves the symmetry of the problem.

Definition: Cholesky Decomposition

Every SPD matrix \(\bA\) admits a unique decomposition \(\bA = \bL\bL^T\), where \(\bL\) is lower triangular with strictly positive diagonal entries.

Example

(A \(2\times2\) Cholesky factor) Let \[ \begin{align} \bA=\begin{pmatrix}4&2\\2&3\end{pmatrix}. \end{align} \] Assume \[ \begin{align} \bL=\begin{pmatrix}\ell_{11}&0\\\ell_{21}&\ell_{22}\end{pmatrix}. \end{align} \] Matching \(\bA=\bL\bL^T\) gives \(\ell_{11}=2\), \(\ell_{21}=1\), and \(\ell_{22}=\sqrt{2}\), so \[ \begin{align} \bL=\begin{pmatrix}2&0\\1&\sqrt{2}\end{pmatrix}. \end{align} \]
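The entry-matching procedure in this example generalizes to a column-by-column algorithm. The following is a minimal, unoptimized sketch, assuming NumPy; `cholesky_lower` is a name introduced here for illustration and is not a library routine. In practice one would call `np.linalg.cholesky` instead.

```python
import numpy as np

def cholesky_lower(A):
    """Return lower-triangular L with A = L L^T (illustrative, unoptimized)."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    L = np.zeros_like(A)
    for j in range(n):
        # Diagonal entry: subtract the squares already placed in row j
        d = A[j, j] - L[j, :j] @ L[j, :j]
        if d <= 0.0:
            raise ValueError("matrix is not positive definite")
        L[j, j] = np.sqrt(d)
        # Entries below the diagonal in column j
        L[j + 1:, j] = (A[j + 1:, j] - L[j + 1:, :j] @ L[j, :j]) / L[j, j]
    return L

A = np.array([[4.0, 2.0], [2.0, 3.0]])
L = cholesky_lower(A)
print(L)
```

For this matrix the sketch reproduces \(\ell_{11}=2\), \(\ell_{21}=1\), \(\ell_{22}=\sqrt{2}\). The check `d <= 0.0` is where a matrix that is not PD is detected.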

Remark

(Flop Cost) The computational cost of the Cholesky decomposition is approximately \(\tfrac{1}{3}n^3\) flops, about half that of a general LU factorization (approximately \(\tfrac{2}{3}n^3\) flops).

Remark

Stability Advantage: Unlike general LU, Cholesky is guaranteed to be stable without pivoting. In code, np.linalg.cholesky is the standard numerical test for positive definiteness; if it raises a LinAlgError, the matrix is not PD. The routine does not verify that its input is symmetric, so symmetry should be checked separately.

Remark

(Solving with Cholesky) If \(\bA=\bL\bL^T\), then solving \(\bA\bx=\bb\) requires two triangular solves:

Solve \(\bL\by=\bb\).
Solve \(\bL^T\bx=\by\).

The factorization costs \(O(n^3)\), but each solve costs only \(O(n^2)\).
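The two triangular solves can be sketched as follows, assuming NumPy. The helpers `forward_substitution`, `back_substitution`, and `cholesky_solve` are illustrative names written out explicitly to show the \(O(n^2)\) structure; the right-hand side is an arbitrary choice.

```python
import numpy as np

def forward_substitution(L, b):
    """Solve L y = b for lower-triangular L."""
    n = L.shape[0]
    y = np.zeros(n)
    for i in range(n):
        y[i] = (b[i] - L[i, :i] @ y[:i]) / L[i, i]
    return y

def back_substitution(U, y):
    """Solve U x = y for upper-triangular U."""
    n = U.shape[0]
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (y[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
    return x

def cholesky_solve(A, b):
    """Solve A x = b for SPD A via A = L L^T and two triangular solves."""
    L = np.linalg.cholesky(A)        # O(n^3), done once
    y = forward_substitution(L, b)   # O(n^2)
    return back_substitution(L.T, y) # O(n^2)

A = np.array([[4.0, 2.0], [2.0, 3.0]])
b = np.array([2.0, 5.0])
x = cholesky_solve(A, b)
print(x)
```

When many right-hand sides share the same \(\bA\), the factor \(\bL\) can be computed once and reused, so each additional solve costs only the two substitutions.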

Exercise

Factor \(\bA = \begin{pmatrix} 4 & 2 \\ 2 & 3 \end{pmatrix}\) by hand.
Why does Cholesky not require row swaps?
Solve \(\bA\bx = \bb\) using Cholesky for \(\bA = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}\).

Exercise

(Applications)

Covariance: Generate X = np.random.randn(100, 4). Verify X.T @ X is SPD via Cholesky.
Gram Matrix: Prove that \(\mathbf{G} = \bX^T\bX\) is at least positive semidefinite for any \(\bX\), and PD if \(\bX\) has full column rank.

Remark

(Where SPD matrices appear)

Optimization: Hessians of strictly convex quadratic functions.
Statistics: Covariance and Gram matrices, which are always PSD and are SPD when the data matrix has full column rank.
PDEs: Discrete Laplacians and stiffness matrices for elliptic problems.
Least squares: Normal equations \(\bA^T\bA\bx=\bA^T\bb\) are SPD when \(\bA\) has full column rank.

Remark

(Connection to CG) Conjugate Gradient is designed for large sparse SPD systems. It minimizes the quadratic energy \[ \begin{align} \phi(\bx)=\frac{1}{2}\bx^T\bA\bx-\bb^T\bx, \end{align} \] whose unique minimizer satisfies \(\bA\bx=\bb\) when \(\bA\) is SPD.
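To make the connection concrete, here is a textbook-style sketch of the Conjugate Gradient iteration, assuming NumPy and a dense matrix for simplicity. The function name, the zero starting guess, the residual tolerance `tol`, and the default iteration cap are illustrative assumptions, not a tuned solver.

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-10, max_iter=None):
    """Approximately solve A x = b for SPD A, starting from x = 0 (sketch)."""
    n = b.shape[0]
    max_iter = 10 * n if max_iter is None else max_iter
    x = np.zeros(n)
    r = b - A @ x          # residual, equal to minus the gradient of phi
    p = r.copy()           # first search direction
    rs_old = r @ r
    for _ in range(max_iter):
        if np.sqrt(rs_old) < tol:
            break
        Ap = A @ p
        alpha = rs_old / (p @ Ap)       # exact line search along p
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs_old) * p   # next A-conjugate direction
        rs_old = rs_new
    return x

def phi(A, b, x):
    """Quadratic energy phi(x) = 0.5 x^T A x - b^T x."""
    return 0.5 * (x @ A @ x) - b @ x

A = np.array([[4.0, 2.0], [2.0, 3.0]])
b = np.array([2.0, 5.0])
x_cg = conjugate_gradient(A, b)
print(x_cg, phi(A, b, x_cg))
```

The denominator `p @ Ap` is the squared \(\bA\)-norm of the search direction, which is positive precisely because \(\bA\) is SPD.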

Proof

This proves the Gram matrix claim from the Applications exercise, with \(\mathbf{G} = \bX^T\bX\). For any nonzero vector \(\bv \in \fR^n\), consider the quadratic form: \[ \begin{align} \bv^T \mathbf{G} \bv &= \bv^T (\bX^T \bX) \bv \\ &= (\bX \bv)^T (\bX \bv) \\ &= \|\bX \bv\|_2^2. \end{align} \] Since the squared Euclidean norm is always non-negative, \(\bv^T \mathbf{G} \bv \geq 0\), proving \(\mathbf{G}\) is positive semidefinite. If \(\bX\) has full column rank, then \(\bX \bv = \bzero\) iff \(\bv = \bzero\). Thus \(\bv^T \mathbf{G} \bv > 0\) for all nonzero \(\bv\), making \(\mathbf{G}\) positive definite.
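The role of the full-column-rank assumption can be illustrated with a deliberately rank-deficient matrix. The sketch below assumes NumPy; the matrix shape, the random seed, and the duplicated column are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 3))
X[:, 1] = X[:, 0]          # duplicate a column, so X loses full column rank

G = X.T @ X

# The identity v^T G v = ||X v||^2 holds for any v
v = rng.standard_normal(3)
lhs = v @ G @ v
rhs = np.linalg.norm(X @ v) ** 2

# A nonzero vector in the null space of X gives zero energy
v_null = np.array([1.0, -1.0, 0.0])
flat = v_null @ G @ v_null

min_eig = np.linalg.eigvalsh(G).min()
print(lhs, rhs, flat, min_eig)
```

Here \(\mathbf{G}\) is still PSD, but the smallest eigenvalue is zero up to rounding, so \(\mathbf{G}\) is not PD.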

Exercise

(SPD Condition Number) \(\kappa_2(\bA) = \lambda_{\max} / \lambda_{\min}\).

If \(\bA\) is SPD with eigenvalues \(\{10, 0.01\}\), what is \(\kappa_2(\bA)\)?
Verify \(\kappa_2(\bL) = \sqrt{\kappa_2(\bA)}\) where \(\bL\) is the Cholesky factor.