33 Markov Chains and Stochastic Matrices

Markov chains model random processes whose next state depends only on the current state. With the column convention used here, probability vectors are columns and the update is \(\bx^{(k+1)}=\bP\bx^{(k)}\).

33.1 Definitions

Definition: Probability Vector

A vector \(\bx\in\fR^n\) is a probability vector if \[ \begin{align} x_i\geq 0,\qquad \sum_{i=1}^n x_i=1. \end{align} \] The entry \(x_i\) is the probability of state \(i\).

Definition: Column-Stochastic Matrix

A matrix \(\bP \in \fR^{n \times n}\) is column-stochastic if:

\(P_{ij} \geq 0\) for all \(i, j\).
\(\sum_i P_{ij} = 1\) for every \(j\) (each column sums to 1).
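The two conditions can be checked mechanically. The following is a minimal sketch in plain Python, assuming the matrix is stored as a list of rows; the helper name `is_column_stochastic` and the tolerance `tol` are illustrative choices, not part of the text.

```python
def is_column_stochastic(P, tol=1e-12):
    """Return True if P (a list of rows) is square, nonnegative,
    and every column sums to 1 within the tolerance tol."""
    n = len(P)
    if any(len(row) != n for row in P):
        return False
    if any(entry < 0 for row in P for entry in row):
        return False
    # Column j sums over the row index i: sum_i P[i][j].
    return all(abs(sum(P[i][j] for i in range(n)) - 1.0) <= tol
               for j in range(n))


# Columns (0.9, 0.1) and (0.2, 0.8) each sum to 1.
P = [[0.9, 0.2],
     [0.1, 0.8]]

# The transpose is row-stochastic but not column-stochastic.
P_transposed = [[0.9, 0.1],
                [0.2, 0.8]]

print(is_column_stochastic(P))
print(is_column_stochastic(P_transposed))
```

The second matrix fails because its columns sum to 1.1 and 0.9, even though its rows sum to 1.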

Remark

(Column convention) Column \(j\) of \(\bP\) contains the probabilities of moving from state \(j\) to all possible next states. Some texts use row-stochastic matrices and row probability vectors. The two conventions are transposes of each other.

Theorem: Spectrum of a Stochastic Matrix

If \(\bP\) is column-stochastic, then \(\lambda=1\) is an eigenvalue and every eigenvalue satisfies \(|\lambda|\leq 1\).

Proof

Since \(\mathbf{1}^T\bP=\mathbf{1}^T\), \(1\) is an eigenvalue of \(\bP^T\), and hence of \(\bP\), because \(\bP\) and \(\bP^T\) have the same characteristic polynomial. For the same reason, every eigenvalue \(\lambda\) of \(\bP\) satisfies \(\bP^T\bv=\lambda\bv\) for some \(\bv\neq\mathbf{0}\). Choose \(j\) with \(|v_j|=\|\bv\|_\infty\). Then \[ \begin{align} |\lambda|\,|v_j| = \left|\sum_i P_{ij}v_i\right| \leq \sum_i P_{ij}|v_i| \leq |v_j|\sum_i P_{ij} = |v_j|. \end{align} \] Since \(|v_j|>0\), it follows that \(|\lambda|\leq 1\).

Exercise

Show that \(\mathbf{1}^T\bP=\mathbf{1}^T\) for a column-stochastic matrix.
Conclude that \(1\) is a left eigenvalue of \(\bP\), hence an eigenvalue.
Prove \(|\lambda|\leq 1\) first for a row-stochastic matrix by taking the largest-magnitude entry of an eigenvector.
Apply the previous step to \(\bP^T\) to prove the result above.

Definition: Markov Chain

A sequence of probability vectors \(\bx^{(k)}\) evolving by \(\bx^{(k+1)} = \bP \bx^{(k)}\), where \(\bP\) is column-stochastic.

After \(k\) steps: \(\bx^{(k)} = \bP^k \bx^{(0)}\).
Long-run behavior is governed by the eigendecomposition of \(\bP\).
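The update rule can be sketched directly as a repeated matrix-vector product. This is a minimal illustration in plain Python; the helper names `step` and `evolve` and the numerical entries of `P` are assumptions chosen for the example.

```python
def step(P, x):
    """One update x -> P x, with P stored as a list of rows."""
    n = len(x)
    return [sum(P[i][j] * x[j] for j in range(n)) for i in range(n)]


def evolve(P, x0, k):
    """Return x^(k) = P^k x^(0) by applying the update k times."""
    x = list(x0)
    for _ in range(k):
        x = step(P, x)
    return x


# Illustrative column-stochastic matrix (columns sum to 1).
P = [[0.9, 0.2],
     [0.1, 0.8]]
x0 = [1.0, 0.0]          # start in state 1 with certainty

x1 = evolve(P, x0, 1)    # first column of P
x50 = evolve(P, x0, 50)  # close to the long-run distribution
print(x1, x50)
```

Each iterate remains a probability vector because the columns of `P` sum to 1, so the total mass is preserved at every step.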

Definition: Stationary Distribution

A probability vector \(\boldsymbol{\pi}\) such that \(\bP \boldsymbol{\pi} = \boldsymbol{\pi}\).

Right eigenvector of \(\bP\) for \(\lambda=1\).
Represents the equilibrium state of the chain.

Example

(Two-state chain) Let \[ \begin{align} \bP= \begin{pmatrix} 1-a & b\\ a & 1-b \end{pmatrix}, \qquad 0<a,b<1. \end{align} \] Solving \(\bP\boldsymbol{\pi}=\boldsymbol{\pi}\) with \(\pi_1+\pi_2=1\) gives \[ \begin{align} \boldsymbol{\pi} = \frac{1}{a+b} \begin{pmatrix} b\\ a\end{pmatrix}. \end{align} \]
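The closed form can be checked numerically. The sketch below, in plain Python, applies \(\bP\) once to the claimed \(\boldsymbol{\pi}\) and measures the residual; the values of `a` and `b` are illustrative and the function name `two_state_stationary` is hypothetical.

```python
def two_state_stationary(a, b):
    """Closed-form stationary distribution (b, a) / (a + b)."""
    return [b / (a + b), a / (a + b)]


a, b = 0.3, 0.5   # illustrative values with 0 < a, b < 1
P = [[1 - a, b],
     [a, 1 - b]]
pi = two_state_stationary(a, b)

# Apply P once and measure how far P pi is from pi.
P_pi = [sum(P[i][j] * pi[j] for j in range(2)) for i in range(2)]
residual = max(abs(P_pi[i] - pi[i]) for i in range(2))
print(pi, residual)
```

The residual should be at the level of floating-point rounding, since \(a\pi_1=b\pi_2\) holds exactly for this \(\boldsymbol{\pi}\).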

Definition: Reducible and Periodic Chains

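A chain is reducible if some state cannot be reached from another state, and irreducible otherwise. The period \(d\) of a state is the greatest common divisor of the times at which a return to that state is possible; the state is periodic if \(d>1\) and aperiodic if \(d=1\). A primitive chain is irreducible and aperiodic.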

Example

(Two failure modes) The identity matrix is reducible: every state is absorbing. The matrix \[ \begin{align} \begin{pmatrix} 0&1\\ 1&0 \end{pmatrix} \end{align} \] is irreducible but periodic with period \(2\), so the distribution oscillates instead of converging.
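Both failure modes are easy to observe by iterating the update. The following plain-Python sketch uses an illustrative starting vector; the helper `step` is an assumption of the example.

```python
def step(P, x):
    """One update x -> P x, with P stored as a list of rows."""
    n = len(x)
    return [sum(P[i][j] * x[j] for j in range(n)) for i in range(n)]


identity = [[1.0, 0.0],
            [0.0, 1.0]]
swap = [[0.0, 1.0],
        [1.0, 0.0]]

x0 = [0.8, 0.2]   # illustrative initial distribution

# Reducible case: the distribution never moves, so the limit depends on x0.
frozen = step(identity, x0)

# Periodic case: the two entries trade places at every step.
trajectory = [x0]
for _ in range(4):
    trajectory.append(step(swap, trajectory[-1]))

print(frozen)
print(trajectory)
```

For the swap matrix the only starting vector that does not oscillate is the uniform one, \((1/2, 1/2)\), which is its stationary distribution.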

33.2 Convergence and Mixing

Theorem: Perron-Frobenius for Chains

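If \(\bP\) is primitive, that is, irreducible (every state reachable from every other state) and aperiodic (no state has period \(d>1\)), then:

\(\lambda_1 = 1\) is a simple eigenvalue, and the stationary distribution \(\boldsymbol{\pi}\) is unique.
\(|\lambda_i| < 1\) for all other eigenvalues.
\(\lim_{k\to\infty} \bP^k \bx^{(0)} = \boldsymbol{\pi}\) for any initial probability vector \(\bx^{(0)}\).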
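The independence of the limit from the starting vector can be illustrated numerically. The sketch below, in plain Python, uses a hypothetical three-state matrix with all entries positive (which makes it primitive) and starts the chain from each state in turn.

```python
def step(P, x):
    """One update x -> P x, with P stored as a list of rows."""
    n = len(x)
    return [sum(P[i][j] * x[j] for j in range(n)) for i in range(n)]


# Illustrative column-stochastic matrix with strictly positive entries.
P = [[0.50, 0.2, 0.3],
     [0.25, 0.6, 0.3],
     [0.25, 0.2, 0.4]]

starts = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
limits = []
for x in starts:
    for _ in range(200):
        x = step(P, x)
    limits.append(x)

# Largest entrywise disagreement between the three limits.
spread = max(abs(limits[p][i] - limits[q][i])
             for p in range(3) for q in range(3) for i in range(3))

# The common limit should satisfy P pi = pi.
pi = limits[0]
P_pi = step(P, pi)
residual = max(abs(P_pi[i] - pi[i]) for i in range(3))
print(pi, spread, residual)
```

All three runs should agree to rounding error, and the common limit should be fixed by `P`.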

Definition: Spectral Gap

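Order the eigenvalues of a primitive chain so that \(1=\lambda_1>|\lambda_2|\geq|\lambda_3|\geq\cdots\). The gap \(\delta = 1 - |\lambda_2|\) controls convergence speed.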

Theorem: Spectral Mixing Heuristic

For a diagonalizable primitive chain, the distance to stationarity is dominated asymptotically by the second eigenvalue: \[ \begin{align} \|\bx^{(k)}-\boldsymbol{\pi}\| \approx C|\lambda_2|^k. \end{align} \] Thus a larger spectral gap \(\delta=1-|\lambda_2|\) means faster mixing, and the iteration count to reach tolerance \(\varepsilon\) scales like \[ \begin{align} k_\varepsilon \approx \frac{\log(1/\varepsilon)}{\delta} \end{align} \] when \(\delta\) is small.
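For the two-state chain the heuristic can be checked exactly, because the second eigenvalue is \(\lambda_2=1-a-b\) (the trace is \(2-a-b\) and the other eigenvalue is \(1\)). The plain-Python sketch below uses illustrative values of `a`, `b`, and `eps`, measures the error in the 1-norm, and compares the estimate \(\log(1/\varepsilon)/\delta\) with the step count obtained from the exact decay.

```python
import math


def step(P, x):
    """One update x -> P x, with P stored as a list of rows."""
    n = len(x)
    return [sum(P[i][j] * x[j] for j in range(n)) for i in range(n)]


a, b = 0.02, 0.03            # illustrative; small a + b gives slow mixing
P = [[1 - a, b],
     [a, 1 - b]]
pi = [b / (a + b), a / (a + b)]
lam2 = 1 - a - b             # second eigenvalue of the two-state chain
delta = 1 - abs(lam2)        # spectral gap

x = [1.0, 0.0]
errors = []
for k in range(60):
    errors.append(sum(abs(x[i] - pi[i]) for i in range(2)))
    x = step(P, x)

# Successive error ratios should all equal |lam2| up to rounding.
ratios = [errors[k + 1] / errors[k] for k in range(len(errors) - 1)]

eps = 1e-3
k_estimate = math.log(1 / eps) / delta
# Exact count from errors[0] * |lam2|^k = eps.
k_exact = math.log(eps / errors[0]) / math.log(abs(lam2))
print(ratios[0], k_estimate, k_exact)
```

The two counts differ because the estimate drops the constant \(C\) and replaces \(-\log|\lambda_2|\) by \(\delta\); both approximations are mild when \(\delta\) is small.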

Exercise

Write an initial distribution as \(\bx^{(0)}=\boldsymbol{\pi}+\sum_{i\geq 2} c_i\bv_i\) using eigenvectors of \(\bP\).
Apply \(\bP^k\) and use \(\bP\boldsymbol{\pi}=\boldsymbol{\pi}\).
Identify the slowest-decaying term.
Use \(\log(1-\delta)\approx -\delta\) to derive the mixing-time estimate \(k_\varepsilon \approx \log(1/\varepsilon)/\delta\) from the theorem above.

33.3 Properties and Design

Definition: Detailed Balance (Reversibility)

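A chain is reversible with respect to a probability vector \(\boldsymbol{\pi}\) if \(\pi_j P_{ij} = \pi_i P_{ji}\) for all \(i, j\). Summing this identity over \(i\) gives \(\pi_j=(\bP\boldsymbol{\pi})_j\), so such a \(\boldsymbol{\pi}\) is stationary.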

Theorem: Reversible Chains Have Real Spectrum

If \(\bP\) satisfies detailed balance with a stationary distribution \(\boldsymbol{\pi}\) whose entries are all positive, and \(\bD_{\boldsymbol{\pi}}=\operatorname{diag}(\boldsymbol{\pi})\), then \[ \begin{align} \bD_{\boldsymbol{\pi}}^{-1/2}\bP\bD_{\boldsymbol{\pi}}^{1/2} \end{align} \] is symmetric. This matrix is similar to \(\bP\), and real symmetric matrices have real eigenvalues. Therefore reversible chains have real eigenvalues.
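The symmetrization can be verified on a small example. The sketch below, in plain Python, uses a hypothetical three-state birth-death chain (moves only between neighboring states), which satisfies detailed balance with \(\boldsymbol{\pi}=(1/4,1/2,1/4)\); the helper names are illustrative.

```python
import math


def detailed_balance_gap(P, pi):
    """Largest violation of pi_j P_ij = pi_i P_ji over all pairs (i, j)."""
    n = len(pi)
    return max(abs(pi[j] * P[i][j] - pi[i] * P[j][i])
               for i in range(n) for j in range(n))


def symmetrize(P, pi):
    """Entries of D^{-1/2} P D^{1/2}: pi_i^{-1/2} * P_ij * pi_j^{1/2}."""
    n = len(pi)
    return [[P[i][j] * math.sqrt(pi[j] / pi[i]) for j in range(n)]
            for i in range(n)]


# Illustrative birth-death chain; column j holds the moves out of state j.
P = [[0.5, 0.25, 0.0],
     [0.5, 0.50, 0.5],
     [0.0, 0.25, 0.5]]
pi = [0.25, 0.5, 0.25]

S = symmetrize(P, pi)
asymmetry = max(abs(S[i][j] - S[j][i]) for i in range(3) for j in range(3))
print(detailed_balance_gap(P, pi), asymmetry)
```

Both printed quantities should vanish up to rounding, which is the entrywise content of the theorem for this chain.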

Exercise

Compute the \((i,j)\) entry of \(\bD_{\boldsymbol{\pi}}^{-1/2}\bP\bD_{\boldsymbol{\pi}}^{1/2}\).
Use detailed balance, \(\pi_j P_{ij} = \pi_i P_{ji}\), to show that this entry equals the \((j,i)\) entry.
Explain why similarity preserves eigenvalues.
Conclude that the eigenvalues of \(\bP\) are real.

Remark

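(Design Tool: Metropolis-Hastings) Construct a reversible chain for a desired \(\boldsymbol{\pi}\) by proposing a move from the current state (old) to a candidate state (new) with probability \(Q(\text{new}|\text{old})\) and accepting it with probability \(a = \min\left(1, \frac{\pi_{\text{new}} Q(\text{old}|\text{new})}{\pi_{\text{old}} Q(\text{new}|\text{old})}\right)\). If the move is rejected, the chain stays at the old state.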
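As a rough illustration of the acceptance rule, the sketch below simulates a three-state sampler in plain Python. It assumes a uniform proposal over the other states, which is symmetric, so the ratio of \(Q\) terms equals 1; the target `pi`, the seed, and the step count are illustrative choices.

```python
import random


def mh_step(state, pi, rng):
    """One Metropolis-Hastings move with a symmetric uniform proposal."""
    n = len(pi)
    # Propose uniformly among the other states. Because the proposal is
    # symmetric, Q(old|new) / Q(new|old) = 1 and the acceptance ratio
    # reduces to pi_new / pi_old.
    proposal = rng.choice([s for s in range(n) if s != state])
    accept_prob = min(1.0, pi[proposal] / pi[state])
    return proposal if rng.random() < accept_prob else state


rng = random.Random(0)       # fixed seed so the run is repeatable
pi = [0.5, 0.3, 0.2]         # illustrative target distribution
n_steps = 50000

state = 0
counts = [0, 0, 0]
for _ in range(n_steps):
    state = mh_step(state, pi, rng)
    counts[state] += 1

freqs = [c / n_steps for c in counts]
print(freqs)
```

The empirical frequencies should be close to `pi`, with sampling error that shrinks as the number of steps grows. For a non-symmetric proposal the full ratio with both \(Q\) terms would be needed.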

Theorem: Metropolis-Hastings Detailed Balance

Let \(Q_{ij}\) be a proposal probability from state \(j\) to state \(i\). Define \[ \begin{align} A_{ij} = \min\left(1,\frac{\pi_i Q_{ji}}{\pi_j Q_{ij}}\right), \qquad i\neq j. \end{align} \] The transition probabilities \(P_{ij}=Q_{ij}A_{ij}\) for \(i\neq j\), with the remaining probability kept at state \(j\), satisfy detailed balance with \(\boldsymbol{\pi}\).

Proof

For \(i\neq j\), \[ \begin{align} \pi_j P_{ij} = \pi_j Q_{ij} \min\left(1,\frac{\pi_i Q_{ji}}{\pi_j Q_{ij}}\right) = \min(\pi_j Q_{ij},\pi_i Q_{ji}). \end{align} \] The same calculation with \(i\) and \(j\) interchanged gives \[ \begin{align} \pi_i P_{ji} = \min(\pi_i Q_{ji},\pi_j Q_{ij}). \end{align} \] Thus \(\pi_jP_{ij}=\pi_iP_{ji}\).
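The construction in the theorem can be carried out explicitly for a small state space. The following plain-Python sketch builds \(\bP\) from a target \(\boldsymbol{\pi}\) and a proposal matrix \(Q\) in the column convention (`Q[i][j]` is the probability of proposing \(i\) from \(j\)), then checks detailed balance. The function name and the numerical values are assumptions for illustration.

```python
def metropolis_hastings_matrix(pi, Q):
    """Build P with P_ij = Q_ij * A_ij for i != j; the leftover
    probability in column j stays on the diagonal entry P_jj."""
    n = len(pi)
    P = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for i in range(n):
            if i != j and Q[i][j] > 0:
                accept = min(1.0, (pi[i] * Q[j][i]) / (pi[j] * Q[i][j]))
                P[i][j] = Q[i][j] * accept
        P[j][j] = 1.0 - sum(P[i][j] for i in range(n) if i != j)
    return P


pi = [0.5, 0.3, 0.2]         # illustrative target distribution
Q = [[0.0, 0.5, 0.5],        # uniform proposal over the other two states
     [0.5, 0.0, 0.5],
     [0.5, 0.5, 0.0]]

P = metropolis_hastings_matrix(pi, Q)

balance_gap = max(abs(pi[j] * P[i][j] - pi[i] * P[j][i])
                  for i in range(3) for j in range(3))
column_sums = [sum(P[i][j] for i in range(3)) for j in range(3)]
P_pi = [sum(P[i][j] * pi[j] for j in range(3)) for i in range(3)]
print(P, balance_gap)
```

Detailed balance implies stationarity, so `P_pi` should reproduce `pi` up to rounding.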

33.4 Exercises

Exercise

Construct \(\bP\) for a 2-state chain with transition probabilities \(a=0.1\) and \(b=0.2\), in the notation of the two-state example. Find \(\boldsymbol{\pi}\).
Prove that \(|\lambda| \leq 1\) for any stochastic matrix using the result above.
Simulate 100 steps of PageRank on a 5-node graph. Compare the observed convergence rate with \(|\lambda_2|\) using the Spectral Mixing Heuristic.
Find \(\boldsymbol{\pi}\) for a random walk on \(K_4\) (complete graph).
Give one reducible chain and one periodic irreducible chain. Explain which hypothesis or conclusion of the Perron-Frobenius theorem for chains fails in each case.
Verify detailed balance for a three-state Metropolis-Hastings chain using the Metropolis-Hastings Detailed Balance theorem.