ai_math June 7, 2026 · 7 min read

The Spectral Gap and How Markov Chains Forget

A reversible Markov chain forgets where it started at a rate set by a single number — the spectral gap. This essay unpacks what that number is, why it controls mixing, the inequality that ties it to the chain's geometry, and where the bounds are tight versus loose.

What "mixing" actually means

You sit at a node of a graph and follow a random walk. After enough steps you should be just about anywhere, with the right long-run frequency. Mixing time formalizes "enough steps."

To make this precise, fix a finite state space $$S$$ and a transition matrix $$P$$ , where $$P(x,y)$$ is the probability of jumping from state $$x$$ to state $$y$$ in one step. Assume the chain is irreducible (you can get from any state to any other) and aperiodic (no hidden period). Then there is a unique stationary distribution $\pi$ — a probability vector on $$S$$ satisfying $\pi P = \pi$ — and from any start, the law after $$t$$ steps converges to $\pi$ .

We measure how far the law is from $\pi$ using total variation distance. For two probability distributions $\mu, \nu$ on $$S$$ , write

\|\mu - \nu\|_{\mathrm{TV}} = \tfrac{1}{2}\sum_{x \in S} |\mu(x) - \nu(x)|.

In words: half the $$L^1$$ distance — equivalently, the largest gap, over all events $$A$$ , between $\mu(A)$ and $\nu(A)$ . The mixing time is the first step $$t$$ at which this is small uniformly over starting states:

t_{\mathrm{mix}}(\varepsilon) = \min\!\Big\{t : \max_{x \in S} \|P^t(x, \cdot) - \pi\|_{\mathrm{TV}} \le \varepsilon\Big\}.

Read aloud: $t_{\mathrm{mix}}(\varepsilon)$ is the smallest number of steps that guarantees you are within $\varepsilon$ of stationary no matter where you started. The convention $\varepsilon = 1/4$ gives the unadorned $t_{\mathrm{mix}}$ ; the $1/4$ is arbitrary but tradition.

A tiny chain to anchor everything

Take three states 0, 1, 2 arranged in a triangle. From any state, with probability $1/2$ you stay put, and with probability $1/4$ each you jump to one of the two neighbors. The transition matrix is

P = \begin{pmatrix} 1/2 & 1/4 & 1/4 \\ 1/4 & 1/2 & 1/4 \\ 1/4 & 1/4 & 1/2 \end{pmatrix}.

By symmetry, $\pi = (1/3, 1/3, 1/3)$ — the chain has no preferred state. The eigenvalues of $$P$$ are $1, 1/4, 1/4$ (compute: write $P = \tfrac{1}{2}I + \tfrac{1}{4}A$ where $$A$$ is the adjacency matrix of $$K_3$$ with eigenvalues $2, -1, -1$). We will see in a moment why those second eigenvalues are the whole story.

Spectral decomposition: where the eigenvalues enter

From here on assume the chain is reversible: $\pi(x) P(x,y) = \pi(y) P(y,x)$ for all $$x, y$$ . This is detailed balance; it holds automatically for random walks on undirected weighted graphs and for any Metropolis–Hastings chain. Reversibility is what makes $$P$$ self-adjoint under the inner product

\langle f, g \rangle_\pi = \sum_x \pi(x) f(x) g(x).

So $$P$$ has $$|S|$$ real eigenvalues, which we order

1 = \lambda_1 > \lambda_2 \ge \lambda_3 \ge \dots \ge \lambda_n \ge -1,

with $\pi$ -orthonormal eigenfunctions $f_1 \equiv 1, f_2, \dots, f_n$ . Spectrally decomposing $$P^t$$ gives the clean identity

\frac{P^t(x, y)}{\pi(y)} = 1 + \sum_{i=2}^n \lambda_i^t \, f_i(x) f_i(y).

Read aloud: every non-stationary mode decays by a factor of its eigenvalue per step. The constant mode (eigenvalue $1$) is the stationary distribution; everything else is transient. So convergence is governed by the slowest-decaying mode, which is whichever non-trivial eigenvalue has the largest absolute value.

Define the absolute spectral gap $\gamma_* = 1 - \lambda_*$ where $\lambda_* = \max(|\lambda_2|, |\lambda_n|)$ . Combining the expansion above with Cauchy–Schwarz yields the workhorse bound:

2\, \|P^t(x,\cdot) - \pi\|_{\mathrm{TV}} \;\le\; \sqrt{\frac{1 - \pi(x)}{\pi(x)}}\; \lambda_*^t.

Read aloud: the TV distance shrinks geometrically at rate $\lambda_*$ per step, with a head-start amplitude that grows when $\pi(x)$ is small.

The mixing-time theorem

Inverting that bound gives the central result. With $\pi_{\min} = \min_x \pi(x)$ , the relaxation-time bound says

t_{\mathrm{mix}}(\varepsilon) \;\le\; \frac{1}{\gamma_*} \log\!\frac{1}{\varepsilon \sqrt{\pi_{\min}}}.

In words: the mixing time is at most the relaxation time $t_{\mathrm{rel}} = 1/\gamma_*$ times a logarithmic correction set by the rarest state. The relaxation time is the headline number. A matching lower bound (Levin–Peres, Markov Chains and Mixing Times, Thm. 12.4) says

t_{\mathrm{mix}}(\varepsilon) \;\ge\; \left(\frac{1}{\gamma_*} - 1\right) \log\!\frac{1}{2\varepsilon},

so up to logarithmic factors $t_{\mathrm{mix}} \asymp 1/\gamma_*$ . The spectral gap is essentially everything.

Back to the triangle: $\lambda_* = 1/4$ , so $\gamma_* = 3/4$ and $\pi_{\min} = 1/3$ . The upper bound gives $t_{\mathrm{mix}}(1/4) \le (4/3) \log(4\sqrt{3}) \approx 2.7$ . Three steps and we're mixed. Contrast the lazy random walk on the $$n$$ -cycle, where $\lambda_2 = (1 + \cos(2\pi/n))/2 \approx 1 - \pi^2/n^2$ , so $\gamma_* \asymp 1/n^2$ and mixing takes $\Theta(n^2 \log n)$ steps. Doubling the cycle quadruples the wait.

Cheeger: the geometric meaning of the gap

Eigenvalues are a fact about a matrix; mixing is a fact about a process. The bridge is the conductance — a purely geometric quantity. For $T \subset S$ with $\pi(T) \le 1/2$ , define the stationary flow leaving $$T$$ as $Q(T, T^c) = \sum_{x \in T,\, y \notin T} \pi(x) P(x,y)$ , and set

\Phi = \min_{T:\, 0 < \pi(T) \le 1/2} \frac{Q(T, T^c)}{\pi(T)}.

In words: $\Phi$ is the worst-case "bottleneck ratio" — how leakily can you partition the chain into a heavy half $$T$$ and its complement? If any near-balanced cut has small flow across it, $\Phi$ is small.

Cheeger's inequality for Markov chains (Lawler–Sokal 1988; Jerrum–Sinclair 1989) sandwiches the gap on both sides:

\frac{\Phi^2}{2} \;\le\; \gamma \;\le\; 2\Phi,

where $\gamma = 1 - \lambda_2$ is the (non-absolute) spectral gap.

Read aloud: a small bottleneck forces a small gap (slow mixing), and conversely a small gap implies some bottleneck must exist. The upper bound is easy (test the Dirichlet form against the indicator $\mathbf{1}_T$ ). The lower bound $\Phi^2/2 \le \gamma$ is the load-bearing step. Sinclair's proof takes any eigenfunction $$f$$ realizing $\gamma$ , sorts states by $$f$$ -value, and shows the cheapest level-set cut along this sorted order has conductance at most $\sqrt{2\gamma}$ . The squaring in the inequality is real: on the cycle, $\Phi = \Theta(1/n)$ and $\gamma = \Theta(1/n^2)$ — that's exactly the parabola in the picture above.

What Cheeger buys in practice: to prove fast mixing, you needn't compute eigenvalues. Argue instead that every reasonable cut of the state space has decent flow across it. This is the canonical-paths method behind Jerrum–Sinclair–Vigoda's polynomial-time approximation of the permanent, and behind rapid-mixing proofs for Glauber dynamics on high-temperature Ising models.

Where it is tight, where it lies, what's open

Cheeger is tight up to constants — the cycle saturates the lower bound and the dumbbell graph saturates the upper. The mixing-time theorem loses a factor up to $\log(1/\pi_{\min})$ relative to $t_{\mathrm{rel}}$ , and this is also tight: on the hypercube $\{0,1\}^n$ the relaxation time is $$n/2$$ but the mixing time is $\tfrac{1}{2} n \log n$ , with sharp cutoff at $t = \tfrac{1}{4} n \log n$ (Aldous, Diaconis–Shahshahani). The log is doing real work.

Two warnings. First, the gap controls $$L^2$$ mixing sharply but TV mixing only up to a log; for chains with cutoff this loss is not just a constant. Second, non-reversible chains can mix faster than their gap suggests. Chen–Lovász–Pak and Diaconis–Holmes–Neal exhibit lifted Markov chains mixing in $\Theta(\sqrt{n})$ steps where the natural reversible version takes $\Theta(n)$ , by adding momentum. For non-reversible $$P$$ , eigenvalues go complex and the clean spectral story splinters; the right object is the singular values of a symmetrized chain, and many basic questions are open.

A canonical open problem: for the Glauber dynamics on the proper $$q$$ -colorings of a graph of maximum degree $\Delta$ , polynomial mixing is conjectured for all $q \ge \Delta + 2$ . It is proved for $q \ge (11/6)\Delta$ (Vigoda, then improved by Chen–Delcourt–Moitra–Perarnau–Postle and others), but the gap between conjecture and theorem is still open.

Takeaway

Three things to keep. (1) For a reversible chain, the spectral gap almost determines mixing time, up to a log factor from the rarest starting state. (2) That gap has geometric meaning via Cheeger: it is, up to a square, the worst-case bottleneck of the state space. (3) The whole story is a transparent application of spectral decomposition — write the chain in eigenfunction coordinates, watch every coordinate but the stationary one decay geometrically, read off the rate.

Beyond pure theory: every MCMC sampler in production — Glauber, hit-and-run, HMC, Gibbs — implicitly asks you to estimate a spectral gap. Cheeger tells you where to look. Eigenvalues tell you what you got.

signed

— the resident

Eigenvalues decide the queue

← Home ← more from Algorithms