AI, Algorithms & Math
Modern AI meets classical math. Intuition before formalism.
From a Token to a Generation: Every Part
This is a long, mechanical walk through what a large language model actually is — every layer, every matrix, every numeric format, every trick that makes a 70B-parameter file turn into running text. By the end you should be able to read a sentence like "we fine-tuned a 70B with QLoRA at 4-bit and served it with page…
Persistent Homology for Noisy Data
A point cloud has shape, but real point clouds also have jitter, outliers, and missing samples. Persistent homology is the trick that makes the shape readable through the noise: it doesn't ask "is there a loop?" but "across what range of scales does a loop survive?", and short-lived features are exactly what we mean by noise. This essay builds the construction from a four-point example, states the stability theorem that makes the whole pipeline trustworthy, and sketches why that theorem is true.
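For orientation, the stability theorem the teaser leans on is usually stated as a bound on the bottleneck distance between the persistence diagrams of two tame functions $f$ and $g$ (a standard statement, not a quotation from the essay):

$$
d_B\big(\mathrm{Dgm}(f),\,\mathrm{Dgm}(g)\big) \;\le\; \|f - g\|_\infty .
$$

In words: perturb the data by $\varepsilon$ and no birth–death pair moves by more than $\varepsilon$, which is exactly why short bars can be read off as noise.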
Bayes' Theorem as Hypothesis Arithmetic
Bayes' theorem is often taught as a probability identity — a three-line derivation, a medical-testing puzzle, a shrug. That sells it short. The right way to read it is as *arithmetic on hypotheses*: a bookkeeping rule that tells you exactly how to update a ledger of competing beliefs when a new observation arrives. Get the bookkeeping right, and a startling amount of statistics, machine learning, and even rational argument falls out as accounting. This essay unpacks the identity, shows it on a small example, rewrites it in the form actually used for inference (log-odds), and sketches why a properly regularized prior eventually stops mattering — the content of the Bernstein–von Mises theorem.
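For readers who want the bookkeeping up front, here are the identity and the log-odds rewrite the teaser mentions, for a hypothesis $H$ against its complement (standard forms, not excerpts from the essay):

$$
P(H \mid E) \;=\; \frac{P(E \mid H)\,P(H)}{P(E)},
\qquad
\log\frac{P(H \mid E)}{P(\lnot H \mid E)}
\;=\;
\log\frac{P(H)}{P(\lnot H)}
\;+\;
\log\frac{P(E \mid H)}{P(E \mid \lnot H)}.
$$

The second form is the ledger: prior log-odds plus one log-likelihood-ratio entry per observation.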
The PCP Theorem at the Intuition Level
A proof you can check by reading three bits of it. That is not a metaphor, and it is not a joke — it is the actual content of the PCP theorem, one of the strangest and most consequential results in complexity theory. This essay unpacks what that means, why it is shocking, the exact statement, and the load-bearing idea in Dinur's 2007 proof.
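For reference, the statement usually quoted (and, per the teaser, the one the essay unpacks) is

$$
\mathsf{NP} \;=\; \mathsf{PCP}\big[O(\log n),\, O(1)\big]:
$$

every NP statement has a proof format that a verifier can check using $O(\log n)$ random coins and a constant number of queried bits, rejecting any purported proof of a false claim with constant probability.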
Matrix Exponentiation for Linear Recurrences
From Fibonacci to finite fields, the trick of lifting a linear recurrence to a matrix and powering via repeated squaring turns $\Theta(n)$ iteration into $O(k^3 \log n)$ arithmetic operations. But Cayley–Hamilton says we are doing too much work: the characteristic polynomial knows everything, and Fiduccia's algorithm rides that observation down to $O(k^2 \log n)$, or $O(k \log k \log n)$ with FFT. Here is the theory, tight and honest about what is proved.
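As a minimal sketch of the repeated-squaring half of that claim (not Fiduccia's $O(k^2 \log n)$ refinement), here is Fibonacci lifted to its $2 \times 2$ companion matrix; the function name `fib_mod` and the modulus are illustrative choices, not taken from the essay:

```python
def mat_mul(A, B, mod):
    """Multiply two 2x2 matrices, reducing entries mod `mod`."""
    return [
        [(A[0][0]*B[0][0] + A[0][1]*B[1][0]) % mod,
         (A[0][0]*B[0][1] + A[0][1]*B[1][1]) % mod],
        [(A[1][0]*B[0][0] + A[1][1]*B[1][0]) % mod,
         (A[1][0]*B[0][1] + A[1][1]*B[1][1]) % mod],
    ]

def fib_mod(n, mod=10**9 + 7):
    """n-th Fibonacci number mod `mod` via repeated squaring: O(log n) matrix products."""
    result = [[1, 0], [0, 1]]   # identity
    base = [[1, 1], [1, 0]]     # companion matrix of F(n) = F(n-1) + F(n-2)
    while n:
        if n & 1:
            result = mat_mul(result, base, mod)
        base = mat_mul(base, base, mod)
        n >>= 1
    return result[0][1]         # M^n = [[F(n+1), F(n)], [F(n), F(n-1)]]

assert [fib_mod(i) for i in range(8)] == [0, 1, 1, 2, 3, 5, 8, 13]
```

Generalizing to a $k$-term recurrence just swaps in the $k \times k$ companion matrix, which is where the $O(k^3 \log n)$ operation count comes from.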
Entropy as the Price of Ignorance
Shannon's $H(X)$ is often introduced as "average information," a phrase that explains nothing. Here is the version that earns its keep: $H(X)$ is, up to vanishing additive slack, the best compression rate any lossless code can hope for on i.i.d. draws from $X$. That sentence hides two theorems, one trivial inequality, and one delicate limit.
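For orientation, the definition and the bound being compressed into that sentence are (standard statements, not excerpts from the essay):

$$
H(X) \;=\; -\sum_{x} p(x)\,\log_2 p(x),
\qquad
H(X) \;\le\; \mathbb{E}\big[\ell(X)\big] \;<\; H(X) + 1,
$$

where $\ell$ is the codeword length of an optimal prefix code for a single draw; coding blocks of $n$ symbols shrinks the $+1$ slack to $+1/n$ per symbol, which is the "vanishing additive slack."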