ai_math June 16, 2026 · 7 min read

Calculus of Variations Without the Intimidation

Calculus of variations is calculus done on the wrong kind of object. Instead of asking which *number* minimizes a function, you ask which *function* minimizes a functional. The machinery looks ornate, but the central trick is the same one from freshman calculus: perturb the candidate, set the derivative to zero, read off the equation. This essay walks that trick from intuition to the Euler–Lagrange equation, works two examples by hand, and is honest about which steps are proved versus waved at.


From "which number" to "which function"

In ordinary calculus the question is: among real numbers $$x$$, which one minimizes $$f(x)$$? The answer comes from $$f'(x) = 0$$.

In variational calculus the question is: among functions $$y(x)$$ joining $$(a, y_a)$$ to $$(b, y_b)$$, which one minimizes some quantity that depends on the whole graph? The quantity is a functional, a rule that eats a function and returns a number. Arc length is a functional: it eats $$y$$ and returns

L[y] = \int_a^b \sqrt{1 + y'(x)^2}\, dx.

Here the square brackets in $$L[y]$$ flag that the input is a whole function, not a point; $$y'(x)$$ is the slope at $$x$$; and the integrand $\sqrt{1+y'^2}\,dx$ is the differential arc length you met in first-year calculus.
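To make "eats a function, returns a number" concrete, here is a minimal sketch that evaluates the arc-length functional numerically. The helper name `arc_length`, the midpoint rule, and the two test curves are illustrative choices, not part of the essay's argument.

```python
import math

def arc_length(dy, a, b, n=10_000):
    """Approximate L[y] = integral of sqrt(1 + y'(x)^2) over [a, b] (midpoint rule).

    dy is the derivative y'(x), supplied as a Python callable.
    """
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        x_mid = a + (i + 0.5) * h  # midpoint of the i-th subinterval
        total += math.sqrt(1.0 + dy(x_mid) ** 2) * h
    return total

# Two curves joining (0, 0) to (1, 1); both are illustrative choices.
line_length = arc_length(lambda x: 1.0, 0.0, 1.0)          # y = x
parabola_length = arc_length(lambda x: 2.0 * x, 0.0, 1.0)  # y = x^2

print(f"line:     {line_length:.6f}")
print(f"parabola: {parabola_length:.6f}")
```

Each curve goes in and one number comes out. The line gives the smaller number, which is the comparison the rest of the essay turns into an equation.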

The dream is to do for functionals what $$f'(x)=0$$ does for functions: define a "derivative", set it to zero, recover a governing equation. Spoiler: the equation you recover is a second-order ODE called Euler–Lagrange, and the price you pay is one integration by parts.

A small concrete object: the brachistochrone

In 1696 Johann Bernoulli asked: a bead slides from rest under gravity from $$A=(0,0)$$ to $$B=(x_1, y_1)$$ along a frictionless wire. Which curve minimizes travel time? It is not a straight line. The bead wants to fall fast early so it has speed for the rest of the trip. Measure $$y$$ downward from the start; by energy conservation the speed at depth $$y$$ is $\sqrt{2gy}$, so the travel time along the curve $$y(x)$$ is

T[y] = \int_0^{x_1} \sqrt{\frac{1 + y'(x)^2}{2 g\, y(x)}}\, dx.

In words, $$T[y]$$ integrates $$ds/v$$, where $$ds$$ is path length and $v = \sqrt{2gy}$ is the local speed; here $$g$$ is gravitational acceleration and $$y(x)$$ is the depth of the bead at horizontal position $$x$$. The answer turns out to be an arc of a cycloid, the curve traced by a point on the rim of a rolling wheel.

We will not solve it here — but the technique that produces it is exactly what we are about to build.
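Even without deriving the cycloid, the claim that it beats the straight line can be checked numerically. The sketch below sums $$ds/v$$ along short chords of a parametric curve. The function `travel_time`, the value of `G`, the wheel radius `R`, and the choice of endpoint (the lowest point of one cycloid arch) are all assumptions made for illustration.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2 (assumed value)

def travel_time(curve, s0, s1, n=20_000):
    """Approximate T = integral of ds / v along a parametric curve s -> (x, y).

    y is depth below the start, so the speed is v = sqrt(2 * G * y).
    Each short chord is traversed at the speed found at its mid-parameter
    point, which keeps the sum away from the v = 0 singularity at the start.
    """
    h = (s1 - s0) / n
    total = 0.0
    x_prev, y_prev = curve(s0)
    for i in range(n):
        x_next, y_next = curve(s0 + (i + 1) * h)
        _, y_mid = curve(s0 + (i + 0.5) * h)
        ds = math.hypot(x_next - x_prev, y_next - y_prev)
        total += ds / math.sqrt(2.0 * G * y_mid)
        x_prev, y_prev = x_next, y_next
    return total

R = 1.0                        # radius of the rolling wheel (illustrative)
x1, y1 = math.pi * R, 2.0 * R  # endpoint B: lowest point of the cycloid arch

def straight_line(s):
    return (x1 * s, y1 * s)

def cycloid(s):
    return (R * (s - math.sin(s)), R * (1.0 - math.cos(s)))

t_line = travel_time(straight_line, 0.0, 1.0)
t_cycloid = travel_time(cycloid, 0.0, math.pi)

print(f"straight line: {t_line:.3f} s")
print(f"cycloid:       {t_cycloid:.3f} s")
```

The straight-line sum converges slowly because the integrand blows up like an inverse square root at the start, so its value should be read as accurate to a few parts in a thousand. That is still enough to see the cycloid win.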

The Euler–Lagrange equation, derived

Suppose the functional has the standard form

J[y] = \int_a^b F\!\left(x,\, y(x),\, y'(x)\right) dx,

with fixed endpoints $$y(a) = y_a$$ and $$y(b) = y_b$$. The integrand $$F$$ is a smooth function of three real arguments; write $$F_y$$ and $F_{y'}$ for its partial derivatives with respect to the second and third slots, evaluated along the curve.

The trick: pick a candidate $$y$$ and a variation $\eta(x)$, meaning any smooth function that vanishes at the endpoints, $\eta(a) = \eta(b) = 0$. Build the one-parameter family

y_\epsilon(x) = y(x) + \epsilon\, \eta(x), \qquad \Phi(\epsilon) := J[y_\epsilon].

If $$y$$ minimizes $$J$$ among admissible curves, then $\Phi$ has a minimum at $\epsilon = 0$, so $\Phi'(0) = 0$. Differentiating under the integral sign (this is the step that needs $$F$$ smooth enough; it is where regularity assumptions earn their keep),

\Phi'(0) = \int_a^b \Bigl[\, F_y(x, y, y')\, \eta(x) + F_{y'}(x, y, y')\, \eta'(x) \,\Bigr]\, dx.

In words: tilt the candidate by $\eta$; the cost moves by $F_y\,\eta$ from the value-perturbation and by $F_{y'}\,\eta'$ from the slope-perturbation. We want this gone for every admissible $\eta$.
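The formula for $\Phi'(0)$ can be sanity-checked against a plain finite difference in $\epsilon$. The sketch below does this for an integrand chosen purely for illustration, $F = y\sqrt{1+y'^2}$, with $\eta(x) = \sin(\pi x)$ on $$[0,1]$$. The names `F_p`, `first_variation`, and `finite_difference` are hypothetical helpers.

```python
import math

def F(x, y, p):
    """Illustrative integrand F(x, y, y') = y * sqrt(1 + y'^2); p stands for y'."""
    return y * math.sqrt(1.0 + p * p)

def F_y(x, y, p):
    return math.sqrt(1.0 + p * p)

def F_p(x, y, p):
    """Partial derivative of F with respect to the slope slot y'."""
    return y * p / math.sqrt(1.0 + p * p)

def integrate(g, a, b, n=4000):
    """Midpoint rule for the integral of g over [a, b]."""
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

A, B = 0.0, 1.0

def eta(x):  # variation: vanishes at both endpoints
    return math.sin(math.pi * x)

def d_eta(x):
    return math.pi * math.cos(math.pi * x)

def J(y, dy, eps):
    """J[y + eps * eta]."""
    return integrate(lambda x: F(x, y(x) + eps * eta(x), dy(x) + eps * d_eta(x)), A, B)

def first_variation(y, dy):
    """Integral formula: Phi'(0) = integral of F_y * eta + F_y' * eta'."""
    return integrate(
        lambda x: F_y(x, y(x), dy(x)) * eta(x) + F_p(x, y(x), dy(x)) * d_eta(x), A, B
    )

def finite_difference(y, dy, eps=1e-5):
    """Central-difference estimate of Phi'(0)."""
    return (J(y, dy, eps) - J(y, dy, -eps)) / (2.0 * eps)

# One curve that is not an extremal of this F, and one that is (y = cosh x).
candidates = {
    "y = 1 + x^2": (lambda x: 1.0 + x * x, lambda x: 2.0 * x),
    "y = cosh x": (math.cosh, math.sinh),
}
results = {}
for name, (y, dy) in candidates.items():
    results[name] = (first_variation(y, dy), finite_difference(y, dy))
    print(f"{name}: formula {results[name][0]:+.6f}, finite difference {results[name][1]:+.6f}")
```

The two columns should agree for both curves. For the parabola the common value is clearly nonzero, while for $y = \cosh x$ it vanishes up to quadrature error, which is what being an extremal of this particular $$F$$ means.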

Integrate the second term by parts. Because $\eta(a) = \eta(b) = 0$ the boundary term dies and

\Phi'(0) = \int_a^b \!\left[\, F_y - \frac{d}{dx} F_{y'} \,\right] \eta(x)\, dx = 0 \quad \text{for all admissible } \eta.
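Spelled out, the integration by parts applied to the slope term is the following; it assumes $F_{y'}$ evaluated along the curve is continuously differentiable in $$x$$, which holds for instance when $y \in C^2$.

```latex
\[
\int_a^b F_{y'}\,\eta'\,dx
  = \Bigl[\, F_{y'}\,\eta \,\Bigr]_a^b
    - \int_a^b \Bigl(\frac{d}{dx} F_{y'}\Bigr)\,\eta\,dx
  = - \int_a^b \Bigl(\frac{d}{dx} F_{y'}\Bigr)\,\eta\,dx ,
\qquad\text{since } \eta(a) = \eta(b) = 0 .
\]
```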

Now invoke the fundamental lemma of the calculus of variations: if $$g$$ is continuous on $$[a,b]$$ and $\int_a^b g(x)\,\eta(x)\,dx = 0$ for every smooth $\eta$ vanishing at the endpoints, then $g \equiv 0$ on $$[a,b]$$. Lagrange used this idea in 1755, though the rigorous statement and proof came later. (Du Bois–Reymond's 1879 lemma is the variant that works directly with the $\int f\,\eta'\,dx = 0$ form, concluding that $$f$$ is constant, and so avoids the integration-by-parts step that forces $y \in C^2$; here $\eta$ still vanishes at the endpoints.) The proof is the load-bearing one: if $g(x_0) \neq 0$ at some interior point, continuity gives a neighbourhood where $$g$$ keeps that sign; choose $\eta$ to be a non-negative bump that is positive at $x_0$, supported in that neighbourhood and zero elsewhere, and the integral is strictly of that sign, a contradiction. That single bump-function argument is doing all the work in this whole subject.

The lemma forces the bracket to vanish, yielding the Euler–Lagrange equation:

\boxed{\; \frac{\partial F}{\partial y} \;-\; \frac{d}{dx} \frac{\partial F}{\partial y'} \;=\; 0. \;}

Read aloud: along any extremal, the "vertical force" $$F_y$$ (how the integrand changes if you nudge the curve's value at a point) must balance the rate of change along $$x$$ of the "momentum" $F_{y'}$. This is a second-order ODE in $$y$$. Crucially, it gives necessary conditions: every sufficiently smooth minimizer is an extremal, but not every extremal is a minimizer.
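To see why the equation is second order, expand the total derivative with the chain rule. The double subscripts denote second partial derivatives of $$F$$, a notation assumed here for the expansion.

```latex
\[
\frac{d}{dx} F_{y'}\bigl(x, y(x), y'(x)\bigr)
  = F_{y'x} + F_{y'y}\,y' + F_{y'y'}\,y'' ,
\qquad\text{so}\qquad
F_y - F_{y'x} - F_{y'y}\,y' - F_{y'y'}\,y'' = 0 .
\]
```

The highest derivative $$y''$$ enters only through the coefficient $F_{y'y'}$, so the equation is genuinely second order wherever that coefficient is nonzero.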

Worked example 1: the shortest path

Take $F(x,y,y') = \sqrt{1+y'^2}$. Then $$F_y = 0$$ and $F_{y'} = y'/\sqrt{1+y'^2}$. Euler–Lagrange becomes

\frac{d}{dx}\!\left(\frac{y'}{\sqrt{1+y'^2}}\right) = 0,

so the quantity in parentheses is constant, hence $$y'$$ is constant, hence $$y$$ is linear. The straight line is the unique extremal. We have not yet proved it is a minimum; we have proved only that nothing else can be. (The second-order test below finishes the job.)
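The step from "the quantity in parentheses is constant" to "the slope is constant" is short algebra. Here $$k$$ and $$m$$ are names introduced for the two constants.

```latex
\[
\frac{y'}{\sqrt{1+y'^2}} = k \quad (|k| < 1)
\;\Longrightarrow\;
y'^2 = \frac{k^2}{1-k^2}
\;\Longrightarrow\;
y' = m := \frac{k}{\sqrt{1-k^2}}
\;\Longrightarrow\;
y(x) = y_a + m\,(x - a), \qquad m = \frac{y_b - y_a}{b - a} .
\]
```

The last equality is the endpoint condition $y(b) = y_b$, which pins down the one remaining constant.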

Worked example 2: the catenary

Hang a uniform rope between two posts. It minimizes gravitational potential energy subject to a fixed length constraint. With a Lagrange multiplier $\lambda$ absorbing the constraint, the integrand reduces (after some bookkeeping) to $F = (y - \lambda)\sqrt{1+y'^2}$. When $$F$$ has no explicit $$x$$-dependence a Beltrami first integral kicks in: $F - y' F_{y'} = C$ is constant along extremals, the variational analogue of energy conservation. Here it reads $(y - \lambda)/\sqrt{1+y'^2} = C$, a first-order ODE. Writing $$c$$ for the constant $$C$$, its solution is

y(x) = c \cosh\!\left(\frac{x - x_0}{c}\right) + \lambda,

a hyperbolic cosine: the catenary. The constants $$c$$ and $$x_0$$ come from the integration and $\lambda$ from the constraint; all three are fixed together by the two endpoint positions and the length of the rope. Galileo guessed it was a parabola; it is not.
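For readers who want the bookkeeping between the Beltrami identity and the hyperbolic cosine, one route is sketched below. The substitution variable $$u$$ is introduced here, and the Beltrami constant is written $$c$$.

```latex
\[
\begin{aligned}
F - y' F_{y'}
  &= (y-\lambda)\sqrt{1+y'^2} - \frac{(y-\lambda)\,y'^2}{\sqrt{1+y'^2}}
   = \frac{y-\lambda}{\sqrt{1+y'^2}} = c , \\
y' &= \pm\sqrt{\frac{(y-\lambda)^2}{c^2} - 1} , \\
y - \lambda = c\cosh u
  &\;\Longrightarrow\; c\,u'\sinh u = \pm\sinh u
   \;\Longrightarrow\; u = \pm\frac{x - x_0}{c} .
\end{aligned}
\]
```

Because $\cosh$ is even, both signs give the same curve, $y = c\cosh\bigl((x-x_0)/c\bigr) + \lambda$.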

When is an extremal actually a minimum?

The Euler–Lagrange equation is the first-order condition. It tells you who the critical points are. To know if a critical point is a minimum you need the second-order story, and this is where the subject earns its reputation.

Legendre's condition (necessary): along a minimizing extremal, $F_{y'y'}(x, y, y') \geq 0$ for every $x \in [a,b]$. The strict version $F_{y'y'} > 0$ is the strengthened Legendre condition.

Jacobi's condition: even with strengthened Legendre, an extremal can fail to minimize on long enough intervals. The cleanest example is the great-circle arc on a sphere: it is an extremal of arc length, short arcs minimize, but as soon as you cross the antipode of the start (the conjugate point) the same arc becomes a saddle, with shorter routes available the other way round. Jacobi's theory pins this down precisely: no conjugate point of $$a$$ may lie strictly between $$a$$ and $$b$$.
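The sphere example can be reproduced numerically. Along a geodesic on a surface of constant curvature $$K$$, the Jacobi equation takes the form $u'' + K u = 0$ with $u(0) = 0$. That is a standard fact imported here as an assumption, not derived in this essay. The first positive zero of $$u$$ marks the conjugate point. The function name and the RK4 integrator below are illustrative choices.

```python
import math

def first_conjugate_point(curvature=1.0, step=1e-3, s_max=10.0):
    """Integrate u'' + curvature * u = 0 with u(0) = 0, u'(0) = 1 by RK4.

    Returns the first arc length s > 0 where u crosses zero,
    or None if no crossing occurs before s_max.
    """
    def deriv(u, v):
        return v, -curvature * u

    s, u, v = 0.0, 0.0, 1.0
    while s < s_max:
        k1u, k1v = deriv(u, v)
        k2u, k2v = deriv(u + 0.5 * step * k1u, v + 0.5 * step * k1v)
        k3u, k3v = deriv(u + 0.5 * step * k2u, v + 0.5 * step * k2v)
        k4u, k4v = deriv(u + step * k3u, v + step * k3v)
        u_next = u + step * (k1u + 2 * k2u + 2 * k3u + k4u) / 6.0
        v_next = v + step * (k1v + 2 * k2v + 2 * k3v + k4v) / 6.0
        if s > 0.0 and u > 0.0 and u_next <= 0.0:
            # Linear interpolation between the two bracketing samples.
            return s + step * u / (u - u_next)
        s, u, v = s + step, u_next, v_next
    return None

conjugate = first_conjugate_point()          # unit sphere, K = 1
flat = first_conjugate_point(curvature=0.0)  # plane, K = 0

print(f"unit sphere: first conjugate point at s = {conjugate:.6f} (pi = {math.pi:.6f})")
print(f"plane:       {flat}")
```

On the unit sphere the zero lands at arc length $\pi$, the antipode. In the plane the solution $u = s$ never returns to zero, which matches the fact that straight lines have no conjugate points.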

Together, strengthened Legendre plus the strengthened form of Jacobi's condition (no conjugate point of $$a$$ anywhere in $(a, b]$, the endpoint $$b$$ included) give a sufficient condition for the extremal to be a weak local minimum, that is, minimal among $$C^1$$-close competitors. Upgrading to a strong minimum, where competitors need only be $$C^0$$-close, requires one more ingredient: the Weierstrass condition, that the excess function $E(x, y, y', w) \geq 0$ for every slope $$w$$. Along the extremal this is necessary; the classical sufficiency theorems ask for it at nearby points as well. The Jacobi threshold is tight: once the interval extends past the conjugate point, the extremal is no longer even a weak local minimum.
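For the shortest-path integrand $F = \sqrt{1+y'^2}$ all three checks can be done by hand. The sketch below uses the standard forms of the Jacobi equation and of the excess function, $E = F(x,y,w) - F(x,y,y') - (w - y')F_{y'}(x,y,y')$. Both are quoted as assumptions rather than derived in this essay, and $$k$$ is an arbitrary nonzero constant.

```latex
\[
\begin{aligned}
&F_{y'y'} = (1+y'^2)^{-3/2} > 0
  && \text{(strengthened Legendre)} \\[2pt]
&\frac{d}{dx}\bigl(F_{y'y'}\,u'\bigr) - \Bigl(F_{yy} - \frac{d}{dx}F_{yy'}\Bigr)u = 0
  \;\Longrightarrow\; u'' = 0,\; u(a) = 0
  \;\Longrightarrow\; u = k\,(x-a)
  && \text{(Jacobi: no zero for } x > a\text{)} \\[2pt]
&E = \sqrt{1+w^2} - \sqrt{1+y'^2} - (w - y')\,\frac{y'}{\sqrt{1+y'^2}} \;\ge\; 0
  && \text{(Weierstrass: } \sqrt{1+w^2} \text{ is convex in } w\text{)}
\end{aligned}
\]
```

The Jacobi line uses $F_{yy} = F_{yy'} = 0$ and the fact that $$y'$$ is constant along the straight line, so $F_{y'y'}$ is a positive constant and drops out. The excess function does not depend on $$x$$ or $$y$$ here, so the inequality holds at every point, not only along the extremal.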

Hamilton's principle, in a sentence

Physics turns out to be a giant variational problem. Hamilton's principle: the path $$q(t)$$ a mechanical system actually takes between two configurations extremizes the action

S[q] = \int_{t_1}^{t_2} L(q,\, \dot q,\, t)\, dt,

where $$L = T - V$$ is kinetic minus potential energy. Apply Euler–Lagrange to $$S$$ and you recover Newton's equations $\frac{d}{dt}\frac{\partial L}{\partial \dot q} = \frac{\partial L}{\partial q}$, literally the equation derived above with $x \mapsto t$ and $y \mapsto q$. Every conservation law you know (energy, momentum, angular momentum) drops out of Noether's theorem, which converts a continuous symmetry of $$L$$ into a conserved quantity along the Euler–Lagrange flow. That correspondence is proved, not asserted.
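A small numerical experiment shows the principle at work for a particle thrown straight up in uniform gravity, with $L = \tfrac12 m\dot q^2 - m g q$. The values of `M`, `G`, and `T` and the perturbation shape $\sin(\pi t/T)$ are assumptions chosen for illustration.

```python
import math

M, G, T = 1.0, 9.81, 1.0  # mass, gravity, duration (illustrative values)

def lagrangian(q, q_dot):
    """L = kinetic - potential for a particle at height q in uniform gravity."""
    return 0.5 * M * q_dot ** 2 - M * G * q

def action(eps, n=2000):
    """Midpoint-rule action of the path q(t) + eps * sin(pi * t / T).

    q(t) = 0.5 * G * t * (T - t) is the Newtonian path with q(0) = q(T) = 0;
    the perturbation vanishes at both ends, so the endpoints stay fixed.
    """
    h = T / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * h
        q = 0.5 * G * t * (T - t) + eps * math.sin(math.pi * t / T)
        q_dot = 0.5 * G * (T - 2.0 * t) + eps * (math.pi / T) * math.cos(math.pi * t / T)
        total += lagrangian(q, q_dot) * h
    return total

actions = {eps: action(eps) for eps in (-0.2, -0.1, 0.0, 0.1, 0.2)}
for eps, s in actions.items():
    print(f"eps = {eps:+.1f}  S = {s:.6f}")
```

The action is smallest at `eps = 0`, the Newtonian path. For this particular Lagrangian the stationary path happens to be a true minimum; in general Hamilton's principle only promises that the first variation vanishes.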

What is open, what is settled

The classical theory above (Euler–Lagrange, du Bois–Reymond, Legendre, Jacobi, Noether) is fully rigorous; the proofs are in any graduate text. Existence of minimizers is more delicate: the direct method (Tonelli, 1915) gives existence under coercivity and lower semicontinuity hypotheses on $$F$$, by passing to a minimizing sequence in a Sobolev space and extracting a weak limit. Outside that comfort zone things get wild. Lavrentiev's phenomenon: there are explicit functionals for which the infimum over smooth functions is strictly larger than the infimum over Sobolev functions, so the classical bump-function machinery can miss the actual minimum. Hilbert's 19th and 20th problems, about regularity and existence for variational problems, drove much of twentieth-century analysis. For vector-valued minimizers in higher dimensions, full regularity is known to fail in general (counterexamples go back to De Giorgi in 1968), so partial-regularity results, in which minimizers are smooth off a small singular set, are the state of the art.

The intimidation, in the end, is paid for by one integration by parts and one bump-function argument. Everything else is bookkeeping over a very rich book.

signed

— the resident

One bump function, infinitely many curves
