| Field | Value |
| --- | --- |
| Tier | 5 of 5 (progressive) |
| What it means | Complete mechanistic account within stated scope — every component characterized, quantitative predictions confirmed |
| Minimum evidence | All five validity types pass + component-level function ($I_{\text{fun}}$) + novel quantitative prediction confirmed + scope boundary tested + coverage $\kappa > 0.9$ |
| Upgrade | N/A (highest progressive tier) |
| Downgrade to Triangulated | If completeness fails (uncharacterized components discovered) or a quantitative prediction is refuted |

A Validated claim represents what “fully understood” looks like within mechanistic interpretability. The account is closed: every component’s function is known, the information flow is demonstrated end-to-end, the account generates quantitative predictions that have been tested, and the scope is explicitly bounded.

The scope restriction is not a weakness — it is honesty about what has been established. Validated is not “true of all language models” or even “true of this model on all inputs.” It is “true of this model, on this class of inputs, within this explanatory scope.” The boundary should be tested: cases just outside the claimed scope should show the mechanism failing or degrading.

Why so few claims reach this tier: Validated requires completeness, not just correctness. A circuit can be correctly identified (every component it names is causally involved) without being completely characterized (every component’s function is known and the information flow is fully traced). For real-model circuits with dozens of components, achieving completeness remains expensive and technically difficult.

Verdict: Validated (within scope) — [implementational-algorithmic]
Claim: The one-layer transformer trained on modular addition implements a discrete Fourier transform algorithm for computing $(a + b) \bmod p$.
Met: All five validity types. $I_{\text{fun}}$: each neuron’s function characterized as a specific Fourier component. Novel prediction: model confidence should be periodic in $(a + b)$ with period $p$ — confirmed. Scope boundary: mechanism fails on multiplication (outside scope).
Open: Generalization to multi-layer or larger models (outside stated scope).
Scope: One-layer transformer, modular addition, $p = 113$, trained to grokking
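The periodicity prediction in this example follows directly from the structure of the Fourier algorithm. A minimal numerical sketch of that structure (the key frequencies below are illustrative; the trained model in Nanda et al. learns its own small set and sums many more terms):

```python
import numpy as np

p = 113            # modulus from the stated scope
ks = [17, 25, 32]  # illustrative key frequencies, not the model's actual learned ones

def logits(a: int, b: int) -> np.ndarray:
    """Fourier-algorithm logits for (a + b) mod p: one score per candidate answer c."""
    cs = np.arange(p)
    out = np.zeros(p)
    for k in ks:
        w = 2 * np.pi * k / p
        # Trig identity the circuit exploits:
        #   cos(wa)cos(wb) - sin(wa)sin(wb) = cos(w(a+b))
        # Projecting onto cos(wc), sin(wc) yields cos(w(a+b-c)),
        # which peaks exactly at c = (a + b) mod p.
        out += np.cos(w * (a + b - cs))
    return out

print(int(np.argmax(logits(41, 99))))  # → 27, i.e. (41 + 99) % 113
```

Because the logits depend on the inputs only through $a + b$, the model’s confidence is periodic in $(a + b)$ with period $p$, which is exactly the novel quantitative prediction the tier requires.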

  • Complete component-level function table (each component’s input-output mapping characterized)
  • End-to-end information flow diagram with no gaps
  • At least one novel quantitative prediction stated before confirmation
  • Scope boundary: specific inputs or conditions where the mechanism demonstrably fails
  • Coverage metric $\kappa > 0.9$ on a representative distribution within scope
  • All five validity types explicitly assessed and passing
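The coverage requirement can be operationalized as simple agreement between the full model and the mechanism run stand-alone on a within-scope sample. A sketch under that assumption (the function and parameter names here are my own, not the framework’s):

```python
from typing import Any, Callable, Iterable

def coverage(model_pred: Callable[[Any], Any],
             mechanism_pred: Callable[[Any], Any],
             inputs: Iterable[Any]) -> float:
    """Fraction of within-scope inputs on which the characterized mechanism,
    evaluated on its own, reproduces the model's prediction (one reading of kappa)."""
    xs = list(inputs)
    agree = sum(model_pred(x) == mechanism_pred(x) for x in xs)
    return agree / len(xs)

# Validated requires kappa > 0.9 on a representative within-scope sample:
#   assert coverage(model, mechanism, sample) > 0.9
```

Other operationalizations (e.g. variance explained under ablation) are possible; the checklist only fixes the threshold and the requirement that the sample be representative of the stated scope.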
| Direction | What’s required |
| --- | --- |
| → (no higher tier) | The claim can expand in scope (same mechanism confirmed in larger models, broader tasks), but this expands the scope declaration rather than changing the tier |
| → Triangulated (downgrade) | An uncharacterized component is discovered within the claimed scope, a quantitative prediction fails, or coverage drops below threshold on a sample within the stated distribution |
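The downgrade conditions compose into a single predicate over the accumulated evidence. A hedged sketch (the field names are hypothetical, chosen here for illustration):

```python
from dataclasses import dataclass

@dataclass
class Evidence:
    validity_types_passed: int    # out of 5
    uncharacterized_components: int
    failed_predictions: int
    coverage: float               # kappa on a within-scope sample

def still_validated(e: Evidence, kappa_threshold: float = 0.9) -> bool:
    """True iff no downgrade-to-Triangulated condition has triggered."""
    return (e.validity_types_passed == 5
            and e.uncharacterized_components == 0
            and e.failed_predictions == 0
            and e.coverage > kappa_threshold)
```

Note that any single failure suffices for downgrade; the conditions are disjunctive triggers, not a weighted score.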
  • Grokking / modular addition (Nanda et al., 2023) — a toy transformer where every weight matrix is explained by the Fourier algorithm, quantitative predictions about periodicity are confirmed, and the scope (one-layer model, single arithmetic task) is explicit
  • Superposition in toy models (Elhage et al., 2022) — validated as a mathematical framework within toy models with known feature statistics and controlled geometry

To restate the tier’s core discipline: Validated means “true of this model, on this class of inputs, within this explanatory scope,” and nothing broader. The scope restriction is epistemic honesty about what has actually been established. Expanding the scope is possible, but it constitutes a new claim requiring its own validation.