The Five Validity Types
Validity is the organizing framework for what a circuit claim must satisfy to count as evidence rather than coincidence. The four-validity taxonomy originates in experimental design (Cook & Campbell 1979; Shadish, Cook & Campbell 2002), where it was developed to evaluate causal claims in social science and biomedical research. The fifth type, interpretive validity, is added to address the gap between validated mechanisms and defensible explanations that the MI-specific literature has surfaced (Geiger et al. 2024; Méloux et al. 2026).
The intellectual lineage of the five-type framework:
| Validity type | Origin | Foundational references |
|---|---|---|
| Construct | Philosophy of science / measurement theory | Cronbach & Meehl (1955); Craver (2007) |
| Internal | Experimental methodology / systems neuroscience | Campbell (1957); Woodward (2003) |
| External | Experimental methodology / pharmacology | Cook & Campbell (1979); Shadish et al. (2002) |
| Measurement | Classical test theory / measurement theory | Lord & Novick (1968); Campbell & Fiske (1959) |
| Interpretive | Mechanistic interpretability methodology | Marr (1982); Geiger et al. (2021) |
Why five types rather than one global score
A single score summarizing “how good a circuit is” obscures the fact that circuit claims can fail in qualitatively different ways. A circuit can be measured by a reliable instrument and still correspond to no coherent computational concept. A circuit can correspond to a coherent concept and rest on purely correlational evidence. A circuit can survive rigorous causal testing at one intervention strength on one prompt distribution and collapse under any other. A circuit can pass all of those tests and still be described at the wrong level of abstraction. These are not points on a continuum; they are independent failures that demand independent remedies.
The five-type taxonomy makes those failures named and reportable. A verdict that satisfies internal validity but not construct validity is honestly described as causally implicated but theoretically underspecified, and the remedy is a clearer construct rather than more interventions. A verdict that satisfies construct validity but not external validity is coherent but local, and the remedy is replication rather than redefinition. A verdict that passes all four traditional types but fails interpretive validity is validated but overclaimed, and the remedy is scoping the narrative to match the evidence level. These distinctions are routinely collapsed in MI write-ups, with the result that claims of different types are presented as if they were equivalent.
The five types
| Type | Question | Parent discipline | Key criteria |
|---|---|---|---|
| Construct | Is the claimed entity a coherent theoretical concept? | Philosophy of science (Cronbach & Meehl 1955; Craver 2007) | C1 Falsifiability, C2 Structural plausibility, C3 Task specificity, C4 Minimality, C5 Convergent validity |
| Measurement | Is the instrument that produced the evidence trustworthy? | Measurement theory (Campbell & Fiske 1959; Lord & Novick 1968) | M1 Reliability, M2 Invariance, M3 Baseline separation, M4 Sensitivity, M5 Calibration, M6 Construct coverage |
| Internal | Does the evidence establish that the component implements the computation? | Systems neuroscience (Woodward 2003; Craver 2007) | I1 Necessity, I2 Sufficiency, I3 Specificity, I4 Consistency, I5 Confound control |
| External | Does the claim generalize beyond the tested conditions? | Pharmacology (Clark 1926; Hill 1910; Gaddum 1937) | E1 Intervention reach, E2 Graded response, E3 Selectivity, E4 Effect magnitude, E5 Robustness, E6 Cross-architecture |
| Interpretive | Is the narrative about the mechanism licensed by the evidence? | Mechanistic interpretability methodology (Marr 1982; Geiger et al. 2024) | V1 Level declaration, V2 Level-evidence match, V3 Narrative coherence, V4 Alternative exclusion, V5 Scope honesty |
How the five types interact
The types fail independently, but they cannot be evaluated in an arbitrary order; they have an implicit dependency structure:
- Construct validity comes first. A construct that is not clearly defined cannot be measured, and an ambiguous construct cannot have its causal role meaningfully tested.
- Measurement validity gates internal validity. An instrument that is unreliable cannot support a causal inference. A high IIA score computed without a random-vector baseline does not license a representational claim, regardless of how well the internal interventions performed.
- Internal validity gates external validity. A finding that has not been established causally within the discovery conditions cannot be said to generalize. External validity asks about the reach of an established result, not the credibility of an unestablished one.
- External validity gates the upgrade from result to property. An internally valid claim that does not generalize is a local result rather than a finding. The upgrade from a result on a benchmark to a property of the model requires external validity evidence.
- Interpretive validity is downstream of all four. A narrative about a mechanism cannot be evaluated until the mechanism itself has been established as real, generalizable, coherent, and well-measured.
The dependency order does not mean that work on any single type must wait for the previous one to be finished. It means that a verdict at any level should name the types at which evidence is missing, rather than upgrading the verdict on the strength of evidence from a different type.
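The gating rules above can be sketched as a small dependency check. This is a minimal illustration, not part of the framework itself: the `GATES` mapping, the `upstream` and `missing_upstream` helpers, and the lowercase type names are all hypothetical, and the mapping is one reading of the bullets (construct before measurement, measurement before internal, internal before external, interpretive after all four).

```python
# Hypothetical sketch: names and structure are assumptions, not a published API.
ORDER = ["construct", "measurement", "internal", "external", "interpretive"]

# Direct prerequisites of each validity type, as read from the bullets above.
GATES = {
    "construct": [],
    "measurement": ["construct"],
    "internal": ["measurement"],
    "external": ["internal"],
    "interpretive": ["construct", "measurement", "internal", "external"],
}


def upstream(vtype: str) -> list:
    """Return every type that gates `vtype`, directly or transitively."""
    seen = []
    stack = list(GATES[vtype])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.append(current)
            stack.extend(GATES[current])
    return sorted(seen, key=ORDER.index)


def missing_upstream(evidence: set) -> dict:
    """For each type with evidence, name the gating types that lack evidence."""
    gaps = {}
    for vtype in ORDER:
        if vtype in evidence:
            missing = [u for u in upstream(vtype) if u not in evidence]
            if missing:
                gaps[vtype] = missing
    return gaps


# Ablation evidence exists, but the instrument was never validated.
print(missing_upstream({"construct", "internal"}))
# → {'internal': ['measurement']}
```

The function does not block work on a later type. It only names the upstream types whose evidence is missing, which is what a verdict is expected to report.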
How each type connects to its casebook
Each validity type has a dedicated casebook that translates the type’s abstract requirements into operational criteria, instruments, and reporting rules.
- Construct validity is operationalized by the Philosophy of Science Casebook, which provides falsifiability, structural plausibility, task specificity, minimality, and convergent validity criteria.
- Measurement validity is operationalized by the Measurement Theory Casebook, which provides reliability, invariance, baseline separation, sensitivity, calibration, and construct coverage criteria.
- Internal validity is operationalized by the Neuroscience Casebook, which provides necessity, sufficiency, specificity, consistency, and confound-control criteria.
- External validity is operationalized by the Pharmacology Casebook, which provides intervention reach, graded response, selectivity, effect magnitude, robustness, and cross-architecture generalization criteria.
- Interpretive validity is operationalized by the Mechanistic Interpretability Casebook, which provides level declaration, level-evidence match, narrative coherence, alternative exclusion, and scope honesty criteria.
How validity types connect to verdicts
A circuit claim must eventually address all five validity types. In practice, evidence accumulates incrementally, and the verdict tiers encode which types have been addressed so far:
| Verdict tier | Validity types addressed | What’s still open |
|---|---|---|
| Proposed (A) | Construct (partial) | Internal, external, measurement, interpretive |
| Causally suggestive (B) | Construct (partial) + Internal (I1 only) | Sufficiency, specificity, consistency, external, measurement, interpretive |
| Triangulated (D) | Construct (partial) + Internal (I1–I4) + External (partial) | Full construct, measurement, interpretive |
| Mechanistically supported (E) | Construct + Internal + External + Measurement | Interpretive |
| Validated (F) | All five types | — |
No published MI paper has yet reached the Validated tier under this framework. Most published circuits sit between Causally suggestive and Triangulated.
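One way to read the tier table is as a lookup from accumulated evidence to the highest tier it licenses. The sketch below is an illustration under stated assumptions: the `tier` function, the `"none"`/`"partial"`/`"full"` levels, and the treatment of full internal validity as I1 through I5 are all hypothetical choices, and only the tiers listed in the table (A, B, D, E, F) are encoded.

```python
# Hypothetical sketch of the tier table; levels and function name are assumptions.
LEVEL = {"none": 0, "partial": 1, "full": 2}
I_ALL = {"I1", "I2", "I3", "I4", "I5"}  # assumed meaning of "full" internal validity


def tier(construct="none", internal=(), external="none",
         measurement="none", interpretive="none"):
    """Return the highest verdict tier the evidence supports, or None."""
    internal = set(internal)
    c, e, m, v = (LEVEL[x] for x in (construct, external, measurement, interpretive))

    # The four traditional types, all fully addressed.
    supported = c == 2 and internal >= I_ALL and e == 2 and m == 2
    if supported and v == 2:
        return "Validated (F)"
    if supported:
        return "Mechanistically supported (E)"
    if c >= 1 and internal >= {"I1", "I2", "I3", "I4"} and e >= 1:
        return "Triangulated (D)"
    if c >= 1 and "I1" in internal:
        return "Causally suggestive (B)"
    if c >= 1:
        return "Proposed (A)"
    return None


print(tier(construct="partial", internal={"I1"}))
# → Causally suggestive (B)

# Strong evidence of one type does not compensate for missing evidence of another.
print(tier(construct="partial", internal={"I1"}, interpretive="full"))
# → Causally suggestive (B)
```

The second call shows the design intent: a polished narrative cannot lift a claim past the tier its causal, external, and measurement evidence supports.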
Where the literature most often goes wrong
The taxonomy makes recurring error patterns diagnosable:
- Construct conflation. A circuit named for a behavior is treated as though the behavior and the circuit were the same concept. “The IOI circuit” conflates the behavior (indirect object identification) with the particular set of components found by a particular method.
- Causal overreach. Internal-validity evidence (ablation degrades performance) is reported as establishing external validity (“the model uses this circuit for IOI”) without testing generalization.
- Baseline omission. Measurement-validity failures (M3) are presented as internal-validity successes. An IIA of 0.48 is impressive until the random-vector baseline turns out to be 0.44 (Sutter et al. 2025).
- Single-prompt generalization. External-validity claims are made from a single prompt distribution, treating the distribution as the phenomenon rather than a sample from it.
- Level-evidence mismatch. Implementational evidence (ablation) is presented as licensing algorithmic-level claims (“this head implements name-moving”) without the causal abstraction evidence (IIA) required for the upgrade.
Each error is named by one of the five types, and each has a specific remedy.
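The baseline-omission example can be made concrete with a little arithmetic. The sketch below uses the two figures quoted in the list (an IIA of 0.48 against a random-vector baseline of 0.44); the `baseline_separation` helper and the headroom normalization are assumptions introduced here for illustration, not a criterion defined by the framework, and no pass/fail threshold is implied.

```python
# Hypothetical helper: the normalization is an illustrative assumption.
def baseline_separation(score: float, baseline: float) -> dict:
    """Compare a score in [0, 1] against a baseline in [0, 1)."""
    if not (0.0 <= score <= 1.0 and 0.0 <= baseline < 1.0):
        raise ValueError("score must be in [0, 1] and baseline in [0, 1)")
    margin = score - baseline
    return {
        "margin": margin,
        # Share of the room above the baseline that the score actually covers.
        "headroom_fraction": margin / (1.0 - baseline),
    }


report = baseline_separation(score=0.48, baseline=0.44)
print(f"margin over baseline: {report['margin']:.2f}")
# → margin over baseline: 0.04
print(f"fraction of headroom: {report['headroom_fraction']:.3f}")
# → fraction of headroom: 0.071
```

Reporting the margin alongside the raw score makes the measurement question visible: the headline figure of 0.48 sits only 0.04 above what random vectors achieve.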