Internal Validity — Formal Specification
| Property | Value |
|---|---|
| Question | Does the evidence establish that the component implements the computation, not merely participates in it? |
| Lens | Neuroscience |
| Criteria | I1–I5 |
| Dependency | Internal validity is the workhorse — most MI evidence is internal-validity evidence. But it says nothing about whether the finding generalizes (external), whether the instrument is reliable (measurement), whether the construct is coherent (construct), or whether the narrative is correct (interpretive). |
| Status in MI | Best-addressed by existing methods; still routinely method-conditional |
| Last updated | 16 May 2026 |
Internal validity asks whether the causal inference from intervention to behavior is licensed within the experimental setup. The Neuroscience lens explains the intellectual background. This page gives the formal definitions, quantitative thresholds, and calibration data.
I1 — Necessity
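Removing the component should degrade the behavior. For circuit $C$, model $M$, input $x$, counterfactual value $c'$, and behavior metric $F$ (for example, logit difference), the necessity score is:
$$N(C) = \frac{F(M, x) - F(M_{C \leftarrow c'}, x)}{F(M, x)}$$
where $M_{C \leftarrow c'}$ is $M$ with the activations of $C$ set to $c'$, and $F$ is averaged over the evaluation inputs.
Pass condition: high $N(C)$, with an equal-size random-component baseline $R$ producing a much lower $N(R)$.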
| Necessity band | Interpretation |
|---|---|
| Strong | Component is critical for the behavior |
| Moderate | Component contributes but is not the sole driver |
| Weak | Component participates but may be one of many |
| Not necessary | Indistinguishable from random components |
Ablation method is part of the claim. Necessity scores are a joint property of the component and the ablation type. Miller, Chughtai & Saunders (2024) show that the same circuit’s faithfulness varies from 87% under mean ablation to below 50% under other methods. The full claim must state the ablation method.
Common confounds:
- Bottleneck confound. A component that many computations route through is necessary for all of them, but implements none in particular.
- Off-manifold confound. Zero and mean ablation can push activations to values the model rarely or never encounters during training.
Calibration:
| Circuit | Method | Necessity result | Notes |
|---|---|---|---|
| IOI name-movers (Wang et al. 2022) | Mean ablation | Logit diff drops from 3.56 to 0.46 | |
| IOI name-movers | Resample ablation | Weaker score | Method-dependent; same circuit |
| Induction heads (Olsson et al. 2022) | Mean ablation | High (qualitative) | Stronger on repeated sequences, weaker on non-repeated |
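A minimal sketch of the I1 arithmetic, assuming the necessity score is the fractional drop in a behavior metric when the circuit is ablated, (clean − ablated) / clean, and that the metric values have already been measured. The random-baseline comparison uses an add-one empirical p-value; that test, the function names, and the five random-set scores are illustrative assumptions, not part of the specification. The first call reuses the IOI name-mover numbers from the calibration table above (logit diff 3.56 to 0.46).

```python
from statistics import mean


def necessity(clean_metric: float, ablated_metric: float) -> float:
    """Fractional drop in the behavior metric when the component set is ablated."""
    if clean_metric == 0:
        raise ValueError("clean metric is zero; the necessity score is undefined")
    return (clean_metric - ablated_metric) / clean_metric


def random_baseline_check(n_circuit: float, n_random: list[float]) -> dict:
    """Compare a circuit's necessity with scores from equal-size random component sets.

    The p-value is the add-one smoothed share of random sets that are at least
    as necessary as the circuit, so it cannot fall below 1 / (len(n_random) + 1).
    """
    at_least_as_necessary = sum(1 for n in n_random if n >= n_circuit)
    return {
        "n_circuit": n_circuit,
        "n_random_mean": mean(n_random),
        "p_value": (at_least_as_necessary + 1) / (len(n_random) + 1),
    }


# IOI name-mover row: logit diff 3.56 (clean) -> 0.46 (mean ablation).
n_ioi = necessity(3.56, 0.46)
print(round(n_ioi, 2))  # prints 0.87

# Hypothetical necessity scores for five random component sets of the same size.
print(random_baseline_check(n_ioi, [0.02, 0.05, 0.01, 0.08, 0.03]))
```

With only five random sets the p-value cannot drop below 1/6, so a real comparison needs many more random draws. Repeat the whole calculation for each ablation method, since the score is method-conditional.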
I2 — Sufficiency
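Isolating or restoring the component should reproduce the behavior. For circuit $C$ in model $M$, behavior metric $F$, input $x$, and $\bar{C}$ denoting every component outside $C$ (set to counterfactual value $c'$), the recovery fraction is:
$$S(C) = \frac{F(M_{\bar{C} \leftarrow c'}, x)}{F(M, x)}$$
Pass condition: high $S(C)$ on held-out prompts, with the complement ablation method stated.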
| Sufficiency band | Interpretation |
|---|---|
| Strong | Circuit reproduces nearly all of the behavior in isolation |
| Moderate | Circuit captures most of the behavior |
| Weak | Circuit contributes substantially, but something is missing |
| Not sufficient | Circuit alone does not drive the behavior |
The asymmetry with necessity. Necessity requires ablating the circuit; sufficiency requires ablating everything outside it. Resample ablation of the complement is a stricter test than mean ablation, because mean activations computed over the task distribution still carry task-general signal into the circuit.
Two forms of sufficiency:
- Isolation sufficiency: Run only the circuit; ablate the complement. This is what the recovery fraction measures.
- Restoration sufficiency: In a corrupted prompt where the behavior fails, restoring only the circuit restores the behavior. This is the activation-patching form, and it typically yields a higher recovery score because the rest of the model remains intact.
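The two forms of sufficiency above can be sketched as two normalizations of precomputed metric values. Isolation uses the recovery fraction (circuit-only metric over full-model metric). For restoration, this sketch assumes the common patching normalization (patched − corrupted) / (clean − corrupted); the page itself does not fix one, and all numbers below are hypothetical.

```python
def isolation_sufficiency(circuit_only: float, full_model: float) -> float:
    """Share of the full-model metric kept when everything outside C is ablated."""
    if full_model == 0:
        raise ValueError("full-model metric is zero; recovery is undefined")
    return circuit_only / full_model


def restoration_sufficiency(patched: float, corrupted: float, clean: float) -> float:
    """Share of the clean-vs-corrupted gap closed by patching only C from the clean run."""
    gap = clean - corrupted
    if gap == 0:
        raise ValueError("clean and corrupted runs agree; nothing to restore")
    return (patched - corrupted) / gap


# Hypothetical logit-difference values for one circuit.
print(isolation_sufficiency(circuit_only=2.6, full_model=3.2))
print(restoration_sufficiency(patched=2.9, corrupted=-0.5, clean=3.2))
```

In this made-up example restoration closes about 0.92 of the gap while isolation keeps about 0.81 of the metric, matching the expectation that restoration scores higher because the rest of the model stays intact.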
Calibration:
| Circuit | Method | Recovery fraction | Metric |
|---|---|---|---|
| IOI (Wang et al. 2022) | Mean ablation of complement | 87% | Logit difference |
| Greater-Than (Hanna et al. 2023) | Mean ablation of complement | 89.5% | Probability difference |
I3 — Specificity
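The component should be more necessary for the target behavior $B$ than for off-task behaviors. Writing $N_B(C)$ for the necessity score of circuit $C$ on behavior $B$, the specificity ratio against an off-task behavior $B'$ is:
$$\mathrm{Spec}(C) = \frac{N_B(C)}{N_{B'}(C)}$$
Pass condition: high $\mathrm{Spec}(C)$ against at least one related off-task behavior $B'$.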
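| Specificity band | Interpretation |
|---|---|
| Strong | Component is much more necessary for the target behavior than for the control behavior |
| Moderate | Component is more necessary for the target behavior, by a smaller margin |
| Weak | Component is only slightly more necessary for the target behavior |
| Inverted | Component is more necessary for the control behavior (red flag) |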
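Off-task selection matters. The control behavior must be related, not trivially distinct: a control that shares no computation with the target is easy to spare, so a high ratio against it says little about specificity.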
| Target task | Informative off-task | Trivial off-task |
|---|---|---|
| IOI | Subject-verb agreement | Modular arithmetic |
| Greater-Than | Successor | Translation |
| Gendered pronouns | IOI | Factual recall |
The double dissociation test. The strongest specificity evidence is a double dissociation: ablating circuit $C_1$ impairs behavior $B_1$ but not $B_2$, and ablating circuit $C_2$ impairs $B_2$ but not $B_1$.
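A sketch of the specificity ratio (necessity on the target behavior divided by necessity on a related off-task behavior) and of the double dissociation check. The cutoffs for "impairs" and "spares" are hypothetical parameters, since the page does not define them, and the scores are invented for illustration.

```python
import math


def specificity_ratio(n_target: float, n_off_task: float) -> float:
    """Necessity on target behavior B divided by necessity on off-task B'.

    Values below 1 are the 'inverted' red flag. If ablation does not hurt B'
    at all (n_off_task <= 0), the ratio is treated as unbounded.
    """
    if n_target <= 0:
        raise ValueError("not necessary for the target behavior; I1 fails first")
    if n_off_task <= 0:
        return math.inf
    return n_target / n_off_task


def double_dissociation(n: dict[tuple[str, str], float],
                        impairs: float = 0.5, spares: float = 0.1) -> bool:
    """n[(circuit, behavior)] is the necessity score for ablating circuit on behavior.

    Hypothetical cutoffs: a behavior counts as impaired when N >= impairs and
    as spared when N <= spares.
    """
    return (n[("C1", "B1")] >= impairs and n[("C1", "B2")] <= spares
            and n[("C2", "B2")] >= impairs and n[("C2", "B1")] <= spares)


# Hypothetical necessity scores for two circuits on two behaviors.
scores = {("C1", "B1"): 0.80, ("C1", "B2"): 0.05,
          ("C2", "B2"): 0.70, ("C2", "B1"): 0.04}
print(specificity_ratio(scores[("C1", "B1")], scores[("C1", "B2")]))
print(double_dissociation(scores))  # prints True
```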
Calibration: No published circuit paper reports a formal specificity ratio against a related task. Induction heads have implicit specificity (stronger on repeated sequences than non-repeated), but this is not quantified as a ratio.
I4 — Consistency
The effect should replicate across enough contexts to rule out an artifact of the discovery distribution.
Pass condition: Replication across at least two of three axes, with bootstrap confidence intervals on the principal metrics.
| Axis | What it tests | Example |
|---|---|---|
| Cross-prompt | Template or paraphrase robustness | IOI with varied syntactic structures |
| Cross-seed | Independence from random initialization | Same circuit found in independently trained copies |
| Cross-checkpoint | Stability across training | Circuit present at step 50k, 100k, and 200k |
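I4 asks for bootstrap confidence intervals and replication on at least two axes. Below is a minimal percentile-bootstrap sketch over per-prompt scores, plus the two-of-three axis rule; the resample count, the seed, and the example scores are assumptions chosen for illustration.

```python
import random
from statistics import mean

AXES = ("cross-prompt", "cross-seed", "cross-checkpoint")


def bootstrap_ci(values: list[float], n_boot: int = 2000,
                 alpha: float = 0.05, seed: int = 0) -> tuple[float, float]:
    """Percentile bootstrap confidence interval for the mean of per-prompt scores."""
    if not values:
        raise ValueError("need at least one score")
    rng = random.Random(seed)
    k = len(values)
    # Resample prompts with replacement and record each resample's mean.
    boot_means = sorted(mean(rng.choices(values, k=k)) for _ in range(n_boot))
    lo_idx = round(n_boot * alpha / 2)
    hi_idx = round(n_boot * (1 - alpha / 2)) - 1
    return boot_means[lo_idx], boot_means[hi_idx]


def passes_i4(replicated: dict[str, bool]) -> bool:
    """I4 pass condition: replication on at least two of the three axes."""
    return sum(bool(replicated.get(axis, False)) for axis in AXES) >= 2


# Hypothetical per-prompt recovery fractions on held-out prompts.
scores = [0.81, 0.86, 0.90, 0.78, 0.88, 0.84, 0.92, 0.80]
print(bootstrap_ci(scores))
print(passes_i4({"cross-prompt": True}))                      # prints False
print(passes_i4({"cross-prompt": True, "cross-seed": True}))  # prints True
```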
Calibration:
| Circuit | Cross-prompt | Cross-seed | Cross-checkpoint | Assessment |
|---|---|---|---|---|
| IOI (Wang et al. 2022) | Partial (name substitutions, ABBA/BABA) | Not tested | Not tested | One axis, partially |
| Induction heads (Olsson et al. 2022) | Yes (any repeated sequence) | Yes (multiple model families) | Yes (training dynamics) | All three axes — unusually strong |
| Greater-Than (Hanna et al. 2023) | Partial (year ranges) | Not tested | Not tested | One axis, partially |
I5 — Confound Control
The observed effect should not be explained by collateral disruption to non-circuit components.
Pass condition: At least two ablation methods compared, with consistent results.
| Confound | Mechanism | Mitigation |
|---|---|---|
| Off-manifold ablation | Zero and mean ablation push activations to out-of-distribution values | Use resample ablation against a counterfactual distribution |
| Backup suppression | Ablating one component can suppress or activate backup mechanisms | Test individual and joint ablation; report backup activation |
| Layer-norm redistribution | Ablating a component changes layer-norm statistics for all subsequent components | Compare effects with and without freezing the layer-norm scale at its clean-run value |
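The layer-norm confound in the table above can be seen in a toy residual stream built from two hypothetical components. Ignoring LayerNorm's learned gain and bias (identical in both runs), freezing the scale makes the normalization linear, so an untouched component's term is unchanged by the ablation; recomputing the scale inflates that term by the ratio of clean to ablated scales. The vectors are made up, and this is a sketch of the mechanism rather than a test on a real model.

```python
from statistics import mean, pstdev


def ln_scale(x: list[float], eps: float = 1e-5) -> float:
    """LayerNorm's per-token scale: sqrt(variance + eps)."""
    return (pstdev(x) ** 2 + eps) ** 0.5


def normalized_contribution(component: list[float], scale: float) -> list[float]:
    """One component's term in LayerNorm(x) when x is a sum of components.

    With the scale held fixed, LayerNorm is linear (center, then divide), so
    each component's centered output divided by the scale is its share.
    """
    mu = mean(component)
    return [(v - mu) / scale for v in component]


# Toy residual stream at one token: the sum of two hypothetical components.
a = [3.0, -3.0, 3.0, -3.0]  # component being ablated
b = [1.0, 0.0, -1.0, 0.0]   # untouched component
clean = [ai + bi for ai, bi in zip(a, b)]
ablated = b                 # zero-ablate a

frozen = normalized_contribution(b, ln_scale(clean))        # scale frozen to clean run
recomputed = normalized_contribution(b, ln_scale(ablated))  # scale recomputed after ablation
ratio = ln_scale(clean) / ln_scale(ablated)
print(ratio)  # factor by which b's term is inflated when the scale is recomputed
```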
Method comparison protocol: Report the same metric under at least two ablation methods. If the results diverge substantially, the finding is method-conditional — flag it as such.
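The method comparison protocol above reduces to reporting a range and a flag. This sketch assumes a hypothetical tolerance for "diverge substantially", since the page does not give a number; the three scores are invented.

```python
def compare_methods(scores: dict[str, float], tolerance: float = 0.1) -> dict:
    """Report a metric's range across ablation methods and flag divergence.

    `tolerance` is a hypothetical cutoff for "diverge substantially".
    """
    if len(scores) < 2:
        raise ValueError("I5 requires at least two ablation methods")
    lo_method = min(scores, key=scores.get)
    hi_method = max(scores, key=scores.get)
    spread = scores[hi_method] - scores[lo_method]
    return {
        "range": (scores[lo_method], scores[hi_method]),
        "methods": (lo_method, hi_method),
        "method_conditional": spread > tolerance,
    }


# Hypothetical faithfulness scores for one circuit under three ablation methods.
print(compare_methods({"mean": 0.87, "resample": 0.62, "zero": 0.41}))
```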
Calibration:
| Circuit | Methods compared | Consistent? | Notes |
|---|---|---|---|
| IOI (Wang et al. 2022) | Mean ablation only | N/A — single method | Miller et al. (2024) later showed method-dependence |
| IOI (Miller et al. 2024) | Mean vs. resample vs. others | No — substantial divergence | Faithfulness ranges 87% to below 50% depending on method |
Partial-pass interpretation
| Evidence pattern | Criteria met | Interpretation | Recommended language |
|---|---|---|---|
| Necessary but not sufficient | I1 | Distributed or incomplete circuit | “Causally implicated, not localized” |
| Sufficient but not necessary | I2 | Redundancy or forced route | “A capable route, not shown necessary” |
| Necessary + sufficient, not specific | I1, I2 | General-capability component | “Real mechanism, not task-specific” |
| Necessary + sufficient + specific, not consistent | I1, I2, I3 | Benchmark artifact possible | “Locally established, not yet robust” |
| Strong I1 + I2, single ablation method | I1, I2 (conditional) | Method-conditional claim | “Sufficient under [method]; not tested under alternatives” |
| All five met | I1–I5 | Full internal validity | Proceed to external validity testing |
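The partial-pass table can be encoded as a lookup from the set of criteria met to the recommended language. This sketch returns every row that applies rather than picking one, because the single-method row can overlap the others; the function and its arguments are illustrative, not part of the specification.

```python
def recommended_language(met: set[str], single_method: bool = False,
                         method: str = "[method]") -> list[str]:
    """Return the recommended-language rows from the partial-pass table that apply."""
    if {"I1", "I2", "I3", "I4", "I5"} <= met:
        return ["Full internal validity; move on to external validity testing"]
    i1, i2, i3, i4 = ("I1" in met, "I2" in met, "I3" in met, "I4" in met)
    rows = []
    if i1 and not i2:
        rows.append("Causally implicated, not localized")
    if i2 and not i1:
        rows.append("A capable route, not shown necessary")
    if i1 and i2 and not i3:
        rows.append("Real mechanism, not task-specific")
    if i1 and i2 and i3 and not i4:
        rows.append("Locally established, not yet robust")
    if i1 and i2 and single_method:
        rows.append(f"Sufficient under {method}; not tested under alternatives")
    return rows


print(recommended_language({"I1"}))
print(recommended_language({"I1", "I2"}, single_method=True, method="mean ablation"))
```

Patterns the table does not cover (for example, I1 through I4 met with two inconsistent methods) return an empty list, which is itself a signal to word the claim by hand.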
Protocol
For circuit $C$ and behavior $B$:
- I1. Ablate $C$; record the necessity score under at least two methods. Compare to an equal-size random baseline.
- I2. Ablate the complement of $C$; record the recovery fraction. Use held-out prompts not used for discovery.
- I3. Compute the necessity of $C$ for one related off-task behavior $B'$. Report the specificity ratio.
- I4. Replicate across at least two of: cross-prompt, cross-seed, cross-checkpoint.
- I5. Compare results across ablation methods. If inconsistent, report the range and flag as method-conditional.
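The protocol steps can be turned into a checklist that flags which evidence a write-up is still missing. The `CircuitReport` record and `protocol_gaps` function are hypothetical; they check only that each step was reported, not whether its numbers pass, and I5 consistency still needs a method comparison like the one described under I5.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CircuitReport:
    """Hypothetical record of what a write-up reports for circuit C and behavior B."""
    necessity_by_method: dict[str, float]    # I1: necessity score per ablation method
    random_baseline: Optional[float] = None  # I1: score for an equal-size random set
    sufficiency: Optional[float] = None      # I2: recovery fraction, complement ablated
    discovery_prompts: set[str] = field(default_factory=set)
    eval_prompts: set[str] = field(default_factory=set)
    off_task: Optional[str] = None           # I3: related off-task behavior B'
    specificity: Optional[float] = None      # I3: specificity ratio against B'
    axes_replicated: set[str] = field(default_factory=set)  # I4


def protocol_gaps(r: CircuitReport) -> list[str]:
    """List the protocol steps a report has not yet covered."""
    gaps = []
    if len(r.necessity_by_method) < 2:
        gaps.append("I1/I5: fewer than two ablation methods")
    if r.random_baseline is None:
        gaps.append("I1: no equal-size random baseline")
    if r.sufficiency is None:
        gaps.append("I2: complement-ablation recovery not reported")
    if not r.eval_prompts or r.eval_prompts & r.discovery_prompts:
        gaps.append("I2: evaluation prompts are missing or overlap discovery prompts")
    if r.off_task is None or r.specificity is None:
        gaps.append("I3: no specificity ratio against a related off-task behavior")
    if len(r.axes_replicated) < 2:
        gaps.append("I4: fewer than two replication axes")
    return gaps


report = CircuitReport(necessity_by_method={"mean": 0.87}, sufficiency=0.87)
for gap in protocol_gaps(report):
    print(gap)
```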