Verdict Tier 2: Causally Suggestive
Section titled “Verdict Tier 2: Causally Suggestive”| Tier | 2 of 5 (progressive) |
| What it means | Necessity established — removing the mechanism changes behavior — but sufficiency and specificity remain open |
| Minimum evidence | I1 (necessity) + E4 (effect magnitude) + V1 (level declaration) |
| Upgrade to Mechanistically supported | Sufficiency (I2) + Specificity (I3) + Consistency (I4) + Measurement reliability (M1) |
| Downgrade to Proposed | If the causal effect is shown to be an artifact of the ablation method or indistinguishable from random controls |
What this tier establishes
Section titled “What this tier establishes”A Causally Suggestive claim has crossed the most important threshold in mechanistic interpretability: from correlation to causation. At least one intervention (ablation, patching, or editing) has demonstrated that the claimed mechanism is necessary for the target behavior. Removing it changes the output in a way that exceeds random-component baselines.
What this tier does not establish is whether the mechanism is the whole story. Most circuits at this tier demonstrate necessity without sufficiency — ablating them hurts, but the circuit alone does not reproduce the behavior. Specificity is also typically untested: the ablation hurts the target behavior, but does it also degrade unrelated behaviors? If so, the component may be a general bottleneck rather than a specific mechanism.
This is where the majority of published mechanistic interpretability findings currently reside.
Example verdict statement
Section titled “Example verdict statement”Verdict: Causally suggestive —
[implementational-topographic]Claim: The IOI circuit (26 heads across layers 0-11) is necessary for indirect object identification in GPT-2 Small. Met: I1 (mean ablation of circuit heads reduces logit diff by 0.73, vs. 0.12 for size-matched random set), E4 (absolute effect = 2.1 logits), V1 (claim stated at head level) Open: I2 (sufficiency under resample ablation), I3 (specificity vs. general language tasks), C5 (multi-method convergence) Scope: GPT-2 Small, IOI task, ABBA template distribution
Minimum reporting for this tier
Section titled “Minimum reporting for this tier”- Ablation method named explicitly (zero, mean, resample, activation patching)
- Effect size on target behavior with confidence interval
- Random-component control (same number of components, randomly selected) with its effect size
- Baseline metric value (full model performance on the target behavior)
- Statement of whether sufficiency or specificity has been tested
Upgrade and downgrade
Section titled “Upgrade and downgrade”| Direction | What’s required |
|---|---|
| → Mechanistically supported | Sufficiency (I2): circuit alone recovers target behavior. Specificity (I3): selectivity index SI > 10 or meaningful task separation. Consistency (I4): replication across templates/methods/seeds. Reliability (M1): bootstrap |
| → Proposed (downgrade) | The causal effect disappears under a more appropriate ablation method (e.g., mean → resample), or the random-component control produces equal effect |
| → Underdetermined | Multiple non-overlapping circuits produce equivalent necessity effects and cannot be distinguished |
Characteristic occupants
Section titled “Characteristic occupants”- IOI circuit (Wang et al., 2022) under its original mean-ablation evaluation — necessity demonstrated, but sufficiency challenged by Miller et al. (2024)
- Knowledge neuron editing (Meng et al., 2022) — ROME demonstrates necessity of specific MLPs for factual recall, sufficiency partially established
- Individual head ablation studies — most activation-patching-based claims in the literature
- ACDC-discovered circuits (Conmy et al., 2023) — automated discovery with necessity verification but limited specificity testing
Key references
Section titled “Key references”- Wang et al. (2022). Interpretability in the Wild: a Circuit for Indirect Object Identification. arXiv:2211.00593
- Meng et al. (2022). Locating and Editing Factual Associations in GPT. arXiv:2202.05262
- Conmy et al. (2023). Towards Automated Circuit Discovery for Mechanistic Interpretability. arXiv:2304.14997
- Miller et al. (2024). Faithfulness Metrics for Circuit Discovery. arXiv:2407.08734