Tier: 2 of 5 (progressive)
What it means: Necessity established (removing the mechanism changes behavior), but sufficiency and specificity remain open
Minimum evidence: I1 (necessity) + E4 (effect magnitude) + V1 (level declaration)
Upgrade to Mechanistically supported: Sufficiency (I2) + Specificity (I3) + Consistency (I4) + Measurement reliability (M1)
Downgrade to Proposed: If the causal effect is shown to be an artifact of the ablation method or is indistinguishable from random controls

A Causally Suggestive claim has crossed the most important threshold in mechanistic interpretability: from correlation to causation. At least one intervention (ablation, patching, or editing) has demonstrated that the claimed mechanism is necessary for the target behavior. Removing it changes the output in a way that exceeds random-component baselines.
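The comparison against random-component baselines can be sketched as follows. This is a minimal illustration, not a prescribed procedure: `ablate_effect` is a hypothetical callback that returns the drop in the target metric when a set of components is ablated, and the toy contribution table stands in for a real model.

```python
import random

def random_baseline_effects(all_components, k, ablate_effect, n_draws=200, seed=0):
    """Ablation effects for n_draws random component sets of size k."""
    rng = random.Random(seed)
    return [ablate_effect(rng.sample(all_components, k)) for _ in range(n_draws)]

def empirical_p_value(circuit_effect, baseline_effects):
    """Share of random sets whose effect is at least the circuit's (+1 smoothing)."""
    at_least = sum(1 for e in baseline_effects if e >= circuit_effect)
    return (at_least + 1) / (len(baseline_effects) + 1)

# Toy stand-in for a model (assumption): each component has a fixed
# contribution, and ablating a set removes the sum of its contributions.
components = list(range(144))  # hypothetical: 12 layers x 12 heads
contribution = {c: (0.08 if c < 26 else 0.005) for c in components}

def toy_ablate_effect(component_set):
    return sum(contribution[c] for c in component_set)

circuit = list(range(26))  # hypothetical circuit membership
circuit_effect = toy_ablate_effect(circuit)
baseline = random_baseline_effects(components, len(circuit), toy_ablate_effect)
p = empirical_p_value(circuit_effect, baseline)

print(f"circuit effect = {circuit_effect:.2f}")
print(f"largest random-set effect = {max(baseline):.2f}")
print(f"empirical p = {p:.4f}")
```

Size-matching the random sets matters here: ablating more components tends to hurt more regardless of which ones they are, so a control of a different size would not isolate the claimed mechanism.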

What this tier does not establish is whether the mechanism is the whole story. Most circuits at this tier demonstrate necessity without sufficiency — ablating them hurts, but the circuit alone does not reproduce the behavior. Specificity is also typically untested: the ablation hurts the target behavior, but does it also degrade unrelated behaviors? If so, the component may be a general bottleneck rather than a specific mechanism.
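One way to make the specificity question concrete is to compare the degradation on the target behavior with the degradation on unrelated behaviors under the same ablation. The ratio below is an assumed definition of a selectivity index, chosen for illustration; the numbers are hypothetical.

```python
def selectivity_index(target_drop, off_target_drops, eps=1e-9):
    """Target degradation divided by mean off-target degradation.

    Assumed definition for illustration; eps avoids division by zero
    when the ablation leaves the off-target behaviors untouched.
    """
    mean_off = sum(abs(d) for d in off_target_drops) / len(off_target_drops)
    return abs(target_drop) / (mean_off + eps)

# Hypothetical drops in a normalized metric after ablating the same components.
specific = selectivity_index(0.73, [0.02, 0.01, 0.03])    # unrelated tasks barely move
bottleneck = selectivity_index(0.73, [0.60, 0.55, 0.70])  # everything degrades

print(f"specific mechanism: SI = {specific:.1f}")
print(f"general bottleneck: SI = {bottleneck:.1f}")
```

A ratio near 1 is the signature of a general bottleneck: the ablation hurts the target behavior no more than it hurts anything else.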

This is where the majority of published mechanistic interpretability findings currently reside.

Verdict: Causally suggestive [implementational-topographic]
Claim: The IOI circuit (26 heads across layers 0-11) is necessary for indirect object identification in GPT-2 Small.
Met: I1 (mean ablation of circuit heads reduces logit diff by 0.73, vs. 0.12 for size-matched random set), E4 (absolute effect = 2.1 logits), V1 (claim stated at head level)
Open: I2 (sufficiency under resample ablation), I3 (specificity vs. general language tasks), C5 (multi-method convergence)
Scope: GPT-2 Small, IOI task, ABBA template distribution

  • Ablation method named explicitly (zero, mean, resample, activation patching)
  • Effect size on target behavior with confidence interval
  • Random-component control (same number of components, randomly selected) with its effect size
  • Baseline metric value (full model performance on the target behavior)
  • Statement of whether sufficiency or specificity has been tested
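The reporting items above can be collected into a single record. The sketch below assumes per-prompt metric drops are available and uses a percentile bootstrap for the confidence intervals; the function names, field names, and synthetic data are all hypothetical.

```python
import random
import statistics

def bootstrap_ci(values, n_boot=2000, alpha=0.05, seed=0):
    """Mean of values with a percentile-bootstrap (1 - alpha) interval."""
    rng = random.Random(seed)
    n = len(values)
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_boot)
    )
    lo = means[int((alpha / 2) * n_boot)]
    hi = means[int((1 - alpha / 2) * n_boot) - 1]
    return statistics.fmean(values), lo, hi

def necessity_report(method, baseline_metric, circuit_drops, random_drops,
                     sufficiency_tested=False, specificity_tested=False):
    """Bundle the reporting items into one dictionary."""
    effect, lo, hi = bootstrap_ci(circuit_drops)
    ctrl, ctrl_lo, ctrl_hi = bootstrap_ci(random_drops, seed=1)
    return {
        "ablation_method": method,           # zero, mean, resample, or patching
        "baseline_metric": baseline_metric,  # full-model performance
        "effect": effect,
        "effect_ci": (lo, hi),
        "random_control_effect": ctrl,       # size-matched random set
        "random_control_ci": (ctrl_lo, ctrl_hi),
        "sufficiency_tested": sufficiency_tested,
        "specificity_tested": specificity_tested,
    }

# Synthetic per-prompt drops (assumption: 100 prompts, Gaussian noise).
data_rng = random.Random(42)
circuit_drops = [data_rng.gauss(0.73, 0.2) for _ in range(100)]
random_drops = [data_rng.gauss(0.12, 0.2) for _ in range(100)]

report = necessity_report("mean", 3.5, circuit_drops, random_drops)
for key, value in report.items():
    print(key, value)
```

Reporting the control with its own interval lets a reader check that the two intervals do not overlap, rather than relying on point estimates alone.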
| Direction | What's required |
| --- | --- |
| → Mechanistically supported | Sufficiency (I2): circuit alone recovers target behavior. Specificity (I3): selectivity index SI > 10 or meaningful task separation. Consistency (I4): replication across templates/methods/seeds. Reliability (M1): bootstrap $\rho_{XX'} \geq 0.7$ |
| → Proposed (downgrade) | The causal effect disappears under a more appropriate ablation method (e.g., mean → resample), or the random-component control produces equal effect |
| → Underdetermined | Multiple non-overlapping circuits produce equivalent necessity effects and cannot be distinguished |
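The transition rules can be read as a small decision procedure. The sketch below covers only the tiers named in this section and assumes that a claim missing any of the minimum evidence falls back to Proposed; the function and flag names are hypothetical.

```python
from enum import Enum

class Tier(Enum):
    PROPOSED = "Proposed"
    CAUSALLY_SUGGESTIVE = "Causally suggestive"
    MECHANISTICALLY_SUPPORTED = "Mechanistically supported"
    UNDERDETERMINED = "Underdetermined"

MINIMUM = {"I1", "E4", "V1"}        # necessity, effect magnitude, level declaration
UPGRADE = {"I2", "I3", "I4", "M1"}  # sufficiency, specificity, consistency, reliability

def assign_tier(met, effect_is_artifact=False, rivals_indistinguishable=False):
    """Map the set of satisfied criteria to a tier.

    effect_is_artifact: the effect vanishes under a more appropriate ablation
        method or matches the random-component control.
    rivals_indistinguishable: non-overlapping circuits show equivalent
        necessity effects and cannot be told apart.
    """
    met = set(met)
    if effect_is_artifact or not MINIMUM <= met:
        return Tier.PROPOSED
    if rivals_indistinguishable:
        return Tier.UNDERDETERMINED
    if UPGRADE <= met:
        return Tier.MECHANISTICALLY_SUPPORTED
    return Tier.CAUSALLY_SUGGESTIVE

print(assign_tier({"I1", "E4", "V1"}).value)
print(assign_tier({"I1", "E4", "V1", "I2", "I3", "I4", "M1"}).value)
print(assign_tier({"I1", "E4", "V1"}, effect_is_artifact=True).value)
```

The downgrade check runs first on purpose: additional criteria cannot rescue a claim whose underlying causal effect turned out to be an artifact.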
  • IOI circuit (Wang et al., 2022) under its original mean-ablation evaluation — necessity demonstrated, but sufficiency challenged by Miller et al. (2024)
  • Factual association editing with ROME (Meng et al., 2022): necessity of specific MLPs for factual recall demonstrated, sufficiency partially established
  • Individual head ablation studies — most activation-patching-based claims in the literature
  • ACDC-discovered circuits (Conmy et al., 2023) — automated discovery with necessity verification but limited specificity testing
  • Wang et al. (2022). Interpretability in the Wild: a Circuit for Indirect Object Identification. arXiv:2211.00593
  • Meng et al. (2022). Locating and Editing Factual Associations in GPT. arXiv:2202.05262
  • Conmy et al. (2023). Towards Automated Circuit Discovery for Mechanistic Interpretability. arXiv:2304.14997
  • Miller et al. (2024). Transformer Circuit Faithfulness Metrics Are Not Robust. arXiv:2407.08734