Validity type: External
Pass condition: The absolute effect is large enough to support the computational story being told.
Evidence family: Behavioral
Minimum reporting: Absolute recovery fraction (not just statistical significance); comparison to published baselines
Common failure mode: Reporting statistical significance without absolute magnitude

Effect magnitude distinguishes a component that is causally necessary (its absence degrades behavior by some nonzero amount) from one that implements the computation (its absence degrades behavior by a large fraction of the total effect).

A head accounting for 3% of the logit difference satisfies necessity in the strict sense but is not doing most of the work. Calling it “a core circuit member” when it contributes 3% while another contributes 60% is effect-magnitude overclaiming.

Recovery fraction:

recovery_fraction = |ablation_delta| / |full_model_logit_diff|
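A minimal Python sketch of the formula. It assumes `ablation_delta` is the logit difference with one component ablated minus the full-model logit difference, both averaged over the same prompts; only the magnitude enters the fraction. The function name and the numbers in the usage lines are hypothetical.

```python
def recovery_fraction(ablation_delta: float, full_model_logit_diff: float) -> float:
    """Fraction of the full-model logit difference removed by ablating one component.

    Assumes ablation_delta = (logit diff with the component ablated)
    - (full-model logit diff); only its magnitude is used.
    """
    if full_model_logit_diff == 0:
        raise ValueError("full-model logit difference is zero; fraction undefined")
    return abs(ablation_delta) / abs(full_model_logit_diff)


full = 3.0  # hypothetical full-model logit difference
print(round(recovery_fraction(2.91 - full, full), 2))  # 0.03, i.e. 3% of the effect
print(round(recovery_fraction(1.2 - full, full), 2))   # 0.6, i.e. 60% of the effect
```

These two hypothetical heads reproduce the 3% versus 60% contrast described above: both are necessary in the strict sense, but only one does most of the work.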

Threshold: a recovery fraction ≥ 0.10 (≥ 10% contribution) is required for inclusion in a primary mechanism claim. Components with a recovery fraction below 0.05 should be classified as minor contributors.
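The thresholds can be applied mechanically, as in the sketch below. The helper name is hypothetical, and because the text gives no rule for fractions between 0.05 and 0.10, the sketch returns a placeholder label `"unspecified"` for that band rather than inventing a category.

```python
def classify_component(fraction: float) -> str:
    """Bucket a recovery fraction using the stated thresholds."""
    if fraction >= 0.10:
        return "primary"      # may be included in a primary mechanism claim
    if fraction < 0.05:
        return "minor"        # classified as a minor contributor
    # The text gives no rule for 0.05 <= fraction < 0.10; placeholder label.
    return "unspecified"


# Hypothetical per-component recovery fractions.
fractions = {"head_A": 0.60, "head_B": 0.07, "head_C": 0.03}
print({name: classify_component(f) for name, f in fractions.items()})
# {'head_A': 'primary', 'head_B': 'unspecified', 'head_C': 'minor'}
```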

  • Faithfulness (I2): How well the circuit as a whole recovers the behavior.
  • Effect magnitude (E4): How much individual components contribute to the total.

A circuit with high faithfulness (e.g., 87%) may still contain components with low individual effect magnitude (e.g., 3%), because the high faithfulness can come from many small contributors. Effect magnitude analysis identifies which components do most of the work.
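To see how the two measures diverge, the sketch below computes circuit faithfulness as the ratio of the circuit-only logit difference to the full-model logit difference (an assumed definition; papers differ in the exact metric) and ranks components by individual recovery fraction. All component names and numbers are hypothetical.

```python
def summarize_circuit(full_logit_diff, circuit_logit_diff, ablation_deltas):
    """Return (faithfulness, components ranked by recovery fraction).

    ablation_deltas maps a component name to the change in logit difference
    when only that component is ablated. Per-component effects need not add
    up to the faithfulness figure, because ablation effects can interact.
    """
    faithfulness = circuit_logit_diff / full_logit_diff
    fractions = {
        name: abs(delta) / abs(full_logit_diff)
        for name, delta in ablation_deltas.items()
    }
    ranked = sorted(fractions.items(), key=lambda item: item[1], reverse=True)
    return faithfulness, ranked


# Hypothetical circuit: 87% faithful, with one head carrying most of the effect.
deltas = {"head_A": -1.8, "head_B": -0.3, "head_C": -0.09,
          "head_D": -0.09, "head_E": -0.09, "head_F": -0.09}
faithfulness, ranked = summarize_circuit(3.0, 2.61, deltas)
print(f"faithfulness: {faithfulness:.0%}")  # faithfulness: 87%
for name, frac in ranked:
    print(f"{name}: {frac:.0%}")            # head_A: 60%, head_B: 10%, then four heads at 3%
```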

| Task | Published faithfulness | Source |
| --- | --- | --- |
| IOI | 87% logit-diff recovery | Wang et al. 2022 |
| Greater-Than | 89.5% prob-diff recovery | Hanna et al. 2023 |
| SVA | 93% logit-diff recovery | Lazo et al. 2025 |

A circuit at 40% recovery is not yet competitive with these baselines.
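A small helper for the baseline comparison, using the values from the table of published baselines. It refuses to compare across metrics, since the Greater-Than figure is a probability-difference recovery while the others are logit-difference recoveries. The helper name and dictionary layout are illustrative assumptions.

```python
# Published faithfulness baselines (as fractions) with their metric and source.
BASELINES = {
    "IOI": (0.87, "logit-diff", "Wang et al. 2022"),
    "Greater-Than": (0.895, "prob-diff", "Hanna et al. 2023"),
    "SVA": (0.93, "logit-diff", "Lazo et al. 2025"),
}


def gap_to_baseline(task: str, recovery: float, metric: str) -> float:
    """Return recovery minus the published baseline (negative = below baseline)."""
    baseline, baseline_metric, _source = BASELINES[task]
    if metric != baseline_metric:
        raise ValueError(
            f"{task} baseline uses {baseline_metric}; got {metric}, not comparable"
        )
    return recovery - baseline


print(round(gap_to_baseline("IOI", 0.40, "logit-diff"), 2))  # -0.47
```

A 40% circuit on IOI sits 47 points below the published baseline, which is the gap the sentence above refers to.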

  • Absolute recovery fraction for each circuit member.
  • Statistical significance in addition to, not instead of, absolute magnitude.
  • Comparison to published baselines using the table above.
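One way to meet the first two reporting requirements for a single component is to pair the absolute point estimate with a bootstrap confidence interval over prompts. The paired percentile bootstrap here is a swapped-in technique, not one prescribed by the text, and the helper name, per-prompt data layout, and example numbers are hypothetical.

```python
import random
import statistics


def report_component(name, full_diffs, ablated_diffs, n_boot=1000, seed=0):
    """Absolute recovery fraction plus a 95% percentile-bootstrap interval.

    full_diffs[i] and ablated_diffs[i] are logit differences on prompt i for
    the full model and for the model with `name` ablated (paired by prompt).
    Assumes the full-model mean logit difference is nonzero.
    """
    n = len(full_diffs)
    if n == 0 or n != len(ablated_diffs):
        raise ValueError("need non-empty, equal-length paired samples")

    def fraction(indices):
        # Recovery fraction computed from mean logit differences over `indices`.
        full = statistics.fmean(full_diffs[i] for i in indices)
        ablated = statistics.fmean(ablated_diffs[i] for i in indices)
        return abs(ablated - full) / abs(full)

    rng = random.Random(seed)
    point = fraction(range(n))
    # Resample prompts with replacement, keeping full/ablated pairs together.
    boots = [fraction([rng.randrange(n) for _ in range(n)]) for _ in range(n_boot)]
    cuts = statistics.quantiles(boots, n=40)  # cut points every 2.5%
    return {"component": name, "recovery_fraction": point,
            "ci95": (cuts[0], cuts[-1])}


full = [3.1, 2.9, 3.0, 3.2, 2.8]      # hypothetical per-prompt logit diffs
ablated = [1.2, 1.3, 1.1, 1.4, 1.0]   # same prompts, head_B ablated
print(report_component("head_B", full, ablated))
```

The report leads with the absolute fraction and treats the interval as supporting evidence, so significance is never reported in place of magnitude.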