Validity type: Construct
Pass condition: A named, specific result that would disconfirm the circuit claim is stated before data collection
Evidence family: Any (criterion is about claim structure, not the instrument)
Minimum reporting: The disconfirmation condition verbatim, with a threshold value, stated prospectively
Common failure mode: “The circuit is real unless ablation fails” (too inclusive to be falsifiable)

Falsifiability is the minimum condition for a claim to have determinate content. A circuit claim that cannot be disconfirmed by any specifiable result is not a scientific claim; it is a label applied to whatever the instruments found.

The criterion requires:

  1. A stated disconfirmation condition: a specific result that, if obtained, would lead the researcher to conclude the circuit claim is wrong. It must be stated before the disconfirmatory test is run.
  2. A threshold: quantitative or at minimum ordinal. “IIA < 0.10 above the random-vector baseline across three prompt splits” is acceptable; “IIA is not significantly different from baseline” is not, because it names no effect size and its outcome depends on sample size and the choice of test.
  3. Prior commitment — the condition must be committed to before collecting the relevant data. A disconfirmation condition invented after a negative result is rationalization, not falsifiability.
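
The three requirements can be made concrete as a small record that is written down, hashed, and timestamped before any data is collected. The following is a minimal sketch under stated assumptions, not a prescribed tool: the class name `DisconfirmationCondition`, its fields, and the example dates are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
import hashlib
import json


@dataclass(frozen=True)  # frozen: the condition cannot be mutated after creation
class DisconfirmationCondition:
    """Hypothetical record of a pre-registered disconfirmation condition."""
    claim: str                # the circuit claim being put at risk
    result: str               # requirement 1: the specific disconfirming result
    threshold: float          # requirement 2: a quantitative threshold
    registered_at: datetime   # requirement 3: when the condition was committed

    def fingerprint(self) -> str:
        """SHA-256 of the verbatim condition, so later edits are detectable."""
        payload = asdict(self)
        payload["registered_at"] = self.registered_at.isoformat()
        encoded = json.dumps(payload, sort_keys=True).encode("utf-8")
        return hashlib.sha256(encoded).hexdigest()

    def is_prospective(self, data_collected_at: datetime) -> bool:
        """True only if the condition was committed before data collection."""
        return self.registered_at < data_collected_at


# Usage with placeholder dates (illustrative values, not real experiments).
condition = DisconfirmationCondition(
    claim="L8.MLP is a primary causal locus for SVA in GPT-2 Small",
    result="DAS-IIA margin over random-vector baseline below threshold",
    threshold=0.10,
    registered_at=datetime(2024, 1, 1, tzinfo=timezone.utc),
)
print(condition.fingerprint())
print(condition.is_prospective(datetime(2024, 2, 1, tzinfo=timezone.utc)))
```

Publishing the fingerprint alongside the registration date is one way to show that the threshold was not adjusted after a negative result; any other tamper-evident commitment would serve the same purpose.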

Falsifiability lives under construct validity because the problem is about the content of the claim, not whether an experiment succeeded. A claim defined only by what the instruments found — “the SVA circuit is whatever components show above-chance IIA on SVA” — is unfalsifiable by construction: adding components in response to any negative result always restores the claim.

Falsification condition: If DAS-IIA at L8.MLP on a held-out SVA set (n=200 prompts, 3 random seeds) exceeds the random-vector baseline by ≥0.10 in fewer than 2 of 3 seeds, the claim that L8.MLP is a primary causal locus for SVA in GPT-2 Small is disconfirmed.
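
A condition of this form reduces to a mechanical check once the per-seed numbers exist. The sketch below assumes the per-seed IIA and random-vector baseline values have already been computed elsewhere; the function name `is_disconfirmed`, its parameters, and the numbers in the usage lines are hypothetical and are not results from any experiment.

```python
from typing import Sequence


def is_disconfirmed(
    iia_per_seed: Sequence[float],
    baseline_per_seed: Sequence[float],
    margin: float = 0.10,
    min_passing_seeds: int = 2,
) -> bool:
    """Return True if the pre-stated falsification condition is met.

    A seed "passes" when IIA exceeds its baseline by at least `margin`.
    The claim is disconfirmed when fewer than `min_passing_seeds` pass.
    Values exactly at the margin are subject to floating-point rounding.
    """
    if len(iia_per_seed) != len(baseline_per_seed):
        raise ValueError("need one baseline value per seed")
    passing = sum(
        1
        for iia, base in zip(iia_per_seed, baseline_per_seed)
        if iia - base >= margin
    )
    return passing < min_passing_seeds


# Made-up numbers for illustration only.
print(is_disconfirmed([0.40, 0.22, 0.21], [0.20, 0.20, 0.20]))  # one seed passes
print(is_disconfirmed([0.40, 0.35, 0.21], [0.20, 0.20, 0.20]))  # two seeds pass
```

Because `margin` and `min_passing_seeds` are fixed in advance, the function leaves no room to reinterpret a negative result after the fact.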

Too inclusive: “The circuit is real unless ablation fails.” A sufficiently large circuit almost never fails ablation — you can always add the ablated component back.

Post-hoc: The threshold was set after seeing the data.

Vague: “If the circuit doesn’t generalize, we would revisit the claim.” Names no instrument, no threshold, no specific result.

Falsifiability interacts with minimality (C4): a non-minimal circuit is harder to falsify because many components provide fallback explanations. It also interacts with convergent validity (C5): when instruments disagree (Jaccard ≈ 0), the falsifiability condition must specify which instrument’s result counts as disconfirmation.
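
The convergent-validity interaction can be illustrated with the Jaccard index between the component sets two instruments identify. This is a sketch only: the instrument names, the component labels, and the convention of returning 1.0 for two empty sets are assumptions made for the example.

```python
from typing import AbstractSet, Dict


def jaccard(a: AbstractSet[str], b: AbstractSet[str]) -> float:
    """Jaccard index |A ∩ B| / |A ∪ B| between two component sets."""
    union = a | b
    if not union:
        return 1.0  # assumed convention: two empty sets count as identical
    return len(a & b) / len(union)


def disconfirmed_by(results: Dict[str, bool], deciding_instrument: str) -> bool:
    """Read the verdict only from the instrument named in advance."""
    return results[deciding_instrument]


# Hypothetical component sets from two instruments that fully disagree.
patching = {"L8.MLP", "L7.H3"}
attribution = {"L5.H1", "L9.MLP"}
print(jaccard(patching, attribution))  # disjoint sets give 0.0

# Hypothetical per-instrument verdicts; the deciding instrument is fixed up front.
verdicts = {"patching": True, "attribution": False}
print(disconfirmed_by(verdicts, deciding_instrument="patching"))
```

Naming the deciding instrument before the test is what keeps a disagreement between instruments from becoming a way to pick whichever result preserves the claim.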