| Tier | Lateral (outside progression) |
|---|---|
| What it means | Evidence actively contradicts the claimed mechanism; a specific prediction has failed or the finding has been shown to be artifactual |
| When to assign | A prediction of the mechanism has been tested and refuted, OR the mechanism is demonstrated to be a measurement artifact |
| Relationship to progressive tiers | Any claim at any progressive tier can be moved to Disconfirmed when contradicting evidence emerges |
| Scientific value | High: disconfirmation narrows the hypothesis space and is informative |

Disconfirmed is not “failed research.” It is a positive scientific conclusion: the evidence actively contradicts the mechanistic claim. A field that never disconfirms is not doing science. Placing Disconfirmed laterally, rather than below Proposed, reflects this: disconfirmation is a different kind of conclusion, not a worse one.

Disconfirmation can take three forms: prediction failure (the mechanism predicts X, but the model produces not-X), artifact demonstration (the finding disappears under improved methodology), or construct dissolution (the named entity is not a coherent construct separable from other processing).

Each form is informative. Prediction failure narrows the space of viable mechanisms. Artifact demonstration improves methodology for the whole field. Construct dissolution reveals that the question was ill-posed, redirecting inquiry.

Verdict: Disconfirmed — [implementational-topographic]

  • Claim: The IOI circuit is sufficient for indirect object identification under distribution-respecting ablation.
  • Disconfirming evidence: Miller et al. (2024) demonstrated that sufficiency, $R = 0.87$ under mean ablation, drops to $R < 0.50$ under resample ablation. The original sufficiency claim is an artifact of mean ablation’s distributional assumptions.
  • Type: Artifact demonstration. The finding is method-conditional, not mechanism-intrinsic.
  • Remaining valid claims: Necessity of the circuit components remains established. Sufficiency under mean ablation remains a true statement (with method qualification).
  • Scope: GPT-2 Small, IOI task, sufficiency specifically (not the full circuit claim).
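
The artifact pattern in this verdict can be reproduced in miniature. The sketch below is a hypothetical toy, not GPT-2 Small or the IOI circuit. A "circuit" pushes the correct name, and a non-circuit component normally carries the input's own name. Mean ablation replaces that component with an averaged vector that matches no real input and happens to be neutral. Resample ablation substitutes another input's name, which actively interferes. The constants, the max-margin logit-difference metric, and the definition of R as mean ablated metric over mean full-model metric are all assumptions for illustration. Resampling is done exhaustively over every other input rather than randomly, so the output is deterministic.

```python
"""Hypothetical toy (not the IOI model): measured circuit sufficiency
depends on how the non-circuit component is ablated."""

from statistics import mean

K = 4                              # number of candidate names (assumed)
CIRCUIT_GAIN = 3.0                 # how strongly the "circuit" pushes the correct name
OTHER_GAIN = 1.0                   # contribution of the non-circuit component
LABELS = [0, 1, 2, 3, 0, 1, 2, 3]  # correct-name index for each toy input


def one_hot(i: int) -> list[float]:
    return [1.0 if j == i else 0.0 for j in range(K)]


def logit_diff(label: int, other_activation: list[float]) -> float:
    """Correct-name logit minus the largest incorrect logit."""
    logits = [OTHER_GAIN * a for a in other_activation]
    logits[label] += CIRCUIT_GAIN
    wrong = max(v for j, v in enumerate(logits) if j != label)
    return logits[label] - wrong


def full_model(label: int) -> float:
    # On-distribution, the non-circuit component encodes the input's own name.
    return logit_diff(label, one_hot(label))


def mean_ablated(label: int) -> float:
    # Replace the non-circuit activation with its dataset mean (never seen on-distribution).
    mean_act = [mean(one_hot(y)[j] for y in LABELS) for j in range(K)]
    return logit_diff(label, mean_act)


def resample_ablated(idx: int) -> float:
    # Replace it with the activation from each *other* input, then average the metric.
    label = LABELS[idx]
    return mean(
        logit_diff(label, one_hot(LABELS[k])) for k in range(len(LABELS)) if k != idx
    )


def sufficiency_ratio(ablated_scores: list[float]) -> float:
    # Assumed definition: mean ablated metric / mean full-model metric.
    full = mean(full_model(y) for y in LABELS)
    return mean(ablated_scores) / full


r_mean = sufficiency_ratio([mean_ablated(y) for y in LABELS])
r_resample = sufficiency_ratio([resample_ablated(i) for i in range(len(LABELS))])
print(f"R under mean ablation:     {r_mean:.3f}")
print(f"R under resample ablation: {r_resample:.3f}")
```

Running it prints 0.750 for mean ablation and 0.571 for resample ablation. The circuit itself is identical in both runs. The higher ratio appears only because mean ablation feeds the rest of the model a value that never occurs on-distribution.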

| Type | Definition | Example |
|---|---|---|
| Prediction failure | Mechanism predicts behavior $X$; model produces $\neg X$ | A claimed “gender circuit” predicts male bias; the model shows no gender preference on the test distribution |
| Artifact demonstration | Finding disappears under improved methodology | Patching result vanishes when mean ablation is replaced by resample ablation |
| Construct dissolution | Named entity is not separable from other processing | “The bias circuit” is indistinguishable from “the gender knowledge circuit”; the construct has no independent existence |

A Disconfirmed verdict should record:

  • The original claim, stated precisely (what was predicted)
  • The disconfirming evidence (what was observed instead)
  • The type of disconfirmation (prediction failure, artifact, or dissolution)
  • What remains valid from the original work (disconfirmation is usually partial)
  • Whether the disconfirmation is total (mechanism is wrong) or scoped (mechanism is method-conditional or distribution-limited)
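
One way to make this checklist machine-checkable, for example in a registry of verdicts, is a small record type. The class, enum, and field names below are hypothetical. The rule that a scoped verdict must spell out its scope is an added assumption, not part of the checklist itself. The example instance paraphrases the IOI verdict.

```python
from dataclasses import dataclass, field
from enum import Enum


class DisconfirmationType(Enum):
    PREDICTION_FAILURE = "prediction failure"
    ARTIFACT = "artifact demonstration"
    CONSTRUCT_DISSOLUTION = "construct dissolution"


class Extent(Enum):
    TOTAL = "total"    # the mechanism is wrong
    SCOPED = "scoped"  # method-conditional or distribution-limited


@dataclass
class DisconfirmedVerdict:
    original_claim: str          # what was predicted
    disconfirming_evidence: str  # what was observed instead
    kind: DisconfirmationType
    extent: Extent
    remaining_valid: list[str] = field(default_factory=list)  # usually non-empty
    scope: str = ""

    def missing_fields(self) -> list[str]:
        """Names of required free-text fields that were left empty."""
        missing = []
        if not self.original_claim.strip():
            missing.append("original_claim")
        if not self.disconfirming_evidence.strip():
            missing.append("disconfirming_evidence")
        # Assumption: a scoped disconfirmation must say what it is scoped to.
        if self.extent is Extent.SCOPED and not self.scope.strip():
            missing.append("scope")
        return missing


ioi = DisconfirmedVerdict(
    original_claim="The IOI circuit is sufficient under distribution-respecting ablation.",
    disconfirming_evidence="R = 0.87 under mean ablation drops to R < 0.50 "
    "under resample ablation (Miller et al., 2024).",
    kind=DisconfirmationType.ARTIFACT,
    extent=Extent.SCOPED,
    remaining_valid=[
        "Necessity of the circuit components",
        "Sufficiency under mean ablation (method-qualified)",
    ],
    scope="GPT-2 Small, IOI task, sufficiency only",
)
print(ioi.missing_fields())
```

Keeping the type and extent as enums instead of free text means every verdict lands in exactly one of the three disconfirmation forms. It also makes the total-versus-scoped distinction explicit rather than implied.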

| Transition | Meaning |
|---|---|
| Any tier → Disconfirmed | New evidence contradicts the claim |
| Disconfirmed → Proposed (rare) | The disconfirming evidence is itself shown to be flawed; the original claim is reopened |
| Disconfirmed → refined claim at Tier 1+ | The original claim is revised to accommodate the disconfirming evidence; the revised claim is a new entity |

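
The three transitions can be sketched as functions over an immutable claim record. This is a hypothetical encoding: it assumes progressive tiers are integers with Proposed as Tier 0, and the `Claim` type and its id counter are illustrative. The design point from the table is that disconfirming and reopening keep the claim's identity, while refinement mints a new claim.

```python
from dataclasses import dataclass, replace
from itertools import count
from typing import Union

DISCONFIRMED = "Disconfirmed"
PROPOSED = 0  # assumption: Proposed is the lowest progressive tier, numbered 0

Status = Union[int, str]  # a progressive tier number, or DISCONFIRMED

_next_id = count(1)  # hypothetical id source for new claim entities


@dataclass(frozen=True)
class Claim:
    claim_id: int
    text: str
    status: Status


def new_claim(text: str, tier: int = PROPOSED) -> Claim:
    return Claim(next(_next_id), text, tier)


def disconfirm(claim: Claim) -> Claim:
    """Any tier -> Disconfirmed: the same claim, now contradicted."""
    if claim.status == DISCONFIRMED:
        raise ValueError("claim is already disconfirmed")
    return replace(claim, status=DISCONFIRMED)


def reopen(claim: Claim) -> Claim:
    """Disconfirmed -> Proposed (rare): the disconfirming evidence was flawed."""
    if claim.status != DISCONFIRMED:
        raise ValueError("only a disconfirmed claim can be reopened")
    return replace(claim, status=PROPOSED)


def refine(claim: Claim, revised_text: str, tier: int) -> Claim:
    """Disconfirmed -> refined claim at Tier 1+: returns a new claim entity."""
    if claim.status != DISCONFIRMED:
        raise ValueError("only a disconfirmed claim can be refined this way")
    if tier < 1:
        raise ValueError("a refined claim enters at Tier 1 or above")
    return new_claim(revised_text, tier)


original = new_claim("original claim", tier=2)  # tier number is hypothetical
disconfirmed = disconfirm(original)
revised = refine(disconfirmed, "revised claim", tier=1)
print(original.claim_id, disconfirmed.claim_id, revised.claim_id)  # prints: 1 1 2
```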

Examples:

  • IOI sufficiency under resample ablation: Miller et al. (2024) demonstrated that the sufficiency result is method-conditional
  • Early “knowledge neuron” localization claims: initial claims that single neurons store facts were partially disconfirmed by evidence of distributed representations
  • Induction head toxicity claims: Wang et al. (2025), withdrawn by the authors after methodological concerns