Tier 4 of 5 (progressive)

What it means: Multiple methods with non-overlapping assumptions converge on the same mechanism
Minimum evidence: C5 (multi-method convergence) + E5 (external robustness) + V2 (cross-procedure agreement) + nomological network density
Upgrade to Validated: Completeness — every component's function characterized, quantitative predictions confirmed, scope boundary tested
Downgrade to Mechanistically supported: If convergence fails (methods disagree on core components) or external robustness is refuted

A Triangulated claim has been confirmed by methods whose failure modes do not overlap. The key property is robustness to methodological critique: if activation patching is shown to have distributional artifacts, the weight-space analysis still stands. If the behavioral test has confounds, the attention pattern evidence is independent.

This is a qualitative transition, not merely “more evidence.” A single methodology, no matter how well-executed, produces findings conditional on that methodology’s assumptions. Triangulation means the finding survives the failure of any single method’s assumptions. The claim’s epistemic status is fundamentally different from a well-replicated single-method result.

Convergence is formalized via the robust core: the intersection of circuits identified by each independent method. Claims about the robust core are more strongly supported than claims about the union.
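The robust core and the overlap measure from the checklist below can be sketched in a few lines. This is an illustrative example, not code from any of the cited papers: the head labels and sets are invented to show the computation, and the Jaccard value here is unrelated to the 0.72 reported for the induction-head claim.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B| between two circuits."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

# Each method returns a set of components (here, attention heads).
# These sets are hypothetical, for illustration only.
manual_analysis = {"L5H1", "L5H5", "L6H9", "L4H11"}
acdc            = {"L5H1", "L5H5", "L6H9", "L3H0"}
eap_ig          = {"L5H1", "L5H5", "L4H11", "L3H0"}

methods = [manual_analysis, acdc, eap_ig]

# Robust core: components every method agrees on.
robust_core = set.intersection(*methods)

# Method-specific periphery: found by at least one method, but not all.
periphery = set.union(*methods) - robust_core

print(sorted(robust_core))             # ['L5H1', 'L5H5']
print(jaccard(manual_analysis, acdc))  # 0.6
```

Claims restricted to `robust_core` inherit support from every method; claims about heads in `periphery` rest on a single method's assumptions.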

Verdict: Triangulated — [implementational-topographic]
Claim: Induction heads (L5H5, L5H1 in GPT-2 Small) implement in-context copying via QK composition with previous-token heads.
Met: C5 (attention pattern analysis + QK weight decomposition + training dynamics + behavioral ablation all converge), E5 (mechanism found in GPT-2 Small, Medium, and Large), V2 (manual circuit identification and ACDC agree on core heads, Jaccard = 0.72)
Open: I_{\text{fun}} (complete component-level function for all supporting heads), quantitative prediction (novel prediction not yet tested)
Scope: GPT-2 family, in-context copying of arbitrary tokens, sequences with repeated subsequences

  • At least two methods named, with their assumptions explicitly stated
  • Jaccard similarity (or equivalent overlap measure) between circuits identified by each method
  • The robust core (intersection) identified and distinguished from method-specific periphery
  • External robustness evidence: distributions or model sizes tested beyond discovery context
  • At least three independently testable predictions, with at least two confirmed by different methods
Direction: What's required
→ Validated: Every component's input-output function characterized (I_{\text{fun}}). At least one novel quantitative prediction confirmed post hoc. Scope boundary explicitly tested (mechanism fails just outside scope). Coverage \kappa > 0.9.
→ Mechanistically supported (downgrade): Methods are shown to share a hidden assumption (their "independence" was illusory). Or external robustness fails: the mechanism does not transfer to the claimed distributions or model sizes.

Running the same method twice (e.g., activation patching with different hyperparameters) is replication, not triangulation. The methods must have non-overlapping failure modes — if one fails due to a distributional assumption, the other must not share that assumption. Two variants of patching (mean vs. resample) are closer to replication than triangulation because both assume the same causal model of intervention.
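The shared assumption is easy to see in code. The NumPy sketch below (shapes and values are illustrative, not from any real model) shows that mean and resample patching differ only in the substitute value; both intervene on the same activation site under the same causal model of substitution, which is why they count as replication rather than triangulation.

```python
import numpy as np

rng = np.random.default_rng(0)
acts = rng.normal(size=(8, 4))  # clean activations at one site (batch, d_model)

# Both variants replace the site's activation and rerun the forward pass
# downstream -- they share the causal assumption that this substitution
# cleanly isolates the component's contribution.
mean_patch = np.broadcast_to(acts.mean(axis=0), acts.shape)  # mean ablation
resample_patch = acts[rng.permutation(len(acts))]            # resample ablation

# A genuinely independent method (e.g. QK weight decomposition) never
# intervenes on activations at all, so it cannot share this assumption.
```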

  • Induction heads (Olsson et al., 2022) — the strongest candidate in the literature, confirmed by attention pattern analysis, QK composition analysis, training dynamics (phase transition), cross-model search, and behavioral ablation
  • IOI circuit robust core — the subset of heads where Wang et al. (2022) manual analysis, Conmy et al. (2023) ACDC, and EAP-IG all agree