
This framework asks: Does our method tell us something that simpler baselines cannot?

A circuit-discovery method may produce high faithfulness scores, but if a trivial baseline (random circuit, top-norm heads, all-heads-in-layer) achieves the same score, the method adds no value. Incremental validity quantifies the unique contribution of a method above and beyond what is already explained by baselines.

This is the “so what?” test. Many methods can identify circuits, but the practical question is whether the additional complexity of a given approach yields better predictions than methods that are cheaper, simpler, or already established. Incremental validity answers this by hierarchical comparison.

| Source | Year | Key contribution |
| --- | --- | --- |
| Hunsley & Meyer, "The incremental validity of psychological testing" | 2003 | Framework for incremental validity in assessment |
| Sechrest, "Incremental validity: A recommendation" | 1963 | Original formulation of incremental validity |
| Haynes & Lench, "Incremental validity of new clinical assessment measures" | 2003 | Criteria for demonstrating added value |
| Conmy et al., "Towards Automated Circuit Discovery" | 2023 | Baseline comparisons in circuit discovery |
| Syed et al., "Attribution Patching Outperforms Automated Circuit Discovery" | 2023 | Hierarchical method comparison in mechanistic interpretability |

Let $\theta_{\text{method}}$ be the faithfulness of the circuit identified by our method and $\theta_{\text{baseline}}$ the faithfulness of the best baseline circuit at the same sparsity level. The incremental validity is:

$$
\Delta\theta = \theta_{\text{method}} - \theta_{\text{baseline}}
$$

To test significance, we use a paired comparison across tasks:

$$
t = \frac{\overline{\Delta\theta}}{SE(\Delta\theta)}, \quad SE(\Delta\theta) = \frac{s_{\Delta}}{\sqrt{K}}
$$

where $K$ is the number of tasks. We also report the proportion of tasks where the method strictly dominates all baselines, and the effect size (Cohen's $d$):

$$
d = \frac{\overline{\Delta\theta}}{s_{\text{pooled}}}
$$
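The statistics above can be computed in a few lines. The sketch below uses the sample standard deviation of the paired differences both for the standard error and as a stand-in for $s_{\text{pooled}}$ in Cohen's $d$ (the common paired-data variant, sometimes written $d_z$); the function name and the example scores are illustrative, not taken from the script.

```python
import math

def incremental_validity(method_scores, baseline_scores):
    """Paired comparison of per-task faithfulness scores.

    method_scores[k] / baseline_scores[k]: faithfulness on task k for the
    method and the best baseline at matched sparsity (hypothetical inputs).
    Returns (mean delta, paired t statistic, Cohen's d, dominance proportion).
    """
    deltas = [m - b for m, b in zip(method_scores, baseline_scores)]
    K = len(deltas)
    mean_delta = sum(deltas) / K
    # sample standard deviation of the paired differences (s_Delta)
    s_delta = math.sqrt(sum((x - mean_delta) ** 2 for x in deltas) / (K - 1))
    t = mean_delta / (s_delta / math.sqrt(K))  # paired t with K-1 dof
    d = mean_delta / s_delta                   # Cohen's d (paired variant)
    dominance = sum(x > 0 for x in deltas) / K
    return mean_delta, t, d, dominance

# Illustrative scores on three tasks
mean_delta, t, d, dominance = incremental_validity(
    [0.8, 0.7, 0.9],    # method faithfulness per task
    [0.6, 0.65, 0.7],   # best-baseline faithfulness per task
)
```

For a real analysis one would compare $t$ against the $t$-distribution with $K-1$ degrees of freedom (e.g. via `scipy.stats.ttest_rel`) rather than a fixed cutoff.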

## Incremental Validity Analysis (36_incremental_validity.py)

Compares the target method’s circuit against a hierarchy of baselines (random, top-norm, top-gradient, layer-wise) at matched sparsity. Reports the incremental gain, statistical significance, effect size, and the proportion of tasks where the method dominates.

What it establishes: That the method provides unique value — its circuits are better than what trivial heuristics produce. What it does not establish: Why the method works better — only that it does.
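A minimal sketch of how baseline circuits might be constructed at matched sparsity (keep the same number of heads $k$ in every circuit). All names and the importance scores are hypothetical; this is not the script's actual implementation.

```python
import random

def baseline_circuits(norm_scores, grad_scores, k, n_heads, seed=0):
    """Build baseline head sets at matched sparsity (keep k of n_heads).

    norm_scores / grad_scores: hypothetical per-head importance scores
    (e.g. mean output norm, gradient attribution magnitude).
    """
    rng = random.Random(seed)  # seeded so the random baseline is reproducible
    return {
        "random": set(rng.sample(range(n_heads), k)),
        "norm": set(sorted(range(n_heads), key=lambda h: -norm_scores[h])[:k]),
        "gradient": set(sorted(range(n_heads), key=lambda h: -grad_scores[h])[:k]),
    }

# Illustrative 4-head model, keeping the top 2 heads per baseline
circuits = baseline_circuits([3, 1, 2, 5], [0.1, 0.9, 0.4, 0.2], k=2, n_heads=4)
```

The key design point is matched sparsity: if the baselines kept more heads than the method's circuit, any faithfulness gap would confound method quality with circuit size.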

Usage:

```sh
uv run python 36_incremental_validity.py --tasks ioi sva greater_than --baselines random norm gradient
```
| Pattern | What it means |
| --- | --- |
| $d > 0.8$, dominates on all tasks | Large practical gain; method clearly adds value |
| $d$ 0.3–0.8, dominates on most tasks | Moderate gain; method is useful but not transformative |
| $d < 0.3$ | Small or negligible gain; method may not justify its complexity |
| Method loses to baseline on any task | Critical failure; investigate whether the method has a systematic blind spot |
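The decision rules in the table above can be written as a small helper (a sketch; the function name and verdict strings are illustrative, and the cutoffs simply mirror the table):

```python
def interpret(d, dominance, any_loss):
    """Map effect size and dominance to the verdicts in the table above.

    d: Cohen's d across tasks; dominance: proportion of tasks where the
    method strictly beats all baselines; any_loss: True if the method
    scored below a baseline on any task.
    """
    if any_loss:
        return "critical failure: investigate systematic blind spots"
    if d > 0.8 and dominance == 1.0:
        return "large practical gain"
    if d >= 0.3:
        return "moderate gain"
    return "small or negligible gain"
```

Note that the "any loss" check takes priority: even a large average gain does not excuse a task where the method loses to a trivial baseline.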