F08 — Incremental Validity
This framework asks: Does our method tell us something that simpler baselines cannot?
A circuit-discovery method may produce high faithfulness scores, but if a trivial baseline (random circuit, top-norm heads, all-heads-in-layer) achieves the same score, the method adds no value. Incremental validity quantifies the unique contribution of a method above and beyond what is already explained by baselines.
This is the “so what?” test. Many methods can identify circuits, but the practical question is whether the additional complexity of a given approach yields better predictions than methods that are cheaper, simpler, or already established. Incremental validity answers this by hierarchical comparison.
Theoretical grounding
| Source | Year | Key contribution |
|---|---|---|
| Hunsley & Meyer, “The incremental validity of psychological testing” | 2003 | Framework for incremental validity in assessment |
| Sechrest, “Incremental validity: A recommendation” | 1963 | Original formulation of incremental validity |
| Haynes & Lench, “Incremental validity of new clinical assessment measures” | 2003 | Criteria for demonstrating added value |
| Conmy et al., “Towards Automated Circuit Discovery” | 2023 | Baseline comparisons in circuit discovery |
| Syed et al., “Attribution Patching Outperforms Automated Circuit Discovery” | 2023 | Hierarchical method comparison in mechanistic interpretability |
Core concept
Let ( \theta_{\text{method}} ) be the faithfulness of the circuit identified by our method and ( \theta_{\text{baseline}} ) the faithfulness of the best baseline circuit at the same sparsity level. The incremental validity is:
[ \Delta\theta = \theta_{\text{method}} - \theta_{\text{baseline}} ]
To test significance, we use a paired comparison across tasks:
[ t = \frac{\overline{\Delta\theta}}{SE(\Delta\theta)}, \quad SE = \frac{s_{\Delta}}{\sqrt{K}} ]
where ( K ) is the number of tasks. We also report the proportion of tasks where the method strictly dominates all baselines, and the effect size (Cohen’s ( d )):
[ d = \frac{\overline{\Delta\theta}}{s_{\text{pooled}}} ]
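The statistics above can be sketched in a few lines. This is an illustrative sketch, not the repository's `36_incremental_validity.py`; the faithfulness scores in the example are made-up placeholders.

```python
import numpy as np

def incremental_validity(theta_method, theta_baseline):
    """Paired incremental-validity statistics across K tasks."""
    m = np.asarray(theta_method, dtype=float)
    b = np.asarray(theta_baseline, dtype=float)
    delta = m - b                                # per-task gain Δθ
    K = delta.size
    mean_gain = delta.mean()
    se = delta.std(ddof=1) / np.sqrt(K)          # SE of the paired differences
    t_stat = mean_gain / se                      # paired t-statistic
    s_pooled = np.sqrt((m.var(ddof=1) + b.var(ddof=1)) / 2)
    d = mean_gain / s_pooled                     # Cohen's d with pooled std
    dominance = float((delta > 0).mean())        # fraction of tasks method wins
    return mean_gain, t_stat, d, dominance

# Hypothetical faithfulness scores on three tasks (ioi, sva, greater_than)
gain, t, d, dom = incremental_validity([0.92, 0.88, 0.85], [0.80, 0.79, 0.83])
```

With only K = 3 tasks the t-statistic has 2 degrees of freedom, so significance thresholds are forgiving; in practice more tasks are needed for a meaningful test.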
Instruments under F08
Incremental Validity Analysis (36_incremental_validity.py)
Compares the target method’s circuit against a hierarchy of baselines (random, top-norm, top-gradient, layer-wise) at matched sparsity. Reports the incremental gain, statistical significance, effect size, and the proportion of tasks where the method dominates.
What it establishes: That the method provides unique value — its circuits are better than what trivial heuristics produce. What it does not establish: Why the method works better — only that it does.
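A minimal sketch of how the baseline hierarchy can be built at matched sparsity k. The scoring arrays (`head_norms`, `head_grads`) and the top-k selection rule are assumptions for illustration; the actual script's selection logic may differ.

```python
import numpy as np

def baseline_circuits(k, head_norms, head_grads, seed=0):
    """Return k-head circuits chosen by trivial heuristics."""
    rng = np.random.default_rng(seed)
    n_heads = len(head_norms)
    return {
        "random":   rng.choice(n_heads, size=k, replace=False),  # uniform draw
        "norm":     np.argsort(head_norms)[-k:],   # top-k by activation norm
        "gradient": np.argsort(head_grads)[-k:],   # top-k by gradient attribution
    }
```

Each baseline circuit is then scored with the same faithfulness metric as the method’s circuit, so any gain ( \Delta\theta ) is attributable to the selection rule alone.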
Usage:
uv run python 36_incremental_validity.py --tasks ioi sva greater_than --baselines random norm gradient
Reading the scores
| Pattern | What it means |
|---|---|
| d > 0.8, dominates on all tasks | Large practical gain — method clearly adds value |
| d 0.3–0.8, dominates on most tasks | Moderate gain — method is useful but not transformative |
| d < 0.3 | Small or negligible gain — method may not justify its complexity |
| Method loses to baseline on any task | Critical failure — investigate whether the method has a systematic blind spot |
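The table’s decision rule can be sketched as a hypothetical helper. The thresholds follow Cohen’s conventions from the table; collapsing “dominates on most tasks” into the effect-size branch is a simplification for illustration.

```python
def interpret(d, dominance, lost_any_task):
    """Map effect size d and dominance rate to a qualitative verdict."""
    if lost_any_task:                     # method loses to a baseline somewhere
        return "critical failure"
    if d > 0.8 and dominance == 1.0:      # large gain, wins every task
        return "large practical gain"
    if d >= 0.3:                          # moderate effect size
        return "moderate gain"
    return "small or negligible gain"
```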