# Criterion M1 — Reliability

| Field | Value |
| --- | --- |
| Validity type | Measurement |
| Pass condition | IIA and faithfulness scores are stable across prompt splits, random seeds, and model checkpoints |
| Evidence family | Measurement |
| Minimum reporting | Bootstrap CI on faithfulness (≥100 subsamples); test-retest Pearson r across ≥3 prompt splits; seed variance |
| Common failure mode | Reporting a single point estimate with no confidence interval or variance |
## What this criterion requires

Reliability is the test-retest stability of the measurement instrument. A reliable instrument produces the same result when re-applied under the same conditions.
Three dimensions:

- Prompt-sample reliability: Bootstrap 100 random subsamples (80% of full size each); compute the target metric on each; report the 95% CI. A 95% CI width ≤ 0.05 (on a 0–1 metric) is a reasonable threshold.
- Seed reliability: Any step involving random initialization (DAS alignment search, bootstrap sampling, prompt shuffling) should produce consistent results across ≥3 seeds. SD ≤ 0.02 is a reasonable threshold.
- Checkpoint reliability: The claim should hold across multiple training checkpoints, not just the final one. A result that holds at only one checkpoint may be a training-stage artifact. This is particularly important for LLC-based analyses.
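The first two checks can be sketched with the standard library alone. This is a minimal illustration, not the project's actual implementation: the per-prompt `scores` and per-seed values below are invented numbers standing in for whatever metric the analysis produces.

```python
import random
import statistics

def bootstrap_ci(scores, n_boot=100, frac=0.8, seed=0):
    """95% percentile CI of the mean metric over random 80% subsamples."""
    rng = random.Random(seed)
    k = max(1, int(frac * len(scores)))
    means = sorted(
        statistics.fmean(rng.sample(scores, k)) for _ in range(n_boot)
    )
    return means[int(0.025 * n_boot)], means[int(0.975 * n_boot) - 1]

# Hypothetical per-prompt faithfulness scores on a 0-1 scale.
scores = [0.78, 0.81, 0.80, 0.79, 0.83, 0.77, 0.82, 0.80, 0.79, 0.81]
lo, hi = bootstrap_ci(scores)
print(f"95% CI: [{lo:.3f}, {hi:.3f}], width {hi - lo:.3f}")  # pass if width <= 0.05

# Seed reliability: SD of the metric across >=3 seeds should be <= 0.02.
per_seed = [0.803, 0.798, 0.801]  # e.g. one DAS run each with seeds 0, 1, 2
print("seed SD:", round(statistics.stdev(per_seed), 4))
```

The percentile method here subsamples without replacement, matching the 80%-subsample wording above; a resample-with-replacement bootstrap would use `rng.choices` instead.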
The `c11bootstrap.py` script implements bootstrap stability checks for circuit faithfulness. The `reconstruction_metrics` function in `factor_analysis.py` provides stability estimates for reconstruction quality metrics.
## Reliability vs. consistency (I4)

- Reliability (M1): Does the instrument give the same answer when re-applied?
- Consistency (I4): Does the causal effect hold across experimental variation?
High reliability, low consistency: the instrument is stable, but the effect it measures varies across prompt families. Low reliability, high apparent consistency: the instrument is noisy, but the noise averages out to a consistent-looking result. Report both.
## Minimum reporting rule

- Bootstrap CI on all primary metrics (faithfulness, IIA, AUROC).
- Seed variance for any random process.
- Whether checkpoint stability was tested; if not, flag it.
- A result without any variance estimate fails this criterion.
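A reporting gate following these rules might look like the sketch below. The field names (`ci95`, `seed_sd`, `checkpoint_stability_tested`) are illustrative, not taken from any existing script:

```python
def check_minimum_reporting(report: dict) -> list[str]:
    """Return a list of failures; an empty list means the floor is met."""
    failures = []
    for metric in ("faithfulness", "iia", "auroc"):
        if metric in report and "ci95" not in report[metric]:
            failures.append(f"{metric}: point estimate without a bootstrap CI")
    if "seed_sd" not in report:
        failures.append("no seed variance reported for random processes")
    if "checkpoint_stability_tested" not in report:
        failures.append("checkpoint stability untested and not flagged")
    return failures

report = {
    "faithfulness": {"point": 0.80, "ci95": (0.77, 0.82)},
    "seed_sd": 0.011,
    "checkpoint_stability_tested": False,  # untested, but flagged per the rule
}
print(check_minimum_reporting(report))  # [] -- passes the floor
```

Note that `checkpoint_stability_tested: False` still passes: the rule requires flagging an untested dimension, not necessarily testing it.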