Protocols
A protocol bundles a curated set of metrics and calibrations around a specific validity question, then interprets the results through the theoretical lens that motivated the question. Running a protocol is not required to evaluate a claim, since metrics and calibrations alone are sufficient, but protocols provide structured depth where criteria are weak.
Each protocol exposes the same interface: a run_protocol() function that takes a model and task list, runs its metrics and calibrations, and returns a ProtocolResult containing scored measurements with metadata. The measurements feed into the standard criteria-scoring pipeline.
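The shared interface can be pictured with a minimal sketch. Everything below is illustrative: the field names on `Measurement` and `ProtocolResult`, the `mean_output` metric, and the `run_toy_protocol` function are assumptions made for this example, not the actual definitions in the library.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Measurement:
    """One scored measurement with metadata (hypothetical field names)."""
    metric: str
    task: str
    score: float
    metadata: Dict[str, Any] = field(default_factory=dict)


@dataclass
class ProtocolResult:
    """Container a protocol returns (hypothetical shape)."""
    protocol_id: str
    measurements: List[Measurement]


def mean_output(model: Callable[[str], float], task: str, n_prompts: int) -> float:
    """Toy metric: average the model's output over synthetic prompts for one task."""
    prompts = [f"{task}-{i}" for i in range(n_prompts)]
    return sum(model(p) for p in prompts) / n_prompts


def run_toy_protocol(model, tasks, n_prompts=40) -> ProtocolResult:
    """Run every bundled metric on every task and collect scored measurements."""
    metrics = {"mean_output": mean_output}  # a real protocol bundles several metrics
    measurements = [
        Measurement(
            metric=name,
            task=task,
            score=fn(model, task, n_prompts),
            metadata={"n_prompts": n_prompts},
        )
        for task in tasks
        for name, fn in metrics.items()
    ]
    return ProtocolResult(protocol_id="TOY", measurements=measurements)


# A stand-in "model" that returns a constant, so the example runs anywhere.
result = run_toy_protocol(lambda prompt: 1.0, tasks=["ioi", "induction"], n_prompts=4)
for m in result.measurements:
    print(m.task, m.metric, m.score, m.metadata)
```

The point of the shape is that downstream scoring only needs to iterate over `measurements`, regardless of which protocol produced them.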
Protocol families
Protocols are organized by evidence family. The naming convention uses letter prefixes matching the metric families (A = causal, B = structural, C = information-theoretic, D = behavioral, E = representational).
Causal protocols (A01–A13)
These primarily target internal validity criteria (I1–I5) by running causal metrics through specific theoretical frameworks. Several also strengthen criteria in other groups (for example C4, C5, and E4), as listed in the last column.
| Protocol | Question | Metrics used | Criteria strengthened |
|---|---|---|---|
| A01 Pearl SCM | Is the circuit a valid structural causal model? | logit_diff, role_ablation, activation_patching, causal_scrubbing | I1 Necessity, I2 Sufficiency |
| A02 Counterfactual DAS | Does counterfactual intervention on the circuit’s subspace change behavior as predicted? | das_iia, misalignment, cross_task_transfer | I3 Specificity |
| A03 Rubin CATE | What is the average treatment effect of the circuit, with proper potential-outcomes framing? | activation_patching, effect_size, sigma_ablation | I1 Necessity, E4 Effect magnitude |
| A04 Woodward | Does the circuit satisfy invariant difference-making under intervention? | activation_patching, causal_scrubbing, path_patching | I1 Necessity, I4 Consistency |
| A05 MDC/Glennan | Does the circuit qualify as a mechanism under the mechanistic decomposition criteria? | role_ablation, path_patching, cross_task_transfer | I2 Sufficiency, V3 Narrative coherence |
| A06 Mediation | What fraction of the total effect flows through the circuit (direct vs indirect)? | mediation, path_patching, effect_size | I3 Specificity |
| A07 Granger/TE | Does past circuit activity predict future behavior beyond what other components predict? | granger_causality, transfer_entropy, pid | I1 Necessity, C5 Convergent validity |
| A08 PID | How is information about the task variable decomposed across circuit components? | pid, mutual_information, conditional_mi | I3 Specificity, C5 Convergent validity |
| A09 MDL/SLT | Does the circuit provide a minimal-description-length explanation of the behavior? | mdl, llc, effect_size | C4 Minimality |
| A10 Regularity/INUS | Is the circuit an INUS condition — insufficient but necessary part of an unnecessary but sufficient set? | activation_patching, sigma_ablation, effect_size | I1 Necessity, C4 Minimality |
| A11 Actual Cause | Does the circuit meet Halpern-Pearl’s definition of actual causation (not just but-for)? | activation_patching, causal_scrubbing, path_patching | I1 Necessity, I5 Confound control |
| A12 Transportability | Does the causal effect transport across domains (prompts, models, tasks)? | cross_task_transfer, cross_model_invariance, generalization_gap | E5 Robustness, E6 Cross-architecture |
| A13 Causal Discovery | Can the circuit’s causal graph be recovered from observational + interventional data? | notears, granger_causality, pid | I4 Consistency, C5 Convergent validity |
Structural protocols (B01–B04)
These target construct and measurement criteria by analyzing weight-space properties without forward passes.
| Protocol | Question | Criteria strengthened |
|---|---|---|
| B01 Spectral/SVD | Does the circuit’s weight structure reveal interpretable spectral modes? | C2 Structural plausibility, M4 Sensitivity |
| B02 Composition | Do weight-space composition scores match the claimed information routing? | C2 Structural plausibility, I4 Consistency |
| B03 Graph analysis | Does the circuit’s connectivity graph have non-trivial topological properties? | C4 Minimality, C5 Convergent validity |
| B04 Network motifs | Does the circuit contain recognizable computational motifs (copying, inhibition, routing)? | C2 Structural plausibility, V3 Narrative coherence |
Behavioral protocols (D01–D03)
| Protocol | Question | Criteria strengthened |
|---|---|---|
| D01 Faithfulness | Does the circuit alone reproduce the behavior? | I2 Sufficiency, E4 Effect magnitude |
| D02 Generalization | Does the circuit transfer to held-out prompts and related tasks? | E5 Robustness, C3 Task specificity |
| D03 Probing | Can the circuit’s intermediate representations be decoded by a learned classifier? | C2 Structural plausibility, E5 Robustness |
Information protocols, representational protocols, and additional families
Beyond the core families above, protocols exist for information-theoretic analysis (C01–C03), representational analysis (E01), and extended families drawn from specific scientific traditions:
- Molecular biology (16 protocols) — knockout hierarchies, rescue experiments, Mendelian randomization, dose-response, target engagement, sensitivity analysis, and other designs adapted from experimental biology.
- Cross-discipline (11 protocols) — control theory (settling depth, stability margin, observability), dynamical systems (Koopman/DMD, renormalization, TDA), economics (arbitrage search, game theory), and geometry (Fisher-Rao, sheaf consistency).
- Synthesis (9 protocols) — see synthesis protocols.
Choosing protocols
Protocols are not a checklist to run exhaustively. They are targeted tools for strengthening specific criteria.
The workflow is iterative: run metrics and calibrations, score the criteria, and identify which criteria are weak. For each weak criterion, find a protocol tagged to it, run that protocol, and re-score. The protocol inventory is organized by which criteria each protocol strengthens, so the mapping from “weak criterion” to “run this protocol” is direct.
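The lookup step of this workflow can be sketched as an inverted index from criteria to protocols. The mapping below copies a subset of rows from the tables above. The 0-to-1 score scale, the `0.5` threshold, and the helper names are assumptions for illustration only.

```python
from collections import defaultdict

# Subset of the protocol tables above: protocol ID -> criteria it strengthens.
PROTOCOL_CRITERIA = {
    "A01": ["I1", "I2"],
    "A02": ["I3"],
    "A06": ["I3"],
    "A09": ["C4"],
    "A12": ["E5", "E6"],
    "B01": ["C2", "M4"],
    "B03": ["C4", "C5"],
    "D01": ["I2", "E4"],
    "D02": ["E5", "C3"],
}


def build_criterion_index(protocol_criteria):
    """Invert protocol -> criteria into criterion -> protocols."""
    index = defaultdict(list)
    for protocol, criteria in protocol_criteria.items():
        for criterion in criteria:
            index[criterion].append(protocol)
    return dict(index)


def protocols_for_weak_criteria(scores, index, threshold=0.5):
    """Return candidate protocols for each criterion scoring below the threshold."""
    weak = [criterion for criterion, score in scores.items() if score < threshold]
    return {criterion: index.get(criterion, []) for criterion in weak}


# Hypothetical criterion scores after a first pass of metrics and calibrations.
scores = {"I1": 0.9, "I3": 0.3, "C4": 0.4, "E5": 0.8}
index = build_criterion_index(PROTOCOL_CRITERIA)
plan = protocols_for_weak_criteria(scores, index)
print(plan)  # {'I3': ['A02', 'A06'], 'C4': ['A09', 'B03']}
```

After running the selected protocols, the same scoring step is repeated and the plan is recomputed from the new scores.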
For a typical evaluation, running 3–5 protocols from different families produces substantially richer evidence than running 15 metrics from the same family. Cross-family convergence is the goal — not exhaustive coverage within one family.
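One way to favor cross-family convergence when picking from a candidate list is a round-robin over the family prefix. This is a sketch of one possible selection heuristic, not a procedure the project prescribes. The function name and the default of four picks are assumptions.

```python
from collections import defaultdict
from itertools import zip_longest


def pick_across_families(candidates, k=4):
    """Pick up to k protocol IDs, cycling through families before repeating one."""
    by_family = defaultdict(list)
    for protocol_id in candidates:
        by_family[protocol_id[0]].append(protocol_id)  # family = letter prefix

    picked = []
    # Each round takes the next unused protocol from every family in turn.
    for round_ in zip_longest(*by_family.values()):
        for protocol_id in round_:
            if protocol_id is not None and len(picked) < k:
                picked.append(protocol_id)
    return picked


candidates = ["A01", "A03", "A04", "B02", "D01", "D02"]
selection = pick_across_families(candidates, k=4)
print(selection)  # ['A01', 'B02', 'D01', 'A03']
```

With three causal candidates available, the heuristic still takes one structural and one behavioral protocol before returning to a second causal one.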
Running protocols
Protocols live in the mechanistic-validity-experiments repository under experiments/protocols/. Each can be run standalone or imported:
Standalone, from the command line:

    uv run python protocols/neuroscience/a01_scm_pearl.py --tasks ioi induction --device cuda

Imported, from Python:

    from protocols.neuroscience.a01_scm_pearl import run_protocol

    # `model` is assumed to be loaded already
    result = run_protocol(model, tasks=["ioi"], n_prompts=40)

The main mechval library provides registry infrastructure (register_protocol, dispatch_protocol, list_protocols) for programmatic protocol management.
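To illustrate what a registry of this kind generally looks like, here is a self-contained toy version. It borrows the three function names from the text, but the signatures, the decorator form, and the error handling are assumptions. It does not import mechval and should not be read as that library's API.

```python
from typing import Any, Callable, Dict, List

# Toy in-memory registry: protocol ID -> runner function.
_REGISTRY: Dict[str, Callable[..., Any]] = {}


def register_protocol(protocol_id: str):
    """Decorator that stores a runner under a protocol ID (assumed signature)."""
    def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
        if protocol_id in _REGISTRY:
            raise ValueError(f"protocol {protocol_id!r} is already registered")
        _REGISTRY[protocol_id] = fn
        return fn
    return decorator


def list_protocols() -> List[str]:
    """Return registered protocol IDs in sorted order."""
    return sorted(_REGISTRY)


def dispatch_protocol(protocol_id: str, *args: Any, **kwargs: Any) -> Any:
    """Look up a runner by ID and call it with the given arguments."""
    try:
        runner = _REGISTRY[protocol_id]
    except KeyError:
        raise KeyError(f"unknown protocol {protocol_id!r}") from None
    return runner(*args, **kwargs)


@register_protocol("A01")
def run_a01(model, tasks, n_prompts=40):
    # Placeholder body: a real runner would execute metrics and calibrations.
    return {"protocol": "A01", "tasks": list(tasks), "n_prompts": n_prompts}


print(list_protocols())
outcome = dispatch_protocol("A01", model=None, tasks=["ioi"])
print(outcome)
```

A registry like this lets a driver script select protocols by ID, for example from the output of a criterion lookup, without importing each protocol module by hand.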