A protocol bundles a curated set of metrics and calibrations around a specific validity question, then interprets the results through the theoretical lens that motivated the question. Running a protocol is not required to evaluate a claim — metrics and calibrations alone are sufficient — but protocols provide structured depth where criteria are weak.

Each protocol exposes the same interface: a run_protocol() function that takes a model and task list, runs its metrics and calibrations, and returns a ProtocolResult containing scored measurements with metadata. The measurements feed into the standard criteria-scoring pipeline.
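As a rough illustration of that shared interface, the sketch below defines a toy `run_protocol()` that returns a `ProtocolResult`. The field names (`protocol`, `measurements`, `metric`, `task`, `value`, `metadata`), the `Measurement` class, and the stand-in metric are assumptions made for this example; the text only states that a result contains scored measurements with metadata, so the real classes may differ.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Measurement:
    """One scored measurement (hypothetical field names)."""
    metric: str
    task: str
    value: float
    metadata: Dict[str, Any] = field(default_factory=dict)


@dataclass
class ProtocolResult:
    """Container returned by a protocol (hypothetical layout)."""
    protocol: str
    measurements: List[Measurement]


def constant_metric(model: Any, task: str, n_prompts: int) -> float:
    # Stand-in metric: a real metric would run the model on the task.
    return 0.5


# Hypothetical metric table for the toy protocol.
TOY_METRICS: Dict[str, Callable[[Any, str, int], float]] = {
    "constant_metric": constant_metric,
}


def run_protocol(model: Any, tasks: List[str], n_prompts: int = 40) -> ProtocolResult:
    """Run every toy metric on every task and collect the measurements."""
    measurements = [
        Measurement(
            metric=name,
            task=task,
            value=fn(model, task, n_prompts),
            metadata={"n_prompts": n_prompts},
        )
        for task in tasks
        for name, fn in TOY_METRICS.items()
    ]
    return ProtocolResult(protocol="toy", measurements=measurements)


# No real model is needed for the stand-in metric.
result = run_protocol(model=None, tasks=["ioi", "induction"])
```

Because every protocol returns the same kind of object, downstream scoring code can iterate over `result.measurements` without knowing which protocol produced them.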

Protocols are organized by evidence family. The naming convention uses letter prefixes (A = causal, B = structural, etc.) matching the metric families.

The A-family (causal) protocols primarily target internal validity criteria (I1–I5) by running causal metrics through specific theoretical frameworks.

| Protocol | Question | Metrics used | Criteria strengthened |
|---|---|---|---|
| A01 Pearl SCM | Is the circuit a valid structural causal model? | logit_diff, role_ablation, activation_patching, causal_scrubbing | I1 Necessity, I2 Sufficiency |
| A02 Counterfactual DAS | Does counterfactual intervention on the circuit’s subspace change behavior as predicted? | das_iia, misalignment, cross_task_transfer | I3 Specificity |
| A03 Rubin CATE | What is the average treatment effect of the circuit, with proper potential-outcomes framing? | activation_patching, effect_size, sigma_ablation | I1 Necessity, E4 Effect magnitude |
| A04 Woodward | Does the circuit satisfy invariant difference-making under intervention? | activation_patching, causal_scrubbing, path_patching | I1 Necessity, I4 Consistency |
| A05 MDC/Glennan | Does the circuit qualify as a mechanism under the mechanistic decomposition criteria? | role_ablation, path_patching, cross_task_transfer | I2 Sufficiency, V3 Narrative coherence |
| A06 Mediation | What fraction of the total effect flows through the circuit (direct vs indirect)? | mediation, path_patching, effect_size | I3 Specificity |
| A07 Granger/TE | Does past circuit activity predict future behavior beyond what other components predict? | granger_causality, transfer_entropy, pid | I1 Necessity, C5 Convergent validity |
| A08 PID | How is information about the task variable decomposed across circuit components? | pid, mutual_information, conditional_mi | I3 Specificity, C5 Convergent validity |
| A09 MDL/SLT | Does the circuit provide a minimal-description-length explanation of the behavior? | mdl, llc, effect_size | C4 Minimality |
| A10 Regularity/INUS | Is the circuit an INUS condition (an insufficient but necessary part of an unnecessary but sufficient set)? | activation_patching, sigma_ablation, effect_size | I1 Necessity, C4 Minimality |
| A11 Actual Cause | Does the circuit meet Halpern-Pearl’s definition of actual causation (not just but-for)? | activation_patching, causal_scrubbing, path_patching | I1 Necessity, I5 Confound control |
| A12 Transportability | Does the causal effect transport across domains (prompts, models, tasks)? | cross_task_transfer, cross_model_invariance, generalization_gap | E5 Robustness, E6 Cross-architecture |
| A13 Causal Discovery | Can the circuit’s causal graph be recovered from observational + interventional data? | notears, granger_causality, pid | I4 Consistency, C5 Convergent validity |

The B-family (structural) protocols target construct and measurement criteria by analyzing weight-space properties without forward passes.

| Protocol | Question | Criteria strengthened |
|---|---|---|
| B01 Spectral/SVD | Does the circuit’s weight structure reveal interpretable spectral modes? | C2 Structural plausibility, M4 Sensitivity |
| B02 Composition | Do weight-space composition scores match the claimed information routing? | C2 Structural plausibility, I4 Consistency |
| B03 Graph analysis | Does the circuit’s connectivity graph have non-trivial topological properties? | C4 Minimality, C5 Convergent validity |
| B04 Network motifs | Does the circuit contain recognizable computational motifs (copying, inhibition, routing)? | C2 Structural plausibility, V3 Narrative coherence |

| Protocol | Question | Criteria strengthened |
|---|---|---|
| D01 Faithfulness | Does the circuit alone reproduce the behavior? | I2 Sufficiency, E4 Effect magnitude |
| D02 Generalization | Does the circuit transfer to held-out prompts and related tasks? | E5 Robustness, C3 Task specificity |
| D03 Probing | Can the circuit’s intermediate representations be decoded by a learned classifier? | C2 Structural plausibility, E5 Robustness |

Information protocols, representational protocols, and additional families

Beyond the core families above, protocols exist for information-theoretic analysis (C01–C03), representational analysis (E01), and extended families drawn from specific scientific traditions:

  • Molecular biology (16 protocols) — knockout hierarchies, rescue experiments, Mendelian randomization, dose-response, target engagement, sensitivity analysis, and other designs adapted from experimental biology.
  • Cross-discipline (11 protocols) — control theory (settling depth, stability margin, observability), dynamical systems (Koopman/DMD, renormalization, TDA), economics (arbitrage search, game theory), and geometry (Fisher-Rao, sheaf consistency).
  • Synthesis (9 protocols) — see synthesis protocols.

Protocols are not a checklist to run exhaustively. They are targeted tools for strengthening specific criteria.

The workflow is: run metrics and calibrations, score the criteria, identify which criteria are weak, find the protocols tagged to those criteria and run them, then re-score. The protocol inventory is organized by which criteria each protocol strengthens, so the mapping from “weak criterion” to “run this protocol” is direct.
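The lookup from weak criteria to candidate protocols can be sketched as a simple inversion of the inventory. The `STRENGTHENS` table below copies a few rows from the protocol tables; the criterion scores and the 0.5 threshold are invented for illustration and are not part of the documented scoring pipeline.

```python
from typing import Dict, List

# A few rows from the inventory: protocol ID -> criteria it strengthens.
STRENGTHENS: Dict[str, List[str]] = {
    "A01": ["I1", "I2"],
    "A02": ["I3"],
    "A09": ["C4"],
    "A12": ["E5", "E6"],
    "B01": ["C2", "M4"],
    "D01": ["I2", "E4"],
    "D02": ["E5", "C3"],
}


def weak(scores: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Return the criteria whose score falls below the (assumed) threshold."""
    return sorted(c for c, s in scores.items() if s < threshold)


def protocols_for(weak_criteria: List[str]) -> Dict[str, List[str]]:
    """Map each weak criterion to the protocols tagged as strengthening it."""
    return {
        criterion: sorted(
            pid for pid, crits in STRENGTHENS.items() if criterion in crits
        )
        for criterion in weak_criteria
    }


# Made-up criterion scores standing in for the output of the scoring step.
scores = {"I1": 0.8, "I2": 0.3, "E5": 0.4, "C4": 0.9}

weak_criteria = weak(scores)          # ['E5', 'I2']
plan = protocols_for(weak_criteria)   # {'E5': ['A12', 'D02'], 'I2': ['A01', 'D01']}
```

After running the selected protocols, the same `weak()` check would be applied to the re-scored criteria.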

For a typical evaluation, running 3–5 protocols from different families produces substantially richer evidence than running 15 metrics from the same family. Cross-family convergence is the goal — not exhaustive coverage within one family.
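One way to favor cross-family coverage when choosing among candidate protocols is a round-robin over family prefixes. This is only a sketch of a possible selection heuristic: the function name `pick_across_families` and the default of four picks are assumptions, not something the library is documented to provide.

```python
from itertools import zip_longest
from typing import Dict, List


def pick_across_families(candidates: List[str], k: int = 4) -> List[str]:
    """Pick up to k protocol IDs, taking one per family before repeating a family."""
    by_family: Dict[str, List[str]] = {}
    for pid in sorted(candidates):
        # The first character of the ID is the family prefix (A, B, D, ...).
        by_family.setdefault(pid[0], []).append(pid)

    picked: List[str] = []
    # Round-robin: first protocol of each family, then the second, and so on.
    for round_ in zip_longest(*by_family.values()):
        for pid in round_:
            if pid is not None and len(picked) < k:
                picked.append(pid)
    return picked


selection = pick_across_families(["A01", "A03", "A04", "B02", "D01", "D02"], k=4)
# selection == ['A01', 'B02', 'D01', 'A03']
```

With six candidates from three families, the first three picks cover all three families before a second A-family protocol is added.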

Protocols live in the mechanistic-validity-experiments repository under experiments/protocols/. Each can be run standalone or imported:

Standalone, from the command line:

    uv run python protocols/neuroscience/a01_scm_pearl.py --tasks ioi induction --device cuda

Imported from Python (`model` is assumed to be loaded already):

    from protocols.neuroscience.a01_scm_pearl import run_protocol
    result = run_protocol(model, tasks=["ioi"], n_prompts=40)

The main mechval library provides registry infrastructure (register_protocol, dispatch_protocol, list_protocols) for programmatic protocol management.
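To illustrate what a registry of this kind typically does, here is a self-contained toy version. It reuses the three function names mentioned above, but the signatures, the decorator form, and the returned dictionary are assumptions for this sketch; the text does not specify mechval's actual API, and nothing here imports it.

```python
from typing import Any, Callable, Dict, List

# Toy in-memory registry: protocol ID -> callable.
_REGISTRY: Dict[str, Callable[..., Any]] = {}


def register_protocol(protocol_id: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Decorator that stores a protocol function under its ID (assumed form)."""
    def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
        if protocol_id in _REGISTRY:
            raise ValueError(f"protocol {protocol_id!r} is already registered")
        _REGISTRY[protocol_id] = fn
        return fn
    return decorator


def list_protocols() -> List[str]:
    """Return the registered protocol IDs in sorted order."""
    return sorted(_REGISTRY)


def dispatch_protocol(protocol_id: str, *args: Any, **kwargs: Any) -> Any:
    """Look up a protocol by ID and call it with the given arguments."""
    if protocol_id not in _REGISTRY:
        raise KeyError(f"unknown protocol {protocol_id!r}")
    return _REGISTRY[protocol_id](*args, **kwargs)


@register_protocol("A01")
def toy_a01(model: Any, tasks: List[str], n_prompts: int = 40) -> Dict[str, Any]:
    # Placeholder body: a real protocol would run metrics and calibrations.
    return {"protocol": "A01", "tasks": list(tasks), "n_prompts": n_prompts}


out = dispatch_protocol("A01", None, tasks=["ioi"])
```

Dispatching by ID lets calling code select protocols from data, such as a list of weak criteria, without importing each protocol module by hand.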