A07 — Granger Causality / Transfer Entropy
This framework asks: can we discover causal structure from observational data alone — without interventions — by measuring directed information flow between components?
Granger causality and transfer entropy provide tools for identifying directed relationships from observational data. A variable ( X ) Granger-causes ( Y ) if past values of ( X ) improve prediction of ( Y ) beyond what past values of ( Y ) alone provide. Transfer entropy generalizes this to the information-theoretic setting: ( T_{X \to Y} = I(Y_t; X_{t-1} \mid Y_{t-1}) ). In transformers, “temporal precedence” maps to layer ordering — earlier-layer activations precede later-layer activations in the computational graph, making Granger-style analysis applicable.
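In the linear setting, the Granger test reduces to comparing the residual variance of an autoregression on ( Y )'s past with and without the candidate source's past. A minimal sketch on synthetic data (all names and coefficients here are hypothetical, chosen so that ( X ) drives ( Y ) with a one-step lag):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic series in which x drives y with a one-step lag (hypothetical data).
n = 2000
x = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + 0.1 * rng.normal()

def residual_var(target, *regressors):
    """Residual variance of an ordinary-least-squares fit with an intercept."""
    A = np.column_stack([np.ones(len(target)), *regressors])
    coef, *_ = np.linalg.lstsq(A, target, rcond=None)
    return np.var(target - A @ coef)

# Restricted model: predict y_t from y_{t-1} alone.
var_restricted = residual_var(y[1:], y[:-1])
# Full model: add x_{t-1} to the regressors.
var_full = residual_var(y[1:], y[:-1], x[:-1])

# x Granger-causes y if its past shrinks the prediction error.
print(var_full < var_restricted)
```

In practice the comparison is made with an F-test rather than a raw inequality, but the quantity being tested is exactly this reduction in residual variance.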
The practical value: observational methods scale to the full model without requiring ( O(n^2) ) interventions. They serve as a discovery tool — identifying candidate causal relationships that can then be verified with interventional methods from A01/A02. The oCSE (Observational Circuit Structure Evaluation) approach combines conditional mutual information with stability selection to find edges that are robust across subsamples.
Theoretical grounding
| Source | Year | Key contribution |
|---|---|---|
| Granger, “Investigating Causal Relations by Econometric Models” | 1969 | Granger causality: predictive improvement as evidence of causation |
| Schreiber, “Measuring Information Transfer” | 2000 | Transfer entropy: information-theoretic generalization of Granger causality |
| Elhage et al., “A Mathematical Framework for Transformer Circuits” | 2021 | Layer ordering as temporal structure enabling directed analysis |
| Conmy et al., arXiv 2304.14997 | 2023 | ACDC as interventional circuit discovery (contrast to observational) |
Core concept: observational circuit discovery
Transfer entropy from component ( X ) (layer ( l )) to component ( Y ) (layer ( l' > l )) is:
[ T_{X \to Y} = I(Y; X \mid \text{Pa}(Y) \setminus X) ]
where ( \text{Pa}(Y) ) is the set of all components at layers ( \leq l' ) that could influence ( Y ). This conditional mutual information measures the unique information that ( X ) provides about ( Y ) beyond what other parents already provide. High transfer entropy implies a directed information-flow relationship.
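This conditional mutual information can be estimated with a plug-in entropy estimator on discretized activations. A minimal sketch on hypothetical binary data (a production estimator would need bias correction and a strategy for continuous variables):

```python
import numpy as np
from collections import Counter

def cmi(y, x, z):
    """Plug-in estimate of I(Y; X | Z) in bits for discrete samples."""
    def H(*series):
        counts = np.array(list(Counter(zip(*series)).values()), dtype=float)
        p = counts / counts.sum()
        return float(-(p * np.log2(p)).sum())
    # Entropy identity: I(Y; X | Z) = H(Y,Z) + H(X,Z) - H(Y,X,Z) - H(Z)
    return H(y, z) + H(x, z) - H(y, x, z) - H(z)

rng = np.random.default_rng(1)
z = rng.integers(0, 2, 5000)
x = z ^ rng.integers(0, 2, 5000)                         # x depends on z plus noise
y_direct = x ^ (rng.random(5000) < 0.1).astype(int)      # y reads x directly
y_confounded = z ^ (rng.random(5000) < 0.1).astype(int)  # y driven only by z

# Direct edge carries unique information beyond z; confounded pair does not.
print(cmi(y_direct, x, z) > cmi(y_confounded, x, z))
```

The confounded pair illustrates why conditioning on ( \text{Pa}(Y) \setminus X ) matters: ( X ) and the confounded ( Y ) are correlated through ( Z ), but their conditional mutual information given ( Z ) is near zero.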
The oCSE algorithm applies stability selection: estimate transfer entropy on many bootstrap subsamples of the data, and retain only edges that appear consistently. This controls false discovery rate without interventions. The resulting graph is a candidate circuit that can be validated with A01/A02 methods.
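The stability-selection loop can be sketched as follows. The `te_estimate` interface, the edge score, and the windowed subsampling are assumptions for illustration, not the actual `07_ocse.py` implementation; contiguous windows are used rather than i.i.d. resampling so that lagged structure survives the subsample:

```python
import numpy as np

def stable_edges(te_estimate, data, n_boot=50, frac=0.5,
                 keep_frac=0.7, te_cutoff=0.05, seed=0):
    """Keep directed edges whose score clears `te_cutoff` in more than
    `keep_frac` of the subsamples (hypothetical parameters)."""
    rng = np.random.default_rng(seed)
    n = len(next(iter(data.values())))
    w = int(n * frac)
    pairs = [(s, t) for s in data for t in data if s != t]
    counts = {p: 0 for p in pairs}
    for _ in range(n_boot):
        start = rng.integers(0, n - w + 1)   # contiguous window keeps lag structure
        window = slice(start, start + w)
        for s, t in pairs:
            if te_estimate(data[s][window], data[t][window]) > te_cutoff:
                counts[(s, t)] += 1
    return {p for p, c in counts.items() if c / n_boot > keep_frac}

def lag1_score(src, tgt):
    """Stand-in edge score: squared lag-1 correlation (not a true TE estimator)."""
    return np.corrcoef(src[:-1], tgt[1:])[0, 1] ** 2

rng = np.random.default_rng(2)
a = rng.normal(size=1000)
b = np.roll(a, 1) + 0.1 * rng.normal(size=1000)  # b copies a with a one-step delay
print(stable_edges(lag1_score, {"A": a, "B": b}))  # only the A -> B edge survives
```

The key property is that a spurious edge must clear the cutoff in most subsamples to be retained, which is what controls the false discovery rate without any intervention.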
The cross-task transfer variant tests whether causal information relationships discovered on one task generalize to another — evidence of a task-general computational structure rather than task-specific correlation.
Instruments under A07
C7 — oCSE: Observational Circuit Structure Evaluation (07_ocse.py)
Estimates directed information flow between all pairs of components using conditional mutual information with stability selection:
[ \hat{T}_{X \to Y} = \hat{I}(Y; X \mid Z) \quad \text{where } Z = \text{Pa}(Y) \setminus X ]
Edges are retained if they appear in more than a threshold fraction of bootstrap samples. The output is a directed graph over components that can be compared to interventionally-discovered circuits via structural Hamming distance.
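Structural Hamming distance over edge sets can be computed directly; the convention sketched here (an edge reversal counts as one operation rather than a deletion plus an insertion) is one common choice, and the example graphs are hypothetical:

```python
def shd(edges_a, edges_b):
    """Structural Hamming distance between directed graphs given as sets of
    (source, target) edges; an edge reversal counts as one operation."""
    only_a, only_b = edges_a - edges_b, edges_b - edges_a
    reversals = {(t, s) for s, t in only_a} & only_b
    return len(only_a) + len(only_b) - len(reversals)

observational = {("x", "y"), ("y", "z")}
interventional = {("x", "y"), ("z", "y"), ("x", "z")}
print(shd(observational, interventional))  # one reversal + one missing edge -> 2
```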
What it establishes: Candidate causal edges from observational data alone. Scales to full-model analysis without per-edge interventions.
What it does not establish: True causation (observational methods cannot distinguish causation from confounding by unobserved variables). Must be validated interventionally.
Usage:
uv run python 07_ocse.py --tasks ioi sva --n-prompts 40

C32 — Cross-Task IIA Transfer (32_cross_task_iia_transfer.py)
Tests whether causal relationships (measured via IIA from A02) transfer across tasks. Trains DAS alignments on one task and evaluates IIA on another, measuring how much of the causal structure is task-general:
[ \text{Transfer}(t_1 \to t_2) = \frac{\text{IIA}_{t_2}(\tau_{t_1})}{\text{IIA}_{t_2}(\tau_{t_2})} ]
What it establishes: Whether causal structure is task-specific or reflects general computational organization.
What it does not establish: The mechanism behind transfer (could be shared representations or shared algorithms).
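A tiny helper showing the transfer ratio and the reading thresholds from the score table in this section; the IIA values are hypothetical:

```python
def transfer_ratio(iia_cross, iia_native):
    """Transfer(t1 -> t2): IIA on t2 under the alignment trained on t1,
    normalized by the alignment trained on t2 itself."""
    return iia_cross / iia_native

def read_transfer(ratio):
    """Apply the reading thresholds used in this framework's score table."""
    if ratio > 0.8:
        return "task-general"
    if ratio < 0.4:
        return "task-specific"
    return "intermediate"

# Hypothetical scores: alignment from t1 reaches 0.72 IIA on t2, vs 0.80 natively.
print(read_transfer(transfer_ratio(0.72, 0.80)))
```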
Usage:
uv run python 32_cross_task_iia_transfer.py --tasks ioi sva --n-prompts 40

Reading the scores
| Pattern | What it means |
|---|---|
| oCSE graph matches interventional circuit (low SHD) | Observational discovery recovers true causal structure |
| High transfer entropy but no interventional effect | Confounded relationship (correlation without causation) |
| High cross-task transfer ratio (> 0.8) | Causal structure is task-general |
| Low cross-task transfer (< 0.4) | Task-specific wiring; different algorithms per task |
Connection to other frameworks
A07 provides a scalable discovery complement to A01’s interventional verification. The workflow is: use A07 (observational) to generate candidate circuits cheaply, then validate the most promising candidates with A01 (activation patching) and A02 (IIA). A13 (Causal Discovery / NOTEARS) offers an alternative observational approach using continuous optimization rather than information-theoretic measures. A12 (Transportability) formalizes the conditions under which cross-task transfer results from A07 are valid.