A08 — Partial Information Decomposition
This framework asks: do two circuit components carry redundant, unique, or synergistic information about the output — and what does this tell us about circuit organization?
Partial Information Decomposition (PID) takes a pair of source variables ( S_1, S_2 ) (e.g., two attention heads) and a target variable ( T ) (e.g., the logit difference) and decomposes the total mutual information ( I(S_1, S_2; T) ) into four non-negative atoms: redundancy (both carry the same information), unique information from ( S_1 ), unique information from ( S_2 ), and synergy (information available only from the joint but not from either alone). This decomposition reveals whether circuit components are backups for each other (redundant), specialized for different sub-computations (unique), or cooperating in a way that transcends their individual contributions (synergistic).
In MI terms, PID answers the “hydra effect” question: when you ablate one head and another compensates, is that because the two carry redundant information (PID predicts compensation is possible, since the same information is available elsewhere) or synergistic information (PID predicts no compensation, since the joint contribution vanishes when either component is removed)?
Theoretical grounding
| Source | Year | Key contribution |
|---|---|---|
| Williams & Beer, arXiv 1004.2515 | 2010 | PID framework: redundancy, unique, synergy decomposition |
| McGrath et al., arXiv 2307.15771 | 2023 | Hydra effect: backup heads compensate when primary is ablated |
| Elhage et al., “A Mathematical Framework for Transformer Circuits” | 2021 | Compositional structure enabling information-theoretic analysis |
| Olsson et al., “In-context Learning and Induction Heads” | 2022 | Multiple induction heads as potential redundant/synergistic ensemble |
Core concept: the four information atoms
For sources ( S_1, S_2 ) and target ( T ):
[ I(S_1, S_2; T) = \underbrace{R(S_1, S_2; T)}_{\text{redundancy}} + \underbrace{U_1(S_1; T \setminus S_2)}_{\text{unique to } S_1} + \underbrace{U_2(S_2; T \setminus S_1)}_{\text{unique to } S_2} + \underbrace{Syn(S_1, S_2; T)}_{\text{synergy}} ]
- Redundancy: Both heads carry the same information about the output. Ablating one should have minimal effect because the other provides a backup.
- Unique (S1): Only this head carries certain information. Ablating it should produce a deficit that nothing else compensates.
- Synergy: Information emerges only from the joint. Neither head alone carries it; both must be present. Ablating either one destroys the synergistic contribution entirely.
High redundancy between heads within a circuit predicts robustness to single-component ablation (the hydra effect). High synergy predicts brittleness — the circuit breaks if any synergistic component is removed.
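The instrument’s own estimator is not reproduced here, but the decomposition above can be sketched for discrete variables using the original Williams & Beer ( I_{min} ) redundancy measure: redundancy is the expected minimum specific information either source carries about each target outcome, and the remaining atoms follow by subtraction. The function and variable names below are illustrative, not the toolkit’s API.

```python
import numpy as np

def _mi(joint):
    """Mutual information (bits) from a 2-D joint probability table."""
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float((joint[nz] * np.log2(joint[nz] / (px @ py)[nz])).sum())

def pid_atoms(s1, s2, t, k=2):
    """Williams-Beer PID for integer arrays s1, s2, t with values in {0..k-1}.

    Returns (redundancy, unique_1, unique_2, synergy) in bits, using the
    I_min redundancy measure; the other atoms follow by subtraction from
    the marginal and joint mutual informations.
    """
    n = len(t)
    p_s1t = np.zeros((k, k))
    p_s2t = np.zeros((k, k))
    p_12t = np.zeros((k * k, k))  # joint source (s1, s2) vs target
    for a, b, c in zip(s1, s2, t):
        p_s1t[a, c] += 1
        p_s2t[b, c] += 1
        p_12t[a * k + b, c] += 1
    p_s1t /= n; p_s2t /= n; p_12t /= n
    p_t = p_s1t.sum(axis=0)

    def specific_info(p_st, tv):
        # I_spec(S; T=tv) = sum_s p(s|tv) [log2 p(tv|s) - log2 p(tv)]
        p_s = p_st.sum(axis=1)
        acc = 0.0
        for s in range(p_st.shape[0]):
            if p_st[s, tv] > 0:
                acc += (p_st[s, tv] / p_t[tv]) * (
                    np.log2(p_st[s, tv] / p_s[s]) - np.log2(p_t[tv]))
        return acc

    # Redundancy: expected minimum specific information over the two sources.
    R = sum(p_t[tv] * min(specific_info(p_s1t, tv), specific_info(p_s2t, tv))
            for tv in range(k) if p_t[tv] > 0)
    U1 = _mi(p_s1t) - R
    U2 = _mi(p_s2t) - R
    Syn = _mi(p_12t) - R - U1 - U2
    return R, U1, U2, Syn
```

On the canonical test cases the atoms behave as the bullets above describe: for T = XOR(S1, S2) all information is synergy (1 bit), and for T = S1 = S2 all information is redundancy (1 bit).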
Instruments under A08
C8 — PID Analysis (08_pid.py)
Computes the four PID atoms for all pairs of circuit components with respect to the task output. Uses a discretized mutual information estimator (binning activations into quantiles) to compute:
[ R(S_1, S_2; T), \quad U_1, \quad U_2, \quad Syn(S_1, S_2; T) ]
Reports the redundancy/synergy ratio for each pair and identifies the dominant information structure of the circuit.
What it establishes: Whether pairs of components carry overlapping (redundant) or complementary (unique/synergistic) information about the output. Predicts ablation robustness and compensation patterns.
What it does not establish: The causal direction of information flow (PID is symmetric in the sources). Does not identify what information is shared, only how much.
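The pairwise report described above can be sketched as follows; `pairwise_pid_report` is a hypothetical name, and the PID estimator is passed in rather than assumed, since the script’s internal helpers are not shown here.

```python
from itertools import combinations

def pairwise_pid_report(activations, target, pid_atoms, top_k=None):
    """Pairwise PID sketch (illustrative, not the script's actual API).

    activations: dict {component_name: discretized 1-D int sequence}
    target:      discretized 1-D int sequence (e.g. binned logit difference)
    pid_atoms:   callable (s1, s2, t) -> (R, U1, U2, Syn)
    Returns [(pair, R, Syn, redundancy/synergy ratio)] sorted by ratio,
    so redundancy-dominated pairs come first.
    """
    names = list(activations)[:top_k]  # optional --top-k style restriction
    rows = []
    for a, b in combinations(names, 2):
        R, U1, U2, Syn = pid_atoms(activations[a], activations[b], target)
        ratio = R / Syn if Syn > 1e-12 else float("inf")
        rows.append(((a, b), R, Syn, ratio))
    return sorted(rows, key=lambda r: r[3], reverse=True)
```

The dominant information structure of the circuit can then be read off from whether the ratios cluster above or below 1.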
Usage:
uv run python 08_pid.py --tasks ioi --n-prompts 40
Reading the scores
| Pattern | What it means |
|---|---|
| High redundancy between heads A and B | Backup/compensation structure; ablating one has limited effect |
| High unique info for head A | A carries specialized information no other component has |
| High synergy for pair (A, B) | Joint computation; both needed, neither alone suffices |
| Redundancy dominant across all pairs | Robust, distributed circuit; hard to break with single ablations |
| Synergy dominant across pairs | Fragile circuit; single ablations may catastrophically degrade |
Practical considerations
PID estimation requires discretization of continuous activations. The instrument bins each component’s activation into quantiles (default: 8 bins) before computing mutual information. This introduces a bias-variance tradeoff: more bins capture finer structure but require more data for reliable estimation. For circuits with more than ~20 components, computing all pairwise PID atoms becomes expensive — the instrument supports a --top-k flag to restrict analysis to the highest-AP components from A01.
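Quantile binning itself is a one-liner in NumPy. A minimal sketch, assuming equal-mass bins are what the instrument means by “binning into quantiles” (the function name is illustrative):

```python
import numpy as np

def quantile_bin(x, n_bins=8):
    """Map a 1-D float array to integer bin labels 0..n_bins-1 using
    (approximately) equal-mass bins defined by the interior quantiles."""
    x = np.asarray(x, dtype=float)
    # Interior quantile edges: for 8 bins, the 1/8, 2/8, ..., 7/8 quantiles.
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1)[1:-1])
    return np.digitize(x, edges)
```

With ties-free data each bin receives ~1/n_bins of the samples; heavily tied activations (e.g. post-ReLU zeros) can collapse bins, which is one reason more bins need more data.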
The synergy/redundancy ratio is sensitive to the choice of target variable. Using the logit difference as ( T ) measures task-specific information sharing; using the full logit vector captures broader representational overlap.
Connection to other frameworks
A08 provides the information-theoretic explanation for patterns observed in A01 (activation patching) and A03 (CATE). When A01 shows that ablating a component has no effect, A08 can determine whether this is because of redundancy (another component carries the same information) or because the component is genuinely irrelevant. A05 (MDC/Glennan) predicts which components should interact based on weight-space organization; A08 verifies whether this interaction is redundant or synergistic. The hydra effect (McGrath et al. 2023) is a direct prediction of high-redundancy PID structure.