This framework asks: Do different tasks share representational subspaces, and can causal alignments transfer across tasks?

Cross-task overlap quantifies the degree to which circuits discovered for one task also encode variables relevant to another. By measuring interchange intervention accuracy (IIA) with rotations learned on task A and evaluating on task B, we test whether the model reuses representational structure — revealing shared computational primitives versus task-specific encodings.

This is the representational analog of circuit overlap: while circuit overlap counts shared heads, cross-task overlap measures whether those shared heads encode information in the same subspace. Two heads might appear in both circuits but use entirely different directions for each task.

| Source | Year | Key contribution |
| --- | --- | --- |
| Geiger et al., “Finding Alignments Between Interpretable Causal Variables and Distributed Neural Representations” | 2023 | DAS transfer methodology |
| Todd et al., “Function Vectors in Large Language Models” | 2023 | Shared function representations across tasks |
| Hernandez et al., “Linearity of Relation Decoding in Transformer Language Models” | 2023 | Linear relation directions transfer across contexts |
| Merullo et al., “Circuit Component Reuse Across Tasks” | 2023 | Empirical circuit overlap measurement |

Given DAS rotation ( R_A ) learned on task A with subspace ( S_A ), cross-task IIA evaluates:

[ \text{IIA}_{\text{transfer}}(A \to B) = \frac{1}{N_B} \sum_{i=1}^{N_B} \mathbb{1}\left[ f\left(\text{do}(h^{(i)}, h^{(j)}, R_A, S_A)\right) = y^{(j)}_{V_B} \right] ]

High transfer IIA means task B’s variable is encoded in the same subspace that task A uses. The overlap coefficient between two learned subspaces ( S_A, S_B \subseteq \mathbb{R}^d ) is:

[ \text{Overlap}(S_A, S_B) = \frac{\dim(S_A \cap S_B)}{\min(\dim S_A, \dim S_B)} ]

computed via singular values of ( P_A P_B ) where ( P_A, P_B ) are the projection matrices onto each subspace.
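Equivalently, given orthonormal bases for the two subspaces, the overlap can be read off the singular values of ( Q_A^\top Q_B ), which are the cosines of the principal angles. A minimal numpy sketch — the threshold and toy bases are illustrative, not part of the repo:

```python
import numpy as np

def subspace_overlap(Q_a: np.ndarray, Q_b: np.ndarray, tol: float = 0.99) -> float:
    """Overlap coefficient between subspaces with orthonormal column bases.

    Singular values of Q_a^T Q_b are cosines of the principal angles;
    counting values near 1 approximates dim(S_A ∩ S_B).
    """
    sigma = np.linalg.svd(Q_a.T @ Q_b, compute_uv=False)
    shared = int(np.sum(sigma > tol))
    return shared / min(Q_a.shape[1], Q_b.shape[1])

# Toy check: two 2-D subspaces of R^4 sharing exactly one direction.
Q_a = np.eye(4)[:, :2]   # span{e1, e2}
Q_b = np.eye(4)[:, 1:3]  # span{e2, e3}
print(subspace_overlap(Q_a, Q_b))  # → 0.5
```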

Cross-Task IIA Transfer (32_cross_task_iia_transfer.py)


Trains DAS on each task independently, then evaluates the learned rotations on all other tasks.

**What it establishes:** Whether causal encodings generalize — a shared subspace implies a shared computational primitive.
**What it does not establish:** Whether shared representations reflect genuine reuse vs. coincidental geometric overlap.

Usage:

```sh
uv run python 32_cross_task_iia_transfer.py --tasks ioi sva greater_than
```
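The transfer evaluation loop can be sketched as follows. This is a toy illustration, not the script’s actual API: `das_intervention`, the readout `f`, and the identity “rotation” are all placeholders.

```python
import numpy as np

def das_intervention(h_base, h_src, R, dims):
    """DAS-style interchange: rotate into the learned basis, swap the
    coordinates in `dims` (the learned subspace), rotate back."""
    z_base, z_src = R @ h_base, R @ h_src
    z_base[dims] = z_src[dims]
    return R.T @ z_base

def transfer_iia(f, pairs, labels, R_A, dims_A):
    """Fraction of (base, source) pairs where intervening with task A's
    rotation and subspace yields task B's counterfactual label."""
    hits = [f(das_intervention(h_i, h_j, R_A, dims_A)) == y_j
            for (h_i, h_j), y_j in zip(pairs, labels)]
    return float(np.mean(hits))

# Toy check: the variable lives in coordinate 0 of a 2-D residual stream.
R_A = np.eye(2)               # identity stands in for a learned rotation
f = lambda h: int(h[0] > 0)   # toy readout for task B
pairs = [(np.array([-1.0, 0.0]), np.array([1.0, 0.0]))]
print(transfer_iia(f, pairs, [1], R_A, [0]))  # → 1.0
```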

With `--subspace-angles`, computes the principal angles between task-specific DAS subspaces, giving a fine-grained overlap profile.

**What it establishes:** The dimensionality and geometry of shared representational structure between tasks.
**What it does not establish:** Whether the overlap is functionally meaningful (could be residual stream “background”).

Usage:

```sh
uv run python 32_cross_task_iia_transfer.py --tasks ioi sva --subspace-angles
```
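The principal angles themselves follow from plain linear algebra. A sketch with random placeholder bases — the real script would load the learned DAS rotations rather than QR-factored noise:

```python
import numpy as np

def principal_angles(Q_a: np.ndarray, Q_b: np.ndarray) -> np.ndarray:
    """Principal angles (radians, ascending) between subspaces given
    orthonormal column bases; their cosines are the singular values
    of Q_a^T Q_b (returned in descending order by SVD)."""
    sigma = np.linalg.svd(Q_a.T @ Q_b, compute_uv=False)
    return np.arccos(np.clip(sigma, -1.0, 1.0))

# Placeholder 3-D subspaces of an 8-D residual stream.
rng = np.random.default_rng(0)
Q_ioi, _ = np.linalg.qr(rng.normal(size=(8, 3)))
Q_sva, _ = np.linalg.qr(rng.normal(size=(8, 3)))
print(np.rad2deg(principal_angles(Q_ioi, Q_sva)))  # fine-grained overlap profile
```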
| Pattern | What it means |
| --- | --- |
| Transfer IIA > 0.8 (A → B) | Tasks share a causal encoding — strong representational reuse |
| Asymmetric transfer IIA (A → B high, B → A low) | Task A’s subspace contains B’s, but not vice versa |
| Overlap ≈ 0 | Tasks use entirely different directions — no representational sharing |
| High overlap but low transfer IIA | Subspaces intersect geometrically but encode different information |
| Cluster of high-overlap tasks | Shared computational primitive (e.g., “subject tracking”) |