  • Origin: Pylyshyn (1984) — content vs. functional architecture; extended by Marr's (1982) "representation and algorithm"
  • Question: What information does the system encode, in what format, and where?
  • Licensing evidence: Probing with selectivity baselines (decodable) or IIA/DAS with stated alignment map (causal)
  • Interpretive-validity risk: Conflating "decodable" with "encoded" — the first is a measurement capability, the second is a causal claim
  • Position in partial order: $C > A > R > I_{\text{fun}} > I_{\text{con}} > I_{\text{top}}$ — above implementational, below algorithmic

A verdict tagged [representational] specifies three things: (1) content — what variables or features are represented, (2) format — how that content is encoded (linear direction, subspace, distributed, nonlinear manifold), and (3) location — where in the model (layer, position, component) the encoding is accessible.
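The three commitments can be made concrete as a record type. The following Python sketch is purely illustrative (the class and field names are not from any existing codebase), but it shows what a fully specified [representational] verdict must carry.

```python
from dataclasses import dataclass
from enum import Enum

class Format(Enum):
    """Encoding formats a representational claim can commit to."""
    LINEAR_DIRECTION = "linear direction"
    SUBSPACE = "k-dimensional subspace"
    DISTRIBUTED = "distributed activation pattern"
    NONLINEAR_MANIFOLD = "nonlinear manifold"

@dataclass(frozen=True)
class RepresentationalClaim:
    content: str    # what is represented, e.g. "syntactic number"
    format: Format  # how it is encoded
    layer: int      # where: layer index
    position: str   # where: token position, e.g. "subject"

claim = RepresentationalClaim(
    content="syntactic number",
    format=Format.LINEAR_DIRECTION,
    layer=8,
    position="subject",
)
```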

A representational claim does not commit to how the representation is used — that would be algorithmic. It commits only to what is there. "The model linearly encodes syntactic number at layer 8, position $p$, in direction $\hat{v}$" is representational. "The model uses the number encoding to route agreement information to the verb position" is algorithmic.

Let $h_l^p \in \mathbb{R}^{d_{\text{model}}}$ be the residual-stream activation at layer $l$, position $p$. A linear representational claim asserts the existence of a direction $\hat{v}$ such that:

$$\langle h_l^p, \hat{v} \rangle \text{ tracks a causal variable } Z \text{ with } d' = \frac{\mu_+ - \mu_-}{\sigma_{\text{pooled}}} > \tau$$

for some threshold $\tau$, where $\mu_+$ and $\mu_-$ are the mean projections for the two values of $Z$.
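A minimal numpy sketch of this test, assuming activations have already been collected and grouped by the two values of $Z$; the direction `v_hat` and threshold `tau` are inputs, not things this snippet estimates.

```python
import numpy as np

def effect_size(h_pos, h_neg, v_hat):
    """Cohen's d' for projections onto a candidate direction v_hat.

    h_pos, h_neg: (n, d_model) activations for Z = + and Z = - inputs.
    v_hat: (d_model,) unit-norm direction.
    """
    proj_pos = h_pos @ v_hat   # <h, v_hat> for Z = +
    proj_neg = h_neg @ v_hat   # <h, v_hat> for Z = -
    n1, n2 = len(proj_pos), len(proj_neg)
    pooled_var = ((n1 - 1) * proj_pos.var(ddof=1)
                  + (n2 - 1) * proj_neg.var(ddof=1)) / (n1 + n2 - 2)
    return (proj_pos.mean() - proj_neg.mean()) / np.sqrt(pooled_var)

# the linear claim passes at this evidential level if effect_size(...) > tau
```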

Decodable vs. causal representational claims:

  • Decodable: a probe can extract $Z$ from $h_l^p$ with accuracy above a selectivity baseline (Hewitt & Liang 2019). This establishes that the information is accessible, not that it is used.
  • Causal: intervening on $h_l^p$ along $\hat{v}$ changes the model's behavior in a manner consistent with changing $Z$. This is the IIA/DAS standard (Geiger et al. 2021) — the representation is not merely decodable but load-bearing.

The causal version is strictly stronger. A decodable representation might be an epiphenomenal byproduct; a causal representation is part of the computation.
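A sketch of the decodability side of this protocol using scikit-learn. The activations and labels are random placeholders, and the permuted-label control is a crude stand-in for Hewitt & Liang's control task, which assigns random but consistent labels per word type.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def probe_accuracy(X, y):
    """Train a linear probe and return held-out accuracy."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
    return LogisticRegression(max_iter=1000).fit(X_tr, y_tr).score(X_te, y_te)

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 768))    # placeholder: h_l^p for 2000 inputs
y = rng.integers(0, 2, size=2000)   # placeholder: values of Z

y_control = rng.permutation(y)      # control task of matched difficulty

selectivity = probe_accuracy(X, y) - probe_accuracy(X, y_control)
print(f"selectivity = {selectivity:+.1%}")
```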

What licenses a [representational] tag

  1. Geometric form stated explicitly — linear direction, $k$-dimensional subspace, distributed activation pattern, or nonlinear manifold. The claim must commit to the encoding format.

  2. Decodability demonstrated with baselines:

    • For probing: selectivity (Hewitt & Liang 2019) — accuracy on the target task minus accuracy on a control task of equal difficulty
    • For IIA: random-vector baseline and untrained-model baseline
  3. If the claim is causal (stronger): Interchange intervention (IIA/DAS) demonstrating that the representation is load-bearing. Swapping the direction $\hat{v}$ between two inputs that differ only in $Z$ should change the model's output accordingly (see the interchange sketch after this list).

  4. Alignment map capacity stated and varied (Wu et al. 2024): If IIA is high only with unconstrained nonlinear maps, the finding is about map flexibility, not linear geometry. The claim should specify what alignment architecture was used and show how IIA degrades as capacity is reduced.
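A minimal sketch of the 1-D linear interchange referenced in item 3: replace the base activation's coordinate along $\hat{v}$ with the source's, leaving the orthogonal complement untouched. Running the patched activation through the rest of the model and scoring the output is assumed to happen elsewhere.

```python
import numpy as np

def interchange_1d(h_base, h_source, v_hat):
    """Swap the component along v_hat from h_source into h_base.

    h_base, h_source: (d_model,) activations from two inputs that
    differ only in the value of Z. v_hat: unit-norm direction.
    """
    delta = (h_source @ v_hat) - (h_base @ v_hat)
    return h_base + delta * v_hat

def random_unit_direction(d_model, rng):
    """Baseline control: the same swap along a random direction should
    leave behavior unchanged if the effect is specific to v_hat."""
    v = rng.normal(size=d_model)
    return v / np.linalg.norm(v)
```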

What does NOT license a [representational] tag

  • Probe accuracy without a control task. A powerful probe can decode almost anything from high-dimensional activations. Without selectivity, high probe accuracy is not evidence of encoding.
  • IIA without stating the alignment map architecture. A 3-layer MLP alignment map can brute-force almost any interchange intervention. The claim must specify the map’s capacity.
  • “The model represents X” based on behavioral evidence alone. If the model gets the right answer on sentences requiring knowledge of X, that is behavioral (computational/algorithmic), not representational. A representational claim must point to a specific geometric structure in activation space.
  • Reporting PCA directions without testing causal relevance. PCA finds variance, not computation. High-variance directions may be irrelevant to the task.
Worked example: Number encoding in GPT-2 (decodable → causal)

Decodable claim (weaker): “A linear probe trained on GPT-2 Small residual stream at layer 8, subject position, achieves 94% accuracy at classifying singular vs. plural number. Selectivity = 94% - 51% (control task: random binary label) = 43 percentage points.” This establishes decodability with a proper baseline.

Causal claim (stronger): “IIA with a 1-dimensional linear alignment map at layer 8 achieves interchange accuracy 0.72. Swapping the number direction between ‘The cat sits’ and ‘The cats sit’ causes the model to change verb agreement predictions accordingly. Baseline (random direction swap): 0.03.” This establishes that the linear encoding is causally load-bearing.
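A hedged sketch of this intervention with TransformerLens (hook names follow that library's conventions; `v_hat` is assumed to be a previously fitted unit-norm number direction, which this snippet fakes with a random placeholder). It patches the residual stream at layer 8 at the subject position of the singular prompt with the plural prompt's component along `v_hat`, then reads off agreement logits.

```python
import torch
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")
hook_name = "blocks.8.hook_resid_post"   # residual stream after layer 8
subj_pos = -1                            # subject is the last token of each prompt

base = model.to_tokens("The cat")        # singular subject
source = model.to_tokens("The cats")     # plural subject

# cache the plural run to read off its activation at the subject position
_, source_cache = model.run_with_cache(source)
h_source = source_cache[hook_name][0, subj_pos]

v_hat = torch.randn(model.cfg.d_model)   # placeholder; use a fitted direction
v_hat = v_hat / v_hat.norm()

def swap_number_component(resid, hook):
    h_base = resid[0, subj_pos]
    delta = (h_source @ v_hat) - (h_base @ v_hat)
    resid[0, subj_pos] = h_base + delta * v_hat
    return resid

logits = model.run_with_hooks(base, fwd_hooks=[(hook_name, swap_number_component)])

# compare next-token logits for the two verb forms (assumes each is one token)
sit, sits = model.to_single_token(" sit"), model.to_single_token(" sits")
print(logits[0, -1, sit].item(), logits[0, -1, sits].item())
```

Interchange accuracy is then the fraction of such minimal pairs whose agreement preference flips under the swap, compared against the random-direction baseline.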

Upgrade to algorithmic: “The number representation at layer 8 is read by attention heads at layers 9-10 via QK composition, which route agreement information to the verb position.” This crosses from representational (what is encoded) to algorithmic (how it is used).

Anti-pattern: representation without geometry

Bad claim: “The model represents the indirect object.”

Why this fails: No geometric form is specified (linear? subspace? distributed?). No location (which layer? which position?). No format commitment. This is a computational-level statement ("the model tracks this variable") dressed up as representational. A proper representational claim would be: "The IO name is linearly decodable from the residual stream at layer 7, IO position, in a 3-dimensional subspace with $d' = 2.4$."

| Direction | What's required |
| --- | --- |
| $I_{\text{fun}} \to R$ (upgrade from functional) | Demonstrate that the component's function is representational — encoding a specific variable — rather than just performing a mathematical operation. A head that suppresses a direction is functional; showing that direction encodes a specific causal variable makes it representational. |
| $R \to A$ (upgrade to algorithmic) | Show ordering, composition, and procedure: the representations flow through specific steps that jointly produce the behavior. How is the representation read and used? |
| $R \to I_{\text{fun}}$ (downgrade) | If the causal test fails (IIA is at baseline), the claim downgrades to "this component produces outputs with this geometric structure" — a functional/statistical characterization, not a representational one. |

Instruments that provide representational-level evidence

  • E01 (PCA dimensionality) — geometry of activation subspaces per layer
  • E03 (RSA) — whether neural similarity structure matches task-variable structure (see the sketch below)
  • E05 (Intrinsic dimension) — manifold complexity of representations
  • E10 (Subspace alignment) — whether circuit heads share representational geometry
  • A03 (Interchange intervention) — causal relevance of specific directions
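As a concrete instance of E03, a hedged RSA sketch with scipy and random placeholder data: build a neural dissimilarity matrix from activations, a hypothesis matrix from the task variable, and correlate the two.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
H = rng.normal(size=(100, 768))   # placeholder: one activation row per stimulus
z = rng.integers(0, 2, size=100)  # placeholder: task variable per stimulus

# neural RDM: pairwise dissimilarity between activation vectors
neural_rdm = pdist(H, metric="correlation")

# hypothesis RDM: 0 if two stimuli share a value of Z, else 1
model_rdm = pdist(z[:, None].astype(float), metric="hamming")

rho, p = spearmanr(neural_rdm, model_rdm)
print(f"RSA: rho={rho:.3f}, p={p:.3g}")
```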