The Geometry Lens
This lens asks one question: does the circuit’s activation manifold have the geometric structure that the claimed computation requires?
This lens draws on four mathematical traditions. Information geometry (Amari 2016) equips the space of activation distributions with the Fisher-Rao metric, under which the distance between nearby distributions is the second-order approximation of their KL divergence rather than a Euclidean norm. Its curvature encodes the model’s sensitivity to changes in different directions: a circuit that claims to detect a specific feature should have high Fisher curvature along the direction that distinguishes instances of that feature, and low curvature along irrelevant directions. Sheaf theory (Bredon 1997; Curry 2014) formalizes local consistency: each circuit component carries a local representation, and the sheaf consistency condition asks whether representations agree when components share information via the residual stream. Optimal transport (Villani 2003) provides a different metric on distributions, the Wasserstein distance, which measures the cost of rearranging one distribution into another and captures structural similarity that KL divergence can miss. Representation theory asks whether symmetries of the input (permutation of names in IOI, number agreement in SVA) are reflected as symmetries in the circuit’s weight and activation structure.
Angular geometry captures a fifth property: whether task-relevant subspaces are cleanly separated or tangled in activation space. Two subspaces can be close in Euclidean distance but nearly orthogonal (well-separated), or far apart but nearly parallel (poorly separated). Because the model’s linear operations — attention projections, MLP transformations — depend on direction rather than magnitude, angular separation between subspaces is the quantity that determines whether the model can distinguish the features those subspaces encode.
There is a conceptual point worth naming. The choice of metric — Euclidean, Fisher-Rao, Wasserstein — is not neutral. Each defines a different notion of “close” and “far,” and therefore a different notion of what the circuit treats as similar or different. Euclidean distance in activation space is the default in most MI work (cosine similarity, L2 norms), but it is not the metric the model uses. The model’s output distribution changes according to the Fisher-Rao geometry, not the Euclidean geometry. Two activations that are far apart in L2 can produce nearly identical output distributions (small Fisher-Rao distance), and two that are close in L2 can produce dramatically different outputs (large Fisher-Rao distance). Using the wrong metric leads to wrong conclusions about what the circuit distinguishes. Together — curvature, parallel transport, optimal transport, symmetry, and angular structure — these tools ask: does the circuit’s geometry match its claimed function?
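A toy illustration of the gap between the two notions of distance, treating the activation as a logit vector read out by a softmax (an assumption made here for brevity; the vectors and helper names are hypothetical). Adding the same constant to every logit moves the activation far in L2 but leaves the output distribution unchanged, while a small perturbation that swaps the top two logits changes it substantially.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())   # subtract the max for numerical stability
    return e / e.sum()

def kl(p, q):
    """KL(p || q) for strictly positive probability vectors."""
    return float(np.sum(p * np.log(p / q)))

base = np.array([2.0, 1.0, 0.0])
shifted = base + 10.0                        # large L2 move, identical softmax output
nudged = base + np.array([-1.0, 1.0, 0.0])   # small L2 move, top two logits swap

l2_shifted = float(np.linalg.norm(shifted - base))
l2_nudged = float(np.linalg.norm(nudged - base))
kl_shifted = kl(softmax(base), softmax(shifted))
kl_nudged = kl(softmax(base), softmax(nudged))

print(f"L2 {l2_shifted:.2f} -> KL {kl_shifted:.3f}")
print(f"L2 {l2_nudged:.2f} -> KL {kl_nudged:.3f}")
```

In this sketch the larger L2 displacement produces no change in the output distribution, and the smaller one produces a clearly nonzero KL divergence.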
Key Distinctions
Intrinsic vs extrinsic geometry
Gauss’s Theorema Egregium (1827) established that some geometric properties are intrinsic (measurable by an observer living on the manifold, without reference to the ambient space) while others are extrinsic (dependent on how the manifold is embedded). Gaussian curvature is intrinsic: a being living on a sphere can detect its curvature without knowing about three-dimensional space. The angle between two subspaces in the ambient residual stream is extrinsic: it depends on the embedding.
In MI: the Fisher information metric on a circuit’s activation manifold is intrinsic, since it measures how the model itself distinguishes nearby inputs, from the model’s own perspective. Angular separation between task-relevant subspaces is extrinsic, since it measures how subspaces sit relative to each other in the ambient $d$-dimensional residual stream. Both matter for circuit evaluation. Curvature tells you what the circuit treats as different. Angular separation tells you whether the circuit’s representations are organized to permit downstream linear readout. A circuit with the right intrinsic geometry (high curvature along task-relevant directions) but poor extrinsic geometry (tangled subspaces) may compute the right thing internally but fail to communicate it downstream.
Curvature as sensitivity
The Fisher information matrix at a point $\theta$ in parameter (or activation) space is:

$$F(\theta) = \mathbb{E}_{y \sim p(\cdot \mid \theta)}\left[\nabla_\theta \log p(y \mid \theta)\, \nabla_\theta \log p(y \mid \theta)^\top\right]$$

This is the Hessian of the KL divergence between the distribution at $\theta$ and the distribution at a perturbed point, evaluated at zero perturbation, and it defines a Riemannian metric on the space of distributions. High curvature along a direction means the model’s output distribution changes rapidly when activations are perturbed in that direction: the model is sensitive to that distinction. Low curvature means the model is insensitive: perturbations in that direction do not change the output.
In MI: if the Fisher metric on a circuit’s activation manifold has high curvature along the direction that separates singular from plural subject representations, the circuit distinguishes grammatical number. If the Fisher metric is flat along that direction, the circuit does not care about the distinction — regardless of what a linear probe might find. A probe can extract information that the model does not functionally use. The Fisher metric measures what the model treats as different in terms of its output behavior, not what can be decoded from its activations by an external classifier. This is a stronger test than probing and a more geometric test than ablation.
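The link between the Fisher metric and output sensitivity can be written out as a worked sketch. Assume, for illustration, that an activation $a$ reaches the output through logits $z(a)$ with softmax distribution $p$, and write $J = \partial z / \partial a$ (these symbols are introduced here and are not the source's notation). For a small perturbation $\delta$:

$$
\mathrm{KL}\big(p_{a} \,\|\, p_{a+\delta}\big) \approx \tfrac{1}{2}\, \delta^\top F(a)\, \delta,
\qquad
F(a) = J^\top \big(\operatorname{diag}(p) - p\,p^\top\big) J .
$$

A direction $\delta$ with large $\delta^\top F(a)\,\delta$ is one the output distribution is sensitive to. A direction in the null space of $F(a)$ leaves the output unchanged to second order, even if a probe can decode information along it.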
Local consistency vs global coherence (sheaves)
A sheaf on a topological space assigns data to each open set and requires that data on overlapping sets agree on the overlap. Bredon (1997) developed the algebraic formalism; Curry (2014) adapted it for network data analysis, where nodes carry local data and edges carry consistency conditions.
In MI: each circuit component carries a local representation of the information flowing through it. The residual stream is the overlap: the shared medium through which components communicate. The sheaf consistency score at an edge $(i, j)$ measures whether the representation at component $i$, when projected through the residual stream, agrees with the representation at component $j$. Formally, for components $i$ and $j$ connected via the residual stream, with restriction maps $R_i$ and $R_j$ from their local representations $x_i$ and $x_j$ to the shared subspace, the score measures the agreement between $R_i x_i$ and $R_j x_j$.
A circuit with high sheaf consistency has coherent information flow: the meaning of a representation is preserved as it moves through the circuit. A circuit with low consistency has internal contradictions — what “plural” means at one component differs from what it means at another. High sheaf consistency is necessary for the claim that a circuit implements a unified computation rather than a sequence of unrelated transformations that happen to compose into the right answer.
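A minimal sketch of a per-edge score. Cosine similarity between the two restricted representations is an assumption of this sketch rather than the score's definition, and the names `sheaf_consistency`, `R_i`, `R_j` and the random restriction maps are hypothetical.

```python
import numpy as np

def sheaf_consistency(x_i, x_j, R_i, R_j):
    """Cosine similarity of two local representations after restriction.

    ASSUMPTION: cosine similarity stands in for the agreement measure.
    """
    s_i = R_i @ x_i                      # component i seen in the shared subspace
    s_j = R_j @ x_j                      # component j seen in the shared subspace
    denom = np.linalg.norm(s_i) * np.linalg.norm(s_j)
    return float(s_i @ s_j / denom) if denom > 0 else 0.0

rng = np.random.default_rng(0)
shared = rng.normal(size=4)              # content both components should agree on
R_i = rng.normal(size=(4, 8))            # hypothetical restriction maps
R_j = rng.normal(size=(4, 6))
x_i = np.linalg.pinv(R_i) @ shared       # local representations that restrict to `shared`
x_j = np.linalg.pinv(R_j) @ shared

agree = sheaf_consistency(x_i, x_j, R_i, R_j)
disagree = sheaf_consistency(x_i, -x_j, R_i, R_j)
print(f"agree={agree:.3f} disagree={disagree:.3f}")
```

The two local representations live in spaces of different dimension (8 and 6 here), which is why agreement is only checked after restriction to the shared subspace.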
KL divergence vs Wasserstein distance
KL divergence and Wasserstein distance are both ways to measure how different two distributions are, but they capture different properties. KL divergence measures the expected extra surprise incurred by using one distribution to code samples from another: it is asymmetric, can be infinite when supports don’t overlap, and is sensitive to the tails. Wasserstein distance (earth mover’s distance) measures the minimum cost of transporting mass from one distribution to another: it is a true metric, finite whenever the distributions have finite moments of the corresponding order, and respects the geometry of the underlying space.
In MI: two circuits can produce activation distributions with small KL divergence (the log-likelihood ratio is small on average) but large Wasserstein distance (the distributions have the same total probability mass but arranged in different locations in activation space). KL divergence asks “does the circuit produce similar output likelihoods?” Wasserstein distance asks “does the circuit produce similar activation structure?” For circuit stability — does the circuit’s geometric structure hold up across prompt samples, tasks, or model checkpoints? — Wasserstein distance is the more informative metric, because it captures structural rearrangement that KL divergence can miss. S06 (Wasserstein Stability) uses this to test whether a circuit’s activation geometry is stable across conditions.
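A small sketch of the difference, using histograms over a single activation coordinate (a simplification chosen here; the helper names and bin layout are hypothetical). Moving a bump of probability mass one bin away or nine bins away gives the same KL divergence, because KL only compares probabilities bin by bin, while the Wasserstein-1 distance grows with how far the mass moved.

```python
import numpy as np

def kl(p, q):
    """KL(p || q) for strictly positive histograms."""
    return float(np.sum(p * np.log(p / q)))

def wasserstein1_grid(p, q, spacing=1.0):
    """W1 between two histograms on the same evenly spaced 1-D grid."""
    return float(np.sum(np.abs(np.cumsum(p) - np.cumsum(q))) * spacing)

def bump(center, n_bins=10, eps=1e-3):
    """Histogram with almost all mass in one bin and eps in every other bin."""
    h = np.full(n_bins, eps)
    h[center] = 1.0 - eps * (n_bins - 1)
    return h

p = bump(0)
q_near, q_far = bump(1), bump(9)

print("KL near/far:", kl(p, q_near), kl(p, q_far))
print("W1 near/far:", wasserstein1_grid(p, q_near), wasserstein1_grid(p, q_far))
```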
Symmetry as structure (representation theory)
A computation that is equivariant with respect to a symmetry group has the group’s structure baked into its weights. If swapping “Alice” and “Bob” in an IOI prompt swaps the model’s predictions symmetrically, the IOI circuit’s weights should reflect the permutation symmetry of the name positions. Representation theory (Serre 1977) provides the language: a group $G$ acts on the input space, and the circuit’s weight matrices $W$ should intertwine the input representation $\rho_{\text{in}}$ with the output representation $\rho_{\text{out}}$, that is, $W \rho_{\text{in}}(g) = \rho_{\text{out}}(g)\, W$ for all $g \in G$.
In MI: most circuit claims implicitly assume symmetries. The IOI circuit should treat name positions symmetrically. The SVA circuit should be equivariant under subject-verb position shifts. The induction circuit should generalize across token identities. Testing whether the circuit’s weight matrices actually commute with these symmetry transformations is a structural test that requires no forward passes — it is a pure weight-space check. A circuit whose weights break the expected symmetry either does not implement the claimed computation cleanly or implements it via a mechanism that does not respect the symmetry (which is a weaker, less general implementation). Metric geometry extends this by comparing the metric spaces of different circuits: the Gromov-Hausdorff distance between two circuits’ activation geometries measures how similar their structures are, without requiring them to live in the same ambient space — enabling cross-model comparison.
Angles vs distances
Two directions in the residual stream can be close in Euclidean distance but far in angular distance (nearly orthogonal short vectors) or far in Euclidean distance but close in angular distance (parallel vectors of different magnitude). The distinction matters because the model’s linear operations ($W_Q$, $W_K$, $W_V$, $W_O$, $W_{\text{in}}$, $W_{\text{out}}$) act on direction, not magnitude. A projection matrix $W$ applied to two vectors $u$ and $v$ preserves their angular relationship (up to the distortion introduced by $W$) but not their distance relationship.
In MI: angular separation between subspaces is more meaningful than Euclidean distance for representational structure. If the subspace encoding “subject is singular” and the subspace encoding “subject is plural” are nearly orthogonal, a downstream attention head can separate them with a simple linear projection. If they are nearly parallel, separation requires a nonlinear or high-dimensional operation. The angular structure of a circuit’s activation space constrains what computations can be performed downstream by linear projections — which is what attention heads and MLPs actually do. A circuit whose task-relevant subspaces are angularly tangled requires the model to use more computational resources to extract the distinctions, even if the information is present in principle.
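The contrast can be made concrete with two pairs of toy vectors (all values and the readout direction `w` are hypothetical). One pair is close in Euclidean distance but orthogonal, the other is far apart but parallel; only the first pair can be told apart by the sign of a linear readout.

```python
import numpy as np

def angle_deg(u, v):
    cos = u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

short_a, short_b = np.array([0.1, 0.0]), np.array([0.0, 0.1])  # close, orthogonal
long_a, long_b = np.array([1.0, 0.0]), np.array([10.0, 0.0])   # far, parallel

dist_short = float(np.linalg.norm(short_a - short_b))
dist_long = float(np.linalg.norm(long_a - long_b))

w = np.array([1.0, -1.0])   # hypothetical linear readout direction
readout_short = (float(w @ short_a), float(w @ short_b))   # opposite signs
readout_long = (float(w @ long_a), float(w @ long_b))      # same sign, differ only in scale

print(dist_short, angle_deg(short_a, short_b), readout_short)
print(dist_long, angle_deg(long_a, long_b), readout_long)
```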
Analytical Constructs
The Fisher information ellipsoid
At each point in activation space, the Fisher information matrix $F$ defines an ellipsoid whose principal axes are the eigenvectors of $F$ and whose axis lengths are the square roots of the eigenvalues. The shape of this ellipsoid is the geometric fingerprint of the circuit’s computation at that point.
The ellipsoid reveals structure that no single metric can:
- Anisotropy ratio: the ratio of the largest to smallest eigenvalue of $F$. A highly anisotropic ellipsoid (ratio $\gg 1$) means the circuit is much more sensitive to some directions than others. This is expected for a circuit that performs a specific computation: it should be sensitive to task-relevant distinctions and insensitive to irrelevant variation.
- Principal axis alignment: do the principal curvature directions (eigenvectors of $F$) correspond to task-relevant features? If the direction of maximum sensitivity aligns with the direction that separates correct from incorrect completions, the circuit’s geometry is consistent with its claimed function.
- Eccentricity profile: how the anisotropy changes across the prompt distribution. A circuit with stable anisotropy (similar ellipsoid shape across prompts) has consistent geometric structure. One whose ellipsoid shape varies wildly is geometrically unstable: its sensitivity profile depends on the specific input.
To construct the ellipsoid: at each point (prompt, layer, circuit component set), compute the Fisher information matrix from the Jacobian of the circuit’s output logits with respect to the activations at that point, pulling the Fisher matrix of the output distribution back through that Jacobian. Extract eigenvalues and eigenvectors. Report the anisotropy ratio, the alignment of principal axes with known task features, and the stability of these quantities across the prompt distribution.
A circuit that detects a specific feature should have an elongated ellipsoid aligned with that feature’s direction. A circuit with a spherical ellipsoid (isotropic Fisher metric) treats all directions equally — it has no geometric preference for the claimed feature, which contradicts any claim of feature-specific computation.
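The construction can be sketched on a toy example, assuming a softmax output so that the Fisher matrix in activation space is $J^\top(\operatorname{diag}(p) - pp^\top)J$. The Jacobian `J`, the feature direction, and the function names are hypothetical; in practice the Jacobian would come from automatic differentiation through the circuit.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def fisher_from_jacobian(J, logits):
    """Pull the softmax Fisher metric back to activation space: J^T (diag(p) - p p^T) J."""
    p = softmax(logits)
    return J.T @ (np.diag(p) - np.outer(p, p)) @ J

def ellipsoid_summary(F, feature_dir, tol=1e-10):
    """Eigen-decompose F; report anisotropy and alignment of the top axis with feature_dir."""
    evals, evecs = np.linalg.eigh(F)          # eigenvalues in ascending order
    top = evecs[:, -1]
    positive = evals[evals > tol]             # ASSUMPTION: ignore numerically null directions
    anisotropy = float(positive[-1] / positive[0])
    alignment = float(abs(top @ feature_dir) / np.linalg.norm(feature_dir))
    return evals, anisotropy, alignment

J = np.array([[4.0, 0.0], [-4.0, 0.0], [0.0, 0.5]])   # hypothetical d(logits)/d(activations)
activation = np.zeros(2)
feature_dir = np.array([1.0, 0.0])                    # assumed task-relevant direction

F = fisher_from_jacobian(J, J @ activation)
evals, anisotropy, alignment = ellipsoid_summary(F, feature_dir)
axis_lengths = np.sqrt(np.clip(evals, 0.0, None))     # square roots of the eigenvalues
print(f"anisotropy={anisotropy:.1f} alignment={alignment:.3f}")
```

Dropping near-zero eigenvalues before taking the ratio is a design choice of this sketch: with fewer output classes than activation dimensions the Fisher matrix is rank-deficient and the raw max/min ratio would be infinite.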
Sources
| Source | Year | Field | Principle |
|---|---|---|---|
| Rao, “Information and the accuracy attainable in the estimation of statistical parameters” | 1945 | Statistics | Fisher-Rao metric — the Fisher information matrix as a Riemannian metric on the space of probability distributions |
| do Carmo, Riemannian Geometry | 1992 | Mathematics | Riemannian curvature — intrinsic geometry of manifolds; parallel transport and geodesics |
| Amari, Information Geometry and Its Applications | 2016 | Information Geometry | Amari’s dualistic structure — the geometry of statistical manifolds, including the Fisher metric, alpha-connections, and curvature as a measure of model distinguishability |
| Bredon, Sheaf Theory | 1997 | Mathematics | Sheaf cohomology — algebraic formalism for local-to-global consistency; data on overlapping regions must agree on intersections |
| Curry, “Sheaves, cosheaves and applications” | 2014 | Applied Mathematics | Sheaves on networks — adaptation of sheaf theory to data on graphs and networks; consistency conditions as a measure of coherent information flow |
| Lee, Introduction to Smooth Manifolds | 2012 | Mathematics | Smooth manifold theory — tangent spaces, differential forms, and the coordinate-free framework for analyzing high-dimensional structure |
| Villani, Topics in Optimal Transport | 2003 | Mathematics | Optimal transport and Wasserstein distance — the cost of rearranging one distribution into another; a true metric that respects the geometry of the underlying space |
| Serre, Linear Representations of Finite Groups | 1977 | Mathematics | Representation theory — how symmetry groups act on vector spaces; equivariance conditions on linear maps |
| Gromov, Metric Structures for Riemannian and Non-Riemannian Spaces | 1999 | Mathematics | Gromov-Hausdorff distance — comparison of metric spaces without requiring a common embedding; enables cross-model geometric comparison |
Fisher-Rao geometry (Amari 2016): The Fisher information metric is the unique Riemannian metric (up to scaling) that is invariant under sufficient statistics. It measures the intrinsic distinguishability of nearby distributions. In MI: the Fisher metric on a circuit’s activation manifold measures what the circuit treats as different — not what can be decoded from it, but what changes its output behavior.
This lens contributes primarily to construct validity. The question is not whether the circuit is causally necessary (internal validity) or whether the effect generalizes (external validity), but whether the circuit has the geometric structure that the claimed computation requires. A circuit claimed to detect grammatical number should have high Fisher curvature along the singular-plural direction. A circuit claimed to implement coherent information flow should have high sheaf consistency. A circuit claimed to distinguish between two semantic categories should have angularly separated subspaces for those categories. A circuit claimed to be equivariant under name permutation should have weights that commute with the permutation representation. These are structural predictions derived from the computational claim, testable without intervention.
The lens also contributes three criteria to measurement validity: M10 (parallel transport fidelity) characterizes whether a representation measurement at one circuit component can be meaningfully compared to a measurement at another; M11 (Wasserstein stability) characterizes whether the circuit’s geometric structure is stable across conditions; and M12 (metric space comparison) enables cross-model geometric comparison without a shared ambient space.
Criteria
| Code | Criterion | What it asks | Validity type |
|---|---|---|---|
| C10 | Curvature coherence | Does the Fisher metric have structure matching the task — high curvature along task-relevant directions? | Construct |
| M10 | Parallel transport fidelity | Does a representation maintain its meaning as it moves through circuit components? | Measurement |
| C11 | Angular separability | Are task-relevant subspaces angularly separated in activation space? | Construct |
| M11 | Wasserstein stability | Is the circuit’s activation geometry stable across prompt samples and conditions? | Measurement |
| M12 | Metric space comparison | Can the circuit’s geometric structure be meaningfully compared across models? | Measurement |
| C13 | Symmetry equivariance | Do the circuit’s weights respect the symmetries implied by the computational claim? | Construct |
C10 — Curvature coherence
The Fisher information metric on the circuit’s activation manifold should have structure matching the task: high curvature directions should align with task-relevant distinctions, and the curvature should be anisotropic rather than uniform.
A circuit that claims to detect grammatical number should be highly sensitive to perturbations along the singular-plural direction and insensitive to perturbations along orthogonal directions. The Fisher metric formalizes this: its eigenvectors define the directions of maximum and minimum sensitivity, and its eigenvalues quantify the sensitivity in each direction. Curvature coherence asks whether this sensitivity profile matches the task.
What it establishes. The circuit’s output is differentially sensitive to input perturbations in task-relevant directions. The geometry of the circuit’s activation manifold — its intrinsic notion of “which inputs are different” — is aligned with the task structure. This is a structural prediction: if the circuit computes what it claims to compute, it should have this geometric property. Meeting it provides convergent evidence that the circuit’s internal geometry is organized around the claimed computation.
What it does not establish. That the circuit is causally necessary for the task (that is I1), that the geometric structure generalizes to other models (that is E6), or that the curvature is the mechanism by which the circuit performs the computation. Curvature coherence is correlational evidence at the geometric level — it shows that the circuit’s sensitivity profile is consistent with the claim, not that the sensitivity causes the behavior.
Threshold. The top-3 principal curvature directions (eigenvectors of ) should predict task-relevant features with in a linear regression. The curvature anisotropy ratio (ratio of the largest to smallest principal curvature, i.e., the condition number of ) should be , indicating that the circuit treats some directions as substantially more important than others.
Minimum reporting.
- Eigenvalue spectrum of the Fisher information matrix, with the top-3 eigenvectors identified
- $R^2$ of linear regression from top-3 eigenvector projections to task-relevant labels, with bootstrap 95% CI
- Anisotropy ratio (max/min eigenvalue)
- Comparison to a size-matched random component set — if random components show comparable anisotropy, the curvature structure is architectural rather than circuit-specific
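The regression and bootstrap items above can be sketched as follows. The activations, labels, and Fisher matrix are synthetic stand-ins, and the function names are hypothetical; the percentile bootstrap is one common choice of interval, not necessarily the one intended here.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an ordinary least-squares fit of y on X, with an intercept."""
    A = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(1.0 - np.sum(resid**2) / np.sum((y - y.mean()) ** 2))

def bootstrap_ci(X, y, n_boot=200, seed=0):
    """Percentile 95% CI for R^2, resampling prompts with replacement."""
    rng = np.random.default_rng(seed)
    n = len(y)
    stats = [r_squared(X[idx], y[idx]) for idx in rng.integers(0, n, size=(n_boot, n))]
    lo, hi = np.percentile(stats, [2.5, 97.5])
    return float(lo), float(hi)

rng = np.random.default_rng(1)
acts = rng.normal(size=(300, 6))                   # synthetic activations, one row per prompt
F = np.diag([10.0, 1.0, 0.5, 0.1, 0.1, 0.1])       # hypothetical Fisher matrix
labels = acts[:, 0] + 0.1 * rng.normal(size=300)   # synthetic task label along axis 0

evals, evecs = np.linalg.eigh(F)
top3 = evecs[:, -3:]                               # eigenvectors of the 3 largest eigenvalues
r2 = r_squared(acts @ top3, labels)
ci = bootstrap_ci(acts @ top3, labels)
print(f"R^2={r2:.3f}, 95% CI=({ci[0]:.3f}, {ci[1]:.3f})")
```

The same two functions can be run on a size-matched random component set to produce the comparison in the last bullet.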
M10 — Parallel transport fidelity
A representation should maintain its meaning as it moves through circuit components. If a direction in activation space encodes “plural” at one circuit component, and that direction is transported through the residual stream to a downstream component, it should still encode “plural” at the destination. The sheaf consistency score formalizes this: it measures whether local representations at connected components agree when projected onto their shared subspace.
This criterion belongs to measurement validity rather than construct validity because it characterizes whether representational measurements at different circuit locations are commensurable. If parallel transport fidelity is low, a probe finding “this direction encodes X” at layer 5 cannot be meaningfully compared to a probe finding at layer 9 — the representational coordinate systems are incommensurable, and any comparison is a measurement artifact.
What it establishes. The circuit’s components share a consistent representational language. Information encoded at one location arrives at another location with its meaning intact. This is a prerequisite for any claim that the circuit implements a unified computation: if each component re-encodes information in an incompatible format, the “circuit” is a sequence of independent transformations rather than a coherent computational mechanism.
What it does not establish. That the transported representation is causally used by downstream components (that requires intervention evidence), or that the specific meaning of the representation has been correctly identified (that requires interpretive validity). A circuit can have perfect parallel transport fidelity while the analyst misidentifies what is being transported.
Threshold. Mean sheaf consistency across all edges in the circuit graph. No individual edge drops below . These thresholds reflect the requirement that representational coherence should be the norm across the circuit, not an occasional property of a few edges.
Minimum reporting.
- Mean sheaf consistency across circuit edges, with standard deviation
- Distribution of per-edge consistency scores, with the minimum edge identified
- Comparison to a baseline where the circuit components are connected in random order (shuffled circuit graph) — if the shuffled baseline shows comparable consistency, the coherence is a property of the residual stream in general, not of this circuit’s information flow
- Identification of any edges with consistency below , with interpretation of why those edges break representational coherence
C11 — Angular separability
Task-relevant subspaces should be angularly separated in activation space. If a circuit is claimed to distinguish two categories (singular vs. plural, correct name vs. incorrect name, true vs. false), the subspaces encoding those categories should be separated by a substantial angle. Angular separation determines whether downstream linear operations (attention projections, MLP transformations) can extract the distinction without requiring nonlinear computation or high-dimensional projection.
What it establishes. The circuit’s representations are organized so that task-relevant distinctions are geometrically accessible to downstream linear readout. This is a necessary condition for any claim that the circuit enables a specific computation via the model’s standard linear-algebraic operations. If the subspaces are nearly parallel, the distinction exists only in magnitude (not direction), and separating them requires the downstream component to threshold on magnitude — a fragile operation that is sensitive to activation scale.
What it does not establish. That the model actually uses the angular separation (it may use a different mechanism entirely), or that the separation is unique to this circuit (other component sets may show comparable separation). Angular separability is a geometric affordance — it shows that the circuit’s geometry permits the claimed computation, not that the computation occurs.
Threshold. Mean pairwise angle between task-relevant subspaces . The angular separation between subspaces should be the angular spread within each subspace (where angular spread is the standard deviation of directions within a single category’s subspace). The ratio ensures that between-category separation exceeds within-category variability — the geometric analog of a signal-to-noise ratio.
Minimum reporting.
- Mean and standard deviation of pairwise angles between task-relevant subspaces
- Angular spread within each subspace, with the ratio computed
- Comparison to a random-subspace baseline: sample subspaces of the same dimension from a uniform distribution on the Grassmannian and report their expected pairwise angles. In high dimensions, random subspaces tend to be nearly orthogonal, so the baseline is informative only if task-relevant subspaces are expected to be closer together than random
- Visualization of the principal angles (canonical angles) between subspaces if the subspaces are low-dimensional
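A sketch of the principal-angle computation and the random-subspace baseline from the list above. Principal angles are obtained from the singular values of $Q_A^\top Q_B$ for orthonormal bases $Q_A$, $Q_B$; random subspaces are drawn as column spans of Gaussian matrices, which are uniformly distributed on the Grassmannian. Function names and dimensions are hypothetical.

```python
import numpy as np

def principal_angles_deg(A, B):
    """Principal (canonical) angles between the column spaces of A and B, in degrees."""
    Qa, _ = np.linalg.qr(A)
    Qb, _ = np.linalg.qr(B)
    s = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    return np.degrees(np.arccos(np.clip(s, -1.0, 1.0)))

def random_subspace_baseline(dim, k, n_samples=200, seed=0):
    """Mean smallest principal angle between pairs of random k-dim subspaces of R^dim."""
    rng = np.random.default_rng(seed)
    angles = [
        principal_angles_deg(rng.normal(size=(dim, k)), rng.normal(size=(dim, k))).min()
        for _ in range(n_samples)
    ]
    return float(np.mean(angles))

A = np.eye(4)[:, :2]                        # span(e0, e1)
B = np.eye(4)[:, 2:]                        # span(e2, e3)
orthogonal = principal_angles_deg(A, B)
same = principal_angles_deg(A, A @ np.array([[1.0, 1.0], [0.0, 1.0]]))

baseline_low = random_subspace_baseline(dim=4, k=2)
baseline_high = random_subspace_baseline(dim=64, k=2)
print(orthogonal, same, baseline_low, baseline_high)
```

Comparing `baseline_low` with `baseline_high` shows the effect noted in the baseline bullet: as the ambient dimension grows, random subspaces drift toward orthogonality.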
M11 — Wasserstein stability
The circuit’s activation geometry should be stable across prompt samples, tasks, and experimental conditions. Wasserstein distance (optimal transport cost) between the circuit’s activation distributions under different conditions measures whether the geometric structure is a robust property of the circuit or an artifact of the specific prompt sample.
This criterion uses Wasserstein distance rather than KL divergence because Wasserstein is a true metric (symmetric, satisfies triangle inequality) and respects the geometry of activation space — it measures the cost of physically rearranging one distribution into another, capturing structural differences that KL divergence can miss (e.g., two distributions with the same entropy but different spatial arrangement).
What it establishes. The circuit’s geometric properties — curvature, subspace angles, activation distributions — are reproducible across conditions. This is a measurement reliability criterion: if the geometry changes dramatically when you resample prompts or switch to a related task, the geometric properties measured by C10 and C11 may be artifacts of the sample rather than properties of the circuit.
What it does not establish. That the geometry is correct (that is C10), meaningful (that is C11), or causally relevant (that is I1). A circuit with perfect Wasserstein stability can have consistently wrong geometry.
Threshold. Wasserstein-1 distance between activation distributions from two independent prompt samples of size should be $< 0.2\times$ the Wasserstein distance between the circuit’s activations and a size-matched random component set. This ensures that within-circuit variability is small relative to the circuit’s distinctiveness from background.
Minimum reporting.
- Wasserstein-1 distance between bootstrap-resampled prompt sets (at least 50 resamples), with mean and 95% CI
- Wasserstein-1 distance between the circuit’s activations and a random component set of equal size (the reference scale)
- Ratio of within-circuit to circuit-vs-random Wasserstein distance
- If cross-task stability is claimed, Wasserstein distance between activation distributions on different tasks
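A sketch of the within-circuit to circuit-vs-random ratio from the list above. To stay dependency-free it projects activations onto a single direction and uses the exact 1-D Wasserstein-1 formula for equal-size samples; this projection is a simplification swapped in here, not the full multivariate distance. The data, the direction, and the function names are hypothetical.

```python
import numpy as np

def w1_equal_size(x, y):
    """Exact Wasserstein-1 distance between two 1-D samples of equal size."""
    return float(np.mean(np.abs(np.sort(x) - np.sort(y))))

def stability_ratio(circuit_acts, random_acts, direction, n_resamples=50, seed=0):
    """Mean W1 between bootstrap resamples, the circuit-vs-random W1, and their ratio."""
    rng = np.random.default_rng(seed)
    c = circuit_acts @ direction          # 1-D projection (simplifying assumption)
    r = random_acts @ direction
    n = len(c)
    within = [
        w1_equal_size(rng.choice(c, size=n), rng.choice(c, size=n))
        for _ in range(n_resamples)
    ]
    reference = w1_equal_size(c, r)
    return float(np.mean(within)), reference, float(np.mean(within) / reference)

rng = np.random.default_rng(0)
circuit_acts = rng.normal(size=(500, 4)) + np.array([3.0, 0.0, 0.0, 0.0])
random_acts = rng.normal(size=(500, 4))
direction = np.array([1.0, 0.0, 0.0, 0.0])

within, reference, ratio = stability_ratio(circuit_acts, random_acts, direction)
print(f"within={within:.3f} reference={reference:.3f} ratio={ratio:.3f}")
```

For full multivariate distances an optimal transport solver would be needed; the ratio logic stays the same.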
M12 — Metric space comparison
The circuit’s geometric structure should be comparable across models without requiring a shared ambient space. Gromov-Hausdorff distance measures the similarity between two metric spaces (circuits in different models) by finding the best alignment between them: the smallest Hausdorff distance achievable when both spaces are embedded isometrically in a common metric space.
This criterion enables cross-model geometric comparison: does the IOI circuit in GPT-2 Small have the same geometric structure as the IOI circuit in GPT-2 Medium? Standard comparisons impose extra requirements: cosine similarity requires activations to live in the same vector space, and CKA, although it tolerates different dimensionalities, requires both models to be evaluated on the same inputs. Gromov-Hausdorff distance requires only that each circuit’s activations define a metric space (pairwise distances between activation vectors) and compares the distance structures directly.
What it establishes. Two circuits in different models have similar internal geometric structure — the pairwise distance relationships between their activations are preserved. This is evidence that the circuits implement similar computations at the geometric level, even if their ambient dimensions, weight magnitudes, and coordinate systems differ entirely.
What it does not establish. That the circuits implement the same computation functionally (that requires behavioral comparison), or that the geometric similarity is causal (that requires cross-model intervention). Geometric similarity is necessary but not sufficient for functional equivalence.
Threshold. Gromov-Hausdorff distance between the circuit’s activation metric spaces in two models should be $< 0.3\times$ the diameter of either metric space. The diameter is the maximum pairwise distance within each circuit’s activations.
Minimum reporting.
- Gromov-Hausdorff distance between circuit activation metric spaces in two or more models
- Diameter of each metric space (for normalization)
- The optimal correspondence (which activations in model A map to which activations in model B)
- Comparison to the Gromov-Hausdorff distance between the circuits and random component sets of equal size in each model
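Exact Gromov-Hausdorff distance is expensive to compute, so the sketch below only illustrates the idea on a handful of points. It searches bijections between two equal-size point sets and returns half the smallest distortion, which is an upper bound on the true distance (general correspondences could do better). The point sets and function names are hypothetical.

```python
import itertools
import numpy as np

def pairwise_dist(X):
    diff = X[:, None, :] - X[None, :, :]
    return np.linalg.norm(diff, axis=-1)

def gh_upper_bound(DA, DB):
    """Half the minimal distortion over bijections: an upper bound on GH distance.

    Brute force over permutations, so only usable for a handful of points.
    """
    n = len(DA)
    best, best_perm = np.inf, None
    for perm in itertools.permutations(range(n)):
        idx = np.array(perm)
        distortion = np.max(np.abs(DA - DB[np.ix_(idx, idx)]))
        if distortion < best:
            best, best_perm = distortion, perm
    return 0.5 * float(best), best_perm

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 2))                          # "model A" activations in R^2
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))         # random orthogonal map in R^3
B = (np.column_stack([A, np.zeros(5)]) @ Q)[::-1]    # same shape, other space, reordered

DA, DB = pairwise_dist(A), pairwise_dist(B)
same_shape, correspondence = gh_upper_bound(DA, DB)
stretched, _ = gh_upper_bound(DA, pairwise_dist(2.0 * A))
diameter = float(DA.max())
print(same_shape / diameter, stretched / diameter, correspondence)
```

The rotated and reordered copy lives in a different ambient dimension yet scores zero, while the stretched copy does not; dividing by the diameter gives the normalized quantity the threshold refers to.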
C13 — Symmetry equivariance
The circuit’s weight matrices should respect the symmetries implied by the computational claim. If the claimed computation is equivariant under a group $G$ (e.g., permutation of name positions in IOI, shift of subject-verb positions in SVA), then the circuit’s weight matrices $W$ should commute with the group’s representations: $W \rho_{\text{in}}(g) = \rho_{\text{out}}(g)\, W$ for all $g \in G$.
This is a pure weight-space test: no forward passes required. It asks whether the circuit’s structure has the symmetry that its claimed computation demands. A circuit that implements IOI should treat the two name positions symmetrically in its weights. A circuit that detects grammatical number should have weights that are invariant under content-preserving transformations that preserve number. If the weights break the expected symmetry, the circuit either does not implement the claimed computation cleanly or implements it via a mechanism that is less general than the claim implies.
What it establishes. The circuit’s weight structure has the algebraic properties consistent with the claimed computation. The symmetries of the input-output map are reflected in the circuit’s internal structure. This is a structural plausibility test (strengthening C2) that is purely algebraic — it does not depend on the distribution of inputs, the choice of metric, or the activation dynamics.
What it does not establish. That the circuit uses the symmetry (the weights may commute with the group action accidentally), or that the symmetry is exact rather than approximate (real circuits may break symmetry slightly due to training dynamics). Approximate equivariance — weights that nearly commute with the group action — is the realistic expectation.
Threshold. Equivariance error for all generators of the symmetry group . The error is normalized by the weight norm to make it scale-invariant.
Minimum reporting.
- The symmetry group and its generators, with justification from the computational claim
- Equivariance error for each generator, with the Frobenius norm formula
- Comparison to the equivariance error of a random weight matrix of the same shape (the null distribution)
- If the symmetry is approximate, report the degree of symmetry breaking and whether it is consistent across layers
Evidence Patterns
| Evidence pattern | What it establishes | Recommended language |
|---|---|---|
| High curvature coherence + angular separability (C10 + C11) | Geometrically structured computation | “The circuit’s activation geometry has curvature aligned with [feature] and angularly separated [category] subspaces” |
| Curvature coherence without angular separability (C10 only) | Sensitivity without clean readout | “The circuit is sensitive to [feature] but the subspaces are not angularly separated for downstream linear readout” |
| High parallel transport fidelity + causal evidence (M10 + I1) | Coherent and causally relevant information flow | “The circuit maintains representational coherence across components and is causally necessary for [behavior]” |
| Wasserstein stability + curvature coherence (M11 + C10) | Robust geometric structure | “The circuit’s curvature structure is stable across prompt samples (W₁ ratio < 0.2)” |
| Symmetry equivariance + angular separability (C13 + C11) | Algebraically and geometrically structured computation | “The circuit’s weights commute with [group] and its subspaces are angularly separated” |
| Cross-model metric comparison (M12) | Geometric universality | “The circuit’s geometric structure is preserved across models (GH distance < 0.3× diameter)” |
| Isotropic Fisher metric, low sheaf consistency | No geometric structure | “No evidence of task-aligned geometry; the circuit’s activation manifold lacks both curvature structure and representational coherence” |
| All six criteria met | Full geometric validity | “The circuit has curvature-aligned, stable, coherent, angularly separated, symmetric, and cross-model-comparable geometric structure” |
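The W₁ ratio quoted in the table can be illustrated with a small sketch. It assumes the ratio is the mean split-half W₁ distance divided by the circuit-vs-random W₁ distance, and that activations have been reduced to a single scalar per prompt. Both are assumptions made for illustration, as are the function names and the synthetic Gaussian data. Random re-splits stand in for the bootstrap resampling.

```python
import numpy as np

def w1_1d(x, y):
    """Wasserstein-1 distance between two equal-size 1-D samples.

    For equal-size empirical distributions on the line, the optimal
    coupling pairs sorted values, so W1 is the mean absolute difference.
    """
    x, y = np.sort(x), np.sort(y)
    return float(np.mean(np.abs(x - y)))

def split_half_w1(samples, rng, n_resamples=50):
    """Mean W1 between two disjoint random halves, over repeated re-splits."""
    half = len(samples) // 2
    dists = []
    for _ in range(n_resamples):
        perm = rng.permutation(len(samples))
        first, second = samples[perm[:half]], samples[perm[half:2 * half]]
        dists.append(w1_1d(first, second))
    return float(np.mean(dists))

rng = np.random.default_rng(0)
# Synthetic scalar "activations": circuit components vs. random components.
circuit_acts = rng.normal(loc=2.0, scale=1.0, size=400)
random_acts = rng.normal(loc=0.0, scale=1.0, size=400)

within = split_half_w1(circuit_acts, rng)    # sampling noise within the circuit
between = w1_1d(circuit_acts, random_acts)   # circuit vs. random baseline
ratio = within / between

print(f"within={within:.3f}  between={between:.3f}  ratio={ratio:.3f}")
```

For unequal sample sizes, `scipy.stats.wasserstein_distance` handles the 1-D case directly. Full high-dimensional activation distributions would need a multivariate optimal-transport solver or a sliced approximation, which this sketch does not attempt.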
Verdicts
- Proposed to Causally suggestive: The geometry lens does not gate the first transition. Geometric structure is convergent evidence, not a substitute for causal evidence from the neuroscience lens. However, an isotropic Fisher metric (C10 failed) or broken symmetry (C13 failed) is a warning: the circuit’s structure does not match the claim.
- Causally suggestive to Mechanistically supported: C10 (curvature coherence) and C13 (symmetry equivariance) strengthen the case that the circuit is not just causally relevant but structurally organized around the claimed computation. M10 (parallel transport fidelity) and M11 (Wasserstein stability) strengthen the case that representational measurements are commensurable and reproducible.
- Mechanistically supported to Triangulated: C11 (angular separability) provides convergent geometric evidence from a different evidence family. M12 (metric space comparison) contributes to cross-model generalization (E6). A circuit that passes causal criteria (I1-I5), curvature coherence (C10), parallel transport fidelity (M10), angular separability (C11), Wasserstein stability (M11), and symmetry equivariance (C13) has been triangulated across methods that share no methodological assumptions.
- Triangulated to Validated: M12 (metric space comparison) provides cross-model geometric evidence that complements behavioral cross-model tests (E6). A circuit with preserved geometric structure across model architectures has stronger evidence for universality.
Protocol
For a proposed circuit and behavior:
- Fisher information computation. For each prompt in the evaluation set (at least 200 prompts), compute the Fisher information matrix at the circuit’s activation manifold by taking the Jacobian of the output log-probabilities with respect to the circuit’s activations. Extract eigenvalues and eigenvectors. Report the anisotropy ratio and the eigenvalue spectrum. Compare against a size-matched random component set.
- Curvature-feature alignment. Project the top-3 eigenvectors of the Fisher information matrix onto the task-relevant feature space (e.g., the direction separating singular from plural, or correct from incorrect completions). Fit a linear regression from these projections to the task labels. Report R² with bootstrap confidence intervals.
- Sheaf consistency. Define the circuit graph: nodes are circuit components (heads, MLP layers), edges connect components that communicate via the residual stream. For each edge, compute the restriction maps from the local representations to the shared residual-stream subspace. Compute the consistency score at each edge. Report the mean, standard deviation, and minimum. Compare to a shuffled-graph baseline.
- Angular separation. Identify the task-relevant subspaces (e.g., by collecting activations for each category and computing the principal subspace of each category’s activations). Compute pairwise angles between subspaces using canonical angles (principal angles). Report the mean pairwise angle and the angular spread within each subspace. Compute the separation-to-spread ratio.
- Wasserstein stability. Split the prompt set into two independent halves. Compute the Wasserstein-1 distance between the circuit’s activation distributions on each half. Repeat with bootstrap resampling (at least 50 resamples). Report the mean W₁ distance and compare to the circuit-vs-random-components W₁ distance.
- Symmetry equivariance. Identify the symmetry group $G$ implied by the computational claim (e.g., name permutation for IOI). For each generator $g$ of $G$, construct the input and output representations $\rho_{\text{in}}(g)$ and $\rho_{\text{out}}(g)$. Compute the equivariance error for each weight matrix in the circuit. Compare to random matrices.
- Cross-model comparison (optional). If the same circuit has been identified in a second model, compute the Gromov-Hausdorff distance between the two circuits’ activation metric spaces. Report the distance normalized by the diameter of each space.
- Integration. Assess whether the geometric evidence is consistent across all criteria. Identify discrepancies and interpret what they mean for the computational claim.
- A skipped step must be named in the verdict.
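The angular separation step can be sketched as follows, under stated assumptions: each category's subspace is taken as the top-k principal directions of its centered activations, and principal angles come from the singular values of the product of the two orthonormal bases. The helper names, the dimension `d = 16`, the subspace rank `k = 2`, and the synthetic data are all hypothetical. The within-subspace spread and the separation-to-spread ratio are not computed here.

```python
import numpy as np

def principal_subspace(acts, k):
    """Orthonormal basis (d x k) spanning the top-k principal directions of acts (n x d)."""
    centered = acts - acts.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:k].T

def principal_angles(basis_a, basis_b):
    """Principal (canonical) angles in radians between two subspaces.

    Both inputs must have orthonormal columns; the singular values of
    basis_a.T @ basis_b are then the cosines of the principal angles.
    """
    cosines = np.linalg.svd(basis_a.T @ basis_b, compute_uv=False)
    return np.arccos(np.clip(cosines, -1.0, 1.0))

rng = np.random.default_rng(0)
d = 16  # toy activation dimension

def synthetic_category(axes, n=300):
    """Activations that vary mostly along the given coordinate axes, plus small noise."""
    x = 0.05 * rng.normal(size=(n, d))
    x[:, axes] += rng.normal(size=(n, len(axes)))
    return x

acts_a = synthetic_category([0, 1])  # category A lives near span(e0, e1)
acts_b = synthetic_category([2, 3])  # category B lives near span(e2, e3)

basis_a = principal_subspace(acts_a, k=2)
basis_b = principal_subspace(acts_b, k=2)
angles = principal_angles(basis_a, basis_b)
mean_angle_deg = float(np.degrees(angles.mean()))

print(f"principal angles (deg): {np.degrees(angles).round(1)}  mean: {mean_angle_deg:.1f}")
```

`scipy.linalg.subspace_angles` computes the same quantity and is more accurate for very small angles. The plain SVD version is shown only to keep the dependency list short.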
Case Studies
For full worked examples applying all lenses (including differential geometry) to published claims:
- IOI Circuit — the IOI circuit’s Fisher curvature can be analyzed via WC_M6 (Fisher-Rao), its parallel transport via WC_M7 (Sheaf Consistency), and its angular structure via WC_M2 (Angular Steering)
- Induction Heads — sheaf consistency is expected to be high for the two-layer composition (previous-token head to induction head)