
This framework asks: what is the effective complexity of each circuit component — is it geometrically simple (specialized) or degenerate (polyfunctional)?

Singular Learning Theory (SLT) provides a geometric characterization of neural network parameters at convergence. The Local Learning Coefficient (LLC) — also called the real log canonical threshold (RLCT) — measures how many effective parameters a component uses relative to its nominal parameter count. A component with low LLC is geometrically simple: it sits near a low-dimensional singularity in parameter space, consistent with implementing a single clean function. A component with high LLC is geometrically complex: it occupies a high-dimensional region, consistent with polyfunctionality or redundancy.

The MDL connection: LLC directly controls the model’s Bayesian Information Criterion at the component level. Lower LLC means shorter description length — the component can be described with fewer bits. This maps onto the intuition that interpretable circuits should be compressible: a head that implements a single algorithmic role (e.g., “copy the previous token”) should have lower effective complexity than a head juggling multiple unrelated tasks.
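As a worked version of that correspondence (standard SLT background, not anything specific to this framework): writing \( L_n \) for the empirical loss, \( n \) for the sample size, and \( d \) for the nominal parameter count, the regular-model BIC penalty is replaced by the LLC in singular models,

\[ \underbrace{n L_n(w_0) + \tfrac{d}{2}\log n}_{\text{BIC, regular model}} \;\longrightarrow\; \underbrace{n L_n(w_0) + \lambda \log n}_{\text{singular model}}, \qquad \lambda \le \tfrac{d}{2}, \]

so a smaller \( \lambda \) means a smaller complexity penalty and a shorter code length for the component.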

| Source | Year | Key contribution |
| --- | --- | --- |
| Watanabe, Algebraic Geometry and Statistical Learning Theory | 2009 | SLT: real log canonical threshold governs generalization in singular models |
| Lau et al., arXiv 2310.19470 | 2023 | devinterp: practical LLC estimation for neural networks |
| Elhage et al., “A Mathematical Framework for Transformer Circuits” | 2021 | Circuit components as functional units amenable to complexity analysis |
| Olsson et al., “In-context Learning and Induction Heads” | 2022 | Phase transitions in learning; induction heads as discrete mechanistic structures |

Core concept: the Local Learning Coefficient


For a model with parameters \( w \) near a critical point \( w_0 \), the LLC \( \lambda \) governs the asymptotic free energy:

\[ F_n = n L_n(w_0) + \lambda \log n + O(\log \log n) \]

where \( L_n \) is the empirical loss. The LLC is estimated via the SGLD (Stochastic Gradient Langevin Dynamics) trace:

\[ \hat{\lambda} = \frac{n}{m} \left( \hat{L}_n^{\text{SGLD}} - L_n(w_0) \right) \]

For a regular (non-singular) model, \( \lambda = d/2 \) (half the parameter count). For singular models — which neural networks always are — \( \lambda \) can be much smaller, reflecting the low effective dimensionality of the parameter region. Components that implement clean, specialized functions tend to sit near lower-dimensional singularities and thus have lower LLC.
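A minimal sketch of that estimator in plain PyTorch may make the pieces concrete. The names (`model`, `loss_fn`, `batches`) and the step-size/localization hyperparameters are illustrative rather than the framework's defaults; the devinterp package cited above provides a maintained implementation.

```python
# Sketch: estimate the LLC by running an SGLD chain from the trained weights w_0 with a
# quadratic pull back toward w_0, then applying lambda_hat = (n/m) * (mean SGLD loss - L_n(w_0)).
import itertools

import torch


def estimate_llc(model, loss_fn, batches, n, m=500, eps=1e-4, gamma=1.0):
    """`model` is assumed to start at the optimum w_0; `loss_fn(model, batch)` returns a scalar loss."""
    w0 = [p.detach().clone() for p in model.parameters()]                           # anchor point w_0
    with torch.no_grad():
        base_loss = sum(loss_fn(model, b).item() for b in batches) / len(batches)   # L_n(w_0)

    sgld_losses, stream = [], itertools.cycle(batches)
    for _ in range(m):
        loss = loss_fn(model, next(stream))
        model.zero_grad()
        loss.backward()
        with torch.no_grad():
            for p, p0 in zip(model.parameters(), w0):
                drift = n * p.grad + gamma * (p - p0)        # localized drift toward w_0
                noise = torch.randn_like(p) * eps ** 0.5     # Langevin noise
                p.add_(-0.5 * eps * drift + noise)           # SGLD step
        sgld_losses.append(loss.item())

    return (n / m) * (sum(sgld_losses) / m - base_loss)      # the estimator given above
```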

Hyperparameter sensitivity (varying learning rate, data subset, initialization) complements LLC by measuring functional stability: a component whose role changes drastically with small hyperparameter changes is likely sitting in a high-dimensional, degenerate region.

The LLC script (10_llc.py) estimates the local learning coefficient for each circuit component by running SGLD chains from the trained weights and measuring the gap between the SGLD average loss and the MAP loss:

\[ \hat{\lambda}(c) = \frac{n}{m} \left( \frac{1}{m} \sum_{t=1}^m L_n(w_t) - L_n(w_0) \right) \]

where \( w_t \) are SGLD samples restricted to component \( c \)'s parameters.
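A sketch of how that restriction might look, assuming a component is identified by a set of parameter names (an assumed convention, not the repository's actual interface): every parameter outside the component is pinned at \( w_0 \), so only the component's weights move under SGLD.

```python
# Sketch: one SGLD step restricted to component c's parameters. `w0` is the list of anchor
# tensors in named_parameters() order; `component_param_names` is an assumed convention.
import torch


def sgld_step_component(model, loss, w0, component_param_names, n, eps=1e-4, gamma=1.0):
    model.zero_grad()
    loss.backward()
    with torch.no_grad():
        for (name, p), p0 in zip(model.named_parameters(), w0):
            if name not in component_param_names:
                p.copy_(p0)                                  # frozen outside the component
                continue
            drift = n * p.grad + gamma * (p - p0)            # localized drift, component only
            noise = torch.randn_like(p) * eps ** 0.5
            p.add_(-0.5 * eps * drift + noise)
```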

What it establishes: Effective complexity/dimensionality of each component’s parameter region. Low LLC indicates a specialized, interpretable role; high LLC indicates polyfunctionality or redundancy.

What it does not establish: What function the component implements (only its geometric complexity). Must be paired with A05 or A01 for functional characterization.

Usage:

```sh
uv run python 10_llc.py --tasks ioi sva --n-prompts 40
```

C29 — Hyperparameter Sensitivity (29_hyperparam_sensitivity.py)


Measures how stable a component’s causal importance score is across hyperparameter variations (ablation method, prompt count, corruption type). High sensitivity indicates the component’s role is fragile or method-dependent:

\[ \text{Sensitivity}(c) = \text{CV}\left[ AP(c; \theta_1), \ldots, AP(c; \theta_k) \right] \]

where \( \theta_i \) are different hyperparameter settings, \( AP(c; \theta) \) is the component's causal importance score under setting \( \theta \), and CV is the coefficient of variation.
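A sketch of that computation, where `importance_fn` is a hypothetical stand-in for whatever score \( AP(c; \theta) \) the script actually computes:

```python
# Sketch: coefficient of variation of a component's importance across hyperparameter settings
# (ablation method, prompt count, corruption type, ...). `importance_fn` is hypothetical.
import statistics


def sensitivity(component, settings, importance_fn):
    scores = [importance_fn(component, theta) for theta in settings]
    mean, std = statistics.mean(scores), statistics.pstdev(scores)
    return std / abs(mean) if mean != 0 else float("inf")    # CV = sigma / |mu|
```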

What it establishes: Robustness of causal conclusions to methodological choices.

What it does not establish: Which hyperparameter setting is “correct” — only the variance across them.

Usage:

```sh
uv run python 29_hyperparam_sensitivity.py --tasks ioi sva --n-prompts 40
```

| Pattern | What it means |
| --- | --- |
| Low LLC, low sensitivity | Clean specialized component; likely interpretable |
| High LLC, low sensitivity | Complex but stable; polyfunctional component reliably used |
| Low LLC, high sensitivity | Specialized but fragile; role depends on exact conditions |
| High LLC, high sensitivity | Degenerate and unstable; difficult to interpret reliably |
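Read mechanically, the table is a two-threshold decision rule; a toy sketch with arbitrary placeholder cutoffs (not values the framework uses):

```python
# Toy reading of the table: map a component's (LLC, sensitivity) pair onto the four patterns.
def classify(llc, sens, llc_cut=1.0, sens_cut=0.3):
    if llc < llc_cut and sens < sens_cut:
        return "clean specialized component; likely interpretable"
    if llc >= llc_cut and sens < sens_cut:
        return "complex but stable; polyfunctional component reliably used"
    if llc < llc_cut and sens >= sens_cut:
        return "specialized but fragile; role depends on exact conditions"
    return "degenerate and unstable; difficult to interpret reliably"
```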

A09 provides a complexity characterization that complements A05's (MDC/Glennan) mechanistic claims: a component hypothesized to implement a simple logic gate should have low LLC. A04 (Woodward) measures robustness to intervention method; A09's hyperparameter sensitivity measures robustness to evaluation method — both address reliability of causal claims. A08 (PID) identifies redundancy between components; high redundancy predicts that the redundant components sit in a degenerate (high-LLC) parameter region where permuting them does not change the loss.