This framework asks: What is the true geometric dimensionality of the manifold on which activations live?

Intrinsic dimension (ID) goes beyond PCA by measuring the dimensionality of the data manifold rather than its linear span. A set of activations might occupy a high-dimensional ambient space but actually lie on a low-dimensional curved surface. ID estimation reveals this hidden structure — connecting to model complexity, generalization, and the local learning coefficient.

The local learning coefficient (LLC) from singular learning theory provides a related measure: it quantifies the effective number of parameters the model uses near a given point in weight space. Low LLC relative to parameter count indicates the model has found a low-dimensional solution.

| Source | Year | Key contribution |
| --- | --- | --- |
| Ansuini et al., “Intrinsic dimension of data representations in deep neural networks” | 2019 | Two-NN estimator for layer-wise ID |
| Facco et al., “Estimating the intrinsic dimension of datasets” | 2017 | Two-NN method foundation |
| Watanabe, “Algebraic Geometry and Statistical Learning Theory” | 2009 | Singular learning theory and RLCT |
| Lau et al., “Quantifying Degeneracy in Singular Models via the Learning Coefficient” | 2023 | LLC estimation for neural networks |

The Two-NN estimator computes ID from the ratio of distances to second and first nearest neighbors. For each point ( x_i ), let ( r_1(i) ) and ( r_2(i) ) be the distances to its nearest and second-nearest neighbors. Then:

[ \mu_i = \frac{r_2(i)}{r_1(i)}, \qquad \hat{d} = \frac{n}{\sum_{i=1}^n \log \mu_i} ]
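The estimator above fits in a few lines of NumPy. This is a minimal sketch for intuition, not the repo's implementation (in particular, it builds the full pairwise-distance matrix, which is fine only for modest `n`):

```python
import numpy as np

def two_nn_id(X: np.ndarray) -> float:
    """Two-NN intrinsic dimension estimator (Facco et al., 2017).

    For each point, mu = r2 / r1 is the ratio of distances to its second
    and first nearest neighbors; the MLE is d_hat = n / sum(log mu).
    """
    n = X.shape[0]
    # Squared pairwise Euclidean distances via the Gram matrix.
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    np.fill_diagonal(d2, np.inf)  # exclude self-distances
    # The two smallest entries per row are the first and second neighbors.
    nn2 = np.sqrt(np.partition(d2, 1, axis=1)[:, :2])
    r1, r2 = nn2.min(axis=1), nn2.max(axis=1)
    mu = r2 / r1
    return n / np.log(mu).sum()
```

Sanity check: 2-D Gaussian data linearly embedded in a 10-dimensional ambient space should yield an estimate near 2, not 10.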

This maximum-likelihood estimator assumes locally uniform density on a ( d )-dimensional manifold. The LLC provides a complementary perspective from weight space:

[ \lambda = \text{LLC}(\theta^*) \approx \frac{\text{effective parameters near } \theta^*}{2} ]

Low ( \lambda ) indicates the loss landscape near the solution is highly degenerate — the model found a low-complexity representation.
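The "effective parameters over two" reading comes from singular learning theory's free-energy expansion (Watanabe, 2009), where ( \lambda ) is the coefficient of the ( \log n ) term, and a regular model with ( d ) parameters has ( \lambda = d/2 ):

[ F_n = n L_n(\theta^*) + \lambda \log n + O(\log \log n) ]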

Estimates the local learning coefficient via MCMC sampling around the trained weights, providing a complexity measure for each circuit component.

What it establishes: The effective geometric complexity of the learned solution — how many dimensions the model truly uses in weight space. What it does not establish: Which specific directions are “active” (combine with E06/E08 for that).

Usage:

```shell
uv run python 10_llc.py --tasks ioi sva
```
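The estimator form here, ( \hat{\lambda} = n\beta \, (\mathbb{E}[L] - L(\theta^*)) ) with samples drawn from a localized tempered posterior, follows Lau et al. (2023). The toy loss, exact gradients, and hyperparameters below are illustrative assumptions for a self-contained sketch, not what `10_llc.py` does:

```python
import numpy as np

def estimate_llc(loss, grad, w_star, n=1000, gamma=1.0,
                 eps=1e-4, steps=20_000, burn=5_000, seed=0):
    """Sketch of an SGLD-based LLC estimate (after Lau et al., 2023):
    lambda_hat = n * beta * (E_posterior[L] - L(w*)),  beta = 1 / log n,
    sampling from the tempered posterior localized around w*."""
    rng = np.random.default_rng(seed)
    beta = 1.0 / np.log(n)
    w = w_star.copy()
    losses = []
    for t in range(steps):
        # Langevin drift on the tempered loss, plus a localization pull
        # toward w* so the chain stays near the solution of interest.
        drift = n * beta * grad(w) + gamma * (w - w_star)
        w = w - 0.5 * eps * drift + rng.normal(size=w.shape) * np.sqrt(eps)
        if t >= burn:
            losses.append(loss(w))
    return n * beta * (np.mean(losses) - loss(w_star))
```

On a degenerate two-parameter quadratic loss that only uses one direction, the estimate should land near the true value ( \lambda = 1/2 ), i.e. one effective parameter halved.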

Estimates activation manifold dimensionality at each layer using the Two-NN method on cached activations.

What it establishes: The manifold dimensionality of representations — how many independent axes of variation exist. What it does not establish: Whether those axes correspond to interpretable variables.

Usage:

```shell
uv run python 10_llc.py --tasks ioi sva --two-nn
```
| Pattern | What it means |
| --- | --- |
| ID ( \ll ) ambient dimension | Activations lie on a low-dimensional manifold; compact structure |
| ID increases through layers | Network progressively unfolds compressed representations |
| LLC ( \ll ) parameter count | Model solution is highly degenerate — simpler than capacity allows |
| LLC matches task variable count | Model complexity aligns with task complexity |
| ID spike at a specific layer | Representational expansion — new dimensions computed at that layer |