research
Interests and projects
AI interpretability
I am currently working on AI interpretability, with a particular focus on transformer-based large language models (LLMs). Two projects are underway:
Geometry of LLMs’ latent representations and emergence of compositional features.
Previous work points to a nontrivial intrinsic geometry of token embeddings: they can be seen as points of a stratified space (Robinson et al., 2024), and different models (even ones trained on different modalities) have comparable embeddings (Roads & Love, 2020; Luo et al., 2024). I have observed that in language models such as GPT-2, k-means or hierarchical clustering of the token embeddings reveals some degree of organization that combines semantics and potential syntactic roles. As a first approximation, a primitive feature is a characteristic of an embedded token (possibly shared by the members of a cluster) that is exploited by some attention or MLP block. Instead of taking the matrices of these blocks as the main objects of interest, my focus is on the coordinate-independent geometric entities (e.g. linear subspaces) that the matrices define in residual space, e.g. via their kernels, images, and singular vectors. Rather than interpreting the singular vectors directly in terms of input tokens, as in (Elhage et al., 2021) or (Dar et al., 2023), I interpret them as providing incremental modifications of the initial embeddings, “constructing” new context-dependent features from the primitives (thus obtaining compositional features); the interaction with layer norm might contribute to the emergence of discrete classes (Winsor, 2022). Since deeper layers read not only the initial token embeddings but also the modifications introduced by previous blocks, we need new tools to quantify the relevance of the modifications introduced at each layer. A combination of hierarchical clustering and matrix-based, input-independent analysis of heads will give us a hierarchy of coarse-grained descriptions of the action of each block on tokens, and a coarse-grained view of the interactions between different blocks.
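As a concrete illustration, the sketch below (assuming the Hugging Face `transformers` library and scikit-learn; the choices of $k=100$ clusters, block 0, and head 0 are arbitrary) clusters GPT-2's token embeddings and computes the singular vectors of the query-key bilinear form that a single attention head defines on residual space:

```python
# Minimal sketch: cluster GPT-2's token embeddings and take the SVD of one
# attention head's query-key bilinear form on residual space.
import torch
from sklearn.cluster import KMeans
from transformers import GPT2Model

model = GPT2Model.from_pretrained("gpt2")
E = model.wte.weight.detach()                  # token embeddings, (50257, 768)

# k-means clustering of the embeddings (k = 100 is an arbitrary choice)
labels = KMeans(n_clusters=100, n_init=10).fit_predict(E.numpy())

# GPT-2 concatenates W_Q, W_K, W_V in c_attn, whose weight is (768, 3 * 768).
W_q, W_k, _ = model.h[0].attn.c_attn.weight.detach().split(768, dim=1)
d_head, h = 64, 0                              # head dimension and head index
Wq_h = W_q[:, h * d_head:(h + 1) * d_head]     # (768, 64)
Wk_h = W_k[:, h * d_head:(h + 1) * d_head]

# Coordinate-independent object: the bilinear form W_Q W_K^T (rank <= 64) on
# residual space; its top singular vectors span the subspaces the head reads,
# and can be compared with cluster centroids rather than with raw tokens.
U, S, Vt = torch.linalg.svd(Wq_h @ Wk_h.T)
```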
Understanding internal representations of syntax in LLMs via mechanistic interpretability
Joint project with Aman Burman (undergraduate student) and Matilde Marcolli. Transformer-based language models are able to produce text that follows syntactic rules; this ability emerges abruptly, during a brief period of training (Chen et al., 2024). By analogy with some toy models (see e.g. (Li et al., 2023)), we can hypothesize that a transformer encodes syntax as part of an internal “world representation”. We are using tools from mechanistic interpretability on small language models to extract encoded syntactic features, understand how they are distributed across layers, and identify which subnetworks or circuits use this information. We are also investigating whether syntactic processing in language models is hierarchical and analogous to syntactic trees in linguistics; here we have in mind, in particular, the mathematical formalization of Chomsky’s minimalist program in the language of Hopf algebras (Marcolli et al., 2023; Marcolli et al., 2023). We want to refine the analysis of (Manning et al., 2020), which showed that the activations of attention heads are correlated with syntactic binary relations but did not explore the mechanisms by which these relations are assembled into more complex trees.
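As a starting point, a correlation in the style of (Manning et al., 2020) can be computed in a few lines; the sketch below (illustrative only, and assuming for simplicity that each word is a single BPE token) scores every attention head of GPT-2 by the attention mass it sends from a verb to its subject:

```python
# Sketch: which attention heads link the verb "are" to its subject "keys"?
import torch
from transformers import GPT2Model, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2", output_attentions=True)

inputs = tok("The keys to the cabinet are on the table", return_tensors="pt")
with torch.no_grad():
    attn = model(**inputs).attentions   # one (1, n_heads, seq, seq) per layer

dep, gov = 5, 1                         # token positions of "are" and "keys"
for layer, A in enumerate(attn):
    scores = A[0, :, dep, gov]          # mass each head sends verb -> subject
    best = int(scores.argmax())
    print(f"layer {layer:2d}: head {best:2d} attends {scores[best]:.2f}")
```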
Mathematics of information
More broadly, I’m interested in mathematical aspects of information theory, particularly in connection with category theory and geometry (metric geometry, geometric measure theory, …). My work in this regard can be organized around three axes:
- Topological characterization of information measures
- Information dimension and measures with geometric structure
- Magnitude and diversity
Each of these is described in more detail below. I’ve included links to videos and slides of relevant presentations, along with other resources.
Topological characterization of information measures
In “simple” terms, information topology regards a statistical system as a generalized topological space (a topos) and identifies Shannon entropy, along with other important “measures of information” used in information theory, as an invariant associated to this space.
Toposes (or topoi) are an abstraction of topological spaces in the language of category theory and sheaves, introduced by Grothendieck and his collaborators (Artin, Verdier, …). Toposes allow richer cohomology theories than set-theoretic topological spaces, and some of these theories (e.g. étale cohomology) play a key role in modern algebraic geometry. Moreover, Grothendieck toposes are particular cases of elementary toposes: “nice” categories, with properties analogous to those of the category of sets, that play an important role in logic.
Baudot and Bennequin (Baudot & Bennequin, 2015) first identified Shannon’s discrete entropy as a toposic invariant of certain categories of discrete observables. My Ph.D. thesis (Vigneaux, 2019) and a series of articles extended their results in several directions. Namely, the general homological constructions were abstracted from the concrete setting of discrete variables via information structures (categories that encode the relations of refinement between observables), allowing seamless extensions and generalizations to other settings such as continuous vector-valued observables (Vigneaux, 2020).
When the information structure encodes discrete observables, the classical information functions (Shannon entropy, Tsallis $\alpha$-entropy, Kullback-Leibler divergence) appear as 1-cocycles; the corresponding “coefficients” of the cohomology are probabilistic functionals (i.e. functions of probability laws). There is also a combinatorial version of the theory (whose coefficients are functions of histograms), in which the only 0-cocycle is the exponential function and the 1-cocycles are generalized multinomial coefficients of Fontené-Ward type (Vigneaux, 2023). There is an asymptotic relation between the combinatorial and probabilistic cocycles.
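Concretely, the 1-cocycle condition satisfied by Shannon entropy is nothing but the chain rule $H(X,Y) = H(X) + \sum_x p(x) H(Y \mid X=x)$. The toy script below (plain numpy, purely illustrative) verifies it on a random joint law:

```python
# Check that Shannon entropy satisfies the chain rule, i.e. the degree-one
# cocycle equation of information cohomology, on a random joint law of (X, Y).
import numpy as np

rng = np.random.default_rng(0)
P = rng.random((4, 5))
P /= P.sum()                             # joint law of (X, Y)

def H(p):
    p = p[p > 0]
    return -np.sum(p * np.log(p))

pX = P.sum(axis=1)                       # marginal law of X
cond = sum(pX[i] * H(P[i] / pX[i]) for i in range(len(pX)))
assert np.isclose(H(P.ravel()), H(pX) + cond)
```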
For information structures that contain continuous vector-valued observables (besides discrete ones), the only new degree-one cocycles are Shannon’s differential entropy and the dimension (of the support of the measure) (Vigneaux, 2021). This constitutes a novel algebraic characterization of differential entropy.
Information cohomology has seen further advances in recent years. Manin and Marcolli (Manin & Marcolli, 2020) related information structures to other homotopy-theoretic and category-theoretic models of neural information networks. Similar perspectives have been developed more recently by Belfiore and Bennequin (Belfiore & Bennequin, 2021) to tackle the problem of interpretability of neural networks. They associate to each neural network a category equipped with a Grothendieck topology (determined by the connectivity of the neurons) and study the category of sheaves on it, which is a topos. Every topos has an internal logic, and they are linking this internal toposic logic with the classification capabilities that emerge in each layer of a trained neural network (previously studied in the experimental article (Belfiore et al., 2021)).
Presentations:
- “Cohomological Aspects of Information” [video], Topos Institute, 2024: I summarize the main results that I have obtained in this domain.
- “Information Cohomology of Classical Vector-valued Observables” [video], GSI2021: I provide details on the characterization of the differential entropy and the dimension as the only cohomology classes in degree 1 for systems of vector-valued observables.
- “Entropy under disintegrations” [video], GSI 2021: I explain how every disintegration of a reference measure $\lambda$ induces a chain rule for the generalized differential entropy $S(\rho) = -\int \log (\frac{d\rho}{d\lambda}) d\rho$, which gives a foundation to the extension of information cohomology to more general observables e.g. with values in locally compact topological groups.
- “Variations on Information Theory: Categories, Cohomology, Entropy” [video], IHES, 2016: an older presentation, aimed at probabilists, where I introduce the notion of (de Rham) cohomology and its analogue in our theory.
Other references:
- “On the Structure of Information Cohomology”, Ph.D. thesis by Hubert Dubé (U. Toronto), which introduces the Mayer-Vietoris long exact sequence, Shapiro’s lemma and Hochschild-Serre spectral sequence in the framework of information cohomology, and provides some bounds on the cohomological dimension along with new cohomological computations.
- “Information cohomology and Entropy”, master’s thesis by Luca Castiglioni (University of Milan).
Information dimension and measures with geometric structure
From an analytic perspective, dimension has played an important role in information theory since its inception, mainly in connection with quantization. By partitioning $\Rr^d$ into cubes with vertices in $\mathbb Z^d/n$, one can quantize a continuous probability measure $\rho$ into a measure $\rho_n$ with countable support, whose entropy satisfies \begin{equation}\label{eq:expansion_law} H(\rho_n) = D\ln n + h + o(1), \end{equation} where $D=d$ and $h=h(\rho)$ is the differential entropy of $\rho$ (Kolmogorov & Shiryayev, 1993). Rényi (Rényi, 1959) turned this into a definition: if $\rho$ is now a general law and the expansion \eqref{eq:expansion_law} holds for some constants $D,h\in \Rr$, one calls $D$ the information dimension of $\rho$ and $h$ its $d$-dimensional entropy. He wondered about the “topological meaning” of the entropic dimension, which may be noninteger.
In (Vigneaux, 2023) I introduced an asymptotic equipartition property for discrete-continuous mixtures or, more generally, for convex combinations of rectifiable measures on $\Rr^d$. In particular, it gives an interpretation of the information dimension $D$ of such a measure $\rho$: the product $(\Rr^d)^n$ naturally splits into strata of different dimensions, and the typical realizations of $\rho^{\otimes n}$ concentrate on strata of a few dimensions close to $nD$. I also obtained volume estimates (in terms of Hausdorff measures) for the typical realizations in each typical stratum. (A measure $\rho$ is $m$-rectifiable if there exists a set $E$, equal to a countable union of $C^1$ manifolds, such that $\rho$ has a density with respect to the restricted Hausdorff measure $\mathcal H^m|_E$, which is the natural notion of $m$-dimensional volume on $E$.)
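The expansion \eqref{eq:expansion_law} is easy to observe numerically. In the sketch below (plain numpy, using samples instead of exact laws), the mixture $\rho = \alpha \delta_0 + (1-\alpha)\,\mathrm{Unif}[0,1]$ has information dimension $1-\alpha$, recovered as the slope of $H(\rho_n)$ against $\ln n$:

```python
# Estimate the information dimension of a discrete-continuous mixture by
# quantizing at scales 1/n and regressing the entropy on ln n.
import numpy as np

rng = np.random.default_rng(1)
alpha, N = 0.3, 500_000
atom = rng.random(N) < alpha
x = np.where(atom, 0.0, rng.random(N))  # alpha * delta_0 + (1 - alpha) * U[0,1]

def quantized_entropy(x, n):
    # plug-in entropy of the law of floor(n x) / n (biased, but fine here)
    _, counts = np.unique(np.floor(x * n), return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log(p))

ns = 2 ** np.arange(4, 10)
Hs = [quantized_entropy(x, n) for n in ns]
D = np.polyfit(np.log(ns), Hs, 1)[0]
print(f"estimated dimension: {D:.3f}  (theory: {1 - alpha})")
```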
Presentations:
- “On the entropy of rectifiable and stratified measures” [slides], GSI 2023, Saint-Malo, France.
- “Typicality for stratified measures” [slides], ETH Zurich, 2023.
Magnitude and diversity
Magnitude (Leinster, 2008) is a common categorical generalization of cardinality and of the Euler characteristic of a simplicial complex. It applies to enriched categories, of which metric spaces are a notable example, and in that case it gives a new isometric invariant of metric spaces (Leinster, 2013). Applied to infinite metric spaces, this invariant, somewhat surprisingly, encodes a lot of nontrivial geometric information, such as Minkowski dimension, volume, and surface area (Meckes, 2015; Barceló & Carbery, 2018; Gimperlein & Goffeng, 2021). Partial differential equations, pseudodifferential operators, and potential theory have played an important role in establishing these results.
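For a finite metric space $X$ the definition is elementary: form the similarity matrix $Z(x,y) = e^{-d(x,y)}$; when $Z$ is invertible, the magnitude of $X$ is the sum of the entries of $Z^{-1}$. A tiny worked example (numpy only): for two points at distance $d$ one gets $2/(1+e^{-d})$, which interpolates between “one point” and “two points”.

```python
# Magnitude of a finite metric space: the sum of the entries of Z^{-1},
# where Z(x, y) = exp(-d(x, y)).
import numpy as np

def magnitude(D):
    # D is the matrix of pairwise distances
    return np.linalg.inv(np.exp(-D)).sum()

for d in (0.01, 1.0, 10.0):
    print(d, magnitude(np.array([[0.0, d], [d, 0.0]])))
# ~1.005, ~1.46, ~2.0: effectively one point at small scales, two at large ones
```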
In joint work with Stephanie Chen (Chen & Vigneaux, 2023) (SURF program 2022), we gave a new formula for the magnitude of a finite category $\cat{A}$ in terms of the pseudoinverse of the matrix \begin{equation} \zeta:\Ob\cat{A}\times \Ob \cat{A}\to \Zz, \, (a,b)\mapsto |\Hom(a,b)|. \end{equation} This was closer to the definition for posets (Rota, 1964) that had inspired Leinster. Our work also rederived algebraic properties of the magnitude from properties of the pseudoinverse.
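In code, the formula is a one-liner (numpy only; the example is the poset $\{a < b,\ a < c\}$ viewed as a category, so each Hom-set has at most one element):

```python
# Magnitude of a finite category: the sum of the entries of the Moore-Penrose
# pseudoinverse of zeta(a, b) = |Hom(a, b)|.
import numpy as np

zeta = np.array([[1, 1, 1],    # rows/columns indexed by a, b, c
                 [0, 1, 0],
                 [0, 0, 1]])
print(np.linalg.pinv(zeta).sum())   # 1.0, as expected: 'a' is an initial object
```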
In (Vigneaux, 2024) I propose a novel combinatorial interpretation of the inverse or pseudoinverse of $\zeta$, along the lines of (Brualdi & Cvetkovic, 2008). The interpretation generalizes a celebrated theorem by Philip Hall (Rota, 1964): \begin{equation} \zeta^{-1}(a,b)=\sum_{k\geq 0} (-1)^k \, \# \{ \text{nondegenerate paths of length } k \text{ from } a \text{ to } b \} \end{equation} when $a$ and $b$ are elements of a finite poset (in this case $\zeta$ is invertible, and its inverse is known as the Möbius function).
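The identity can be checked with matrix powers: the number of nondegenerate (strictly increasing) paths of length $k$ from $a$ to $b$ is the $(a,b)$ entry of $(\zeta - I)^k$, so Hall's formula is a finite Neumann series for $\zeta^{-1}$. A quick verification on the chain $0 < 1 < 2$ (numpy only):

```python
# Hall's formula on the chain poset 0 < 1 < 2: the Möbius function as an
# alternating sum of path counts, computed via powers of (zeta - I).
import numpy as np

zeta = np.array([[1, 1, 1],
                 [0, 1, 1],
                 [0, 0, 1]])
N = zeta - np.eye(3, dtype=int)     # (N^k)[a, b] counts length-k paths a -> b
mobius = sum((-1) ** k * np.linalg.matrix_power(N, k) for k in range(3))
assert (mobius @ zeta == np.eye(3)).all()
print(mobius)                       # [[1 -1 0], [0 1 -1], [0 0 1]]
```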
What does this have to do with information? Following Boltzmann’s ideas, entropy can be seen as an extension of cardinality: when all elements of a finite set $X$ are equiprobable, the entropy is $\ln |X|$. In turn, magnitude is a generalization of cardinality, and it is natural to introduce a probabilistic extension of it: “categorical entropy”. Stephanie and I (Chen & Vigneaux, 2023) proposed to define the categorical entropy of a finite category $\cat{A}$ equipped with a probability $p$ on objects and a “kernel” $\theta:\Ob\cat{A} \times \Ob \cat{A} \to [0,\infty)$ such that $\theta(a,a')=0$ whenever $a\not\to a'$ via the formula \begin{equation}\label{eq:cat_entropy} \mathcal H(A,p,\theta) = - \sum_{a\in \Ob \cat A} p(a) \ln \left(\sum_{b\in \Ob \cat A} \theta(a,b)p(b) \right). \end{equation} This function shares many “nice” properties with Shannon entropy. In the context of metric spaces equipped with a probability measure, \eqref{eq:cat_entropy} appears as a measure of the diversity of a collection of species when $p$ encodes their relative abundances and $\theta$ measures their similarity (Leinster & Cobbold, 2012).
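A direct transcription of \eqref{eq:cat_entropy} (numpy only; the similarity matrix below is a made-up example in which the second and third objects are 90% similar): with $\theta$ the identity matrix the formula reduces to the Shannon entropy of $p$, while off-diagonal similarity lowers the measured diversity, as in (Leinster & Cobbold, 2012).

```python
# Categorical entropy H(A, p, theta) and its reduction to Shannon entropy.
import numpy as np

def categorical_entropy(p, theta):
    # H = - sum_a p(a) * ln( sum_b theta(a, b) * p(b) )
    return -np.sum(p * np.log(theta @ p))

p = np.array([0.5, 0.25, 0.25])
assert np.isclose(categorical_entropy(p, np.eye(3)), -np.sum(p * np.log(p)))

theta = np.array([[1.0, 0.0, 0.0],   # theta(a, b) > 0 only where a -> b
                  [0.0, 1.0, 0.9],
                  [0.0, 0.9, 1.0]])
print(categorical_entropy(p, theta))   # ~0.72 < ~1.04, the Shannon entropy of p
```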
Presentations:
- “A Combinatorial Approach to Categorical Möbius Inversion and Magnitude” [video], Applied Algebraic Topology Network, 2024.
- “Categorical Magnitude and Entropy” [slides], GSI 2023, Saint-Malo, France.
Bibliography
- Robinson, M., Dey, S., & Sweet, S. (2024). The Structure of the Token Space for Large Language Models (Number arXiv:2410.08993). arXiv.
- Roads, B. D., & Love, B. C. (2020). Learning as the Unsupervised Alignment of Conceptual Systems. Nature Machine Intelligence, 2(1), 76–82. https://doi.org/10.1038/s42256-019-0132-2
- Luo, K., Zhang, B., Xiao, Y., & Lake, B. M. (2024). Finding Unsupervised Alignment of Conceptual Systems in Image-Word Representations. Proceedings of the Annual Meeting of the Cognitive Science Society, 46(0).
- Elhage, N., Nanda, N., Olsson, C., Henighan, T., Joseph, N., Mann, B., Askell, A., Bai, Y., Chen, A., Conerly, T., DasSarma, N., Drain, D., Ganguli, D., Hatfield-Dodds, Z., Hernandez, D., Jones, A., Kernion, J., Lovitt, L., Ndousse, K., … Olah, C. (2021). A Mathematical Framework for Transformer Circuits. Transformer Circuits Thread.
- Dar, G., Geva, M., Gupta, A., & Berant, J. (2023). Analyzing Transformers in Embedding Space. In A. Rogers, J. Boyd-Graber, & N. Okazaki (Eds.), Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (pp. 16124–16170). Association for Computational Linguistics. https://doi.org/10.18653/v1/2023.acl-long.893
- Winsor, E. (2022). Re-Examining LayerNorm.
- Chen, A., Shwartz-Ziv, R., Cho, K., Leavitt, M. L., & Saphra, N. (2024). Sudden Drops in the Loss: Syntax Acquisition, Phase Transitions, and Simplicity Bias in MLMs. The Twelfth International Conference on Learning Representations.
- Li, K., Hopkins, A. K., Bau, D., Viégas, F., Pfister, H., & Wattenberg, M. (2023). Emergent World Representations: Exploring a Sequence Model Trained on a Synthetic Task. The Eleventh International Conference on Learning Representations.
- Marcolli, M., Chomsky, N., & Berwick, R. (2023). Mathematical Structure of Syntactic Merge (Number arXiv:2305.18278). arXiv.
- Marcolli, M., Berwick, R. C., & Chomsky, N. (2023). Syntax-Semantics Interface: An Algebraic Model (Number arXiv:2311.06189). arXiv.
- Manning, C. D., Clark, K., Hewitt, J., Khandelwal, U., & Levy, O. (2020). Emergent Linguistic Structure in Artificial Neural Networks Trained by Self-Supervision. Proceedings of the National Academy of Sciences, 117(48), 30046–30054. https://doi.org/10.1073/pnas.1907367117
- Baudot, P., & Bennequin, D. (2015). The Homological Nature of Entropy. Entropy, 17(5), 3253–3318.
- Vigneaux, J. P. (2019). Topology of Statistical Systems: A Cohomological Approach to Information Theory [PhD thesis]. Université de Paris.
- Vigneaux, J. P. (2020). Information structures and their cohomology. Theory and Applications of Categories, 35(38), 1476–1529.
- Vigneaux, J. P. (2023). A characterization of generalized multinomial coefficients related to the entropic chain rule. Aequationes Mathematicae, 97(2), 231–255.
- Vigneaux, J. P. (2021). Information cohomology of classical vector-valued observables. In F. Nielsen & F. Barbaresco (Eds.), GSI 2021: Geometric Science of Information (Vol. 12829, pp. 537–546). Springer.
- Manin, Y., & Marcolli, M. (2020). Homotopy Theoretic and Categorical Models of Neural Information Networks. ArXiv Preprint ArXiv:2006.15136.
- Belfiore, J.-C., & Bennequin, D. (2021). Topos and stacks of deep neural networks. ArXiv Preprint ArXiv:2106.14587.
- Belfiore, J.-C., Bennequin, D., & Giraud, X. (2021). Logical Information Cells I. ArXiv Preprint ArXiv:2108.04751.
- Kolmogorov, A. N., & Shiryayev, A. N. (1993). Selected Works of A. N. Kolmogorov. Volume III: Information Theory and the Theory of Algorithms. Kluwer Academic Publishers.
- Rényi, A. (1959). On the dimension and entropy of probability distributions. Acta Mathematica Academiae Scientiarum Hungarica, 10(1), 193–215.
- Vigneaux, J. P. (2023). Typicality for stratified measures. IEEE Transactions on Information Theory, 69(11), 6922–6940.
- Leinster, T. (2008). The Euler Characteristic of a Category. Documenta Mathematica, 13, 21–49.
- Leinster, T. (2013). The magnitude of metric spaces. Documenta Mathematica, 18, 857–905.
- Meckes, M. W. (2015). Magnitude, diversity, capacities, and dimensions of metric spaces. Potential Analysis, 42, 549–572.
- Barceló, J. A., & Carbery, A. (2018). On the magnitudes of compact sets in Euclidean spaces. American Journal of Mathematics, 140(2), 449–494.
- Gimperlein, H., & Goffeng, M. (2021). On the magnitude function of domains in Euclidean space. American Journal of Mathematics, 143(3), 939–967.
- Chen, S., & Vigneaux, J. P. (2023). A formula for the categorical magnitude in terms of the Moore-Penrose pseudoinverse. Bulletin of the Belgian Mathematical Society - Simon Stevin, 30(3), 341–353.
- Rota, G.-C. (1964). On the foundations of combinatorial theory I. Theory of Möbius functions. Probability Theory and Related Fields, 2(4), 340–368.
- Vigneaux, J. P. (2024). A combinatorial approach to categorical Möbius inversion and pseudoinversion. ArXiv Preprint ArXiv:2407.14647.
- Brualdi, R. A., & Cvetkovic, D. (2008). A Combinatorial Approach to Matrix Theory and Its Applications. CRC Press. https://books.google.com/books?id=pwx6t8QfZU8C
- Leinster, T., & Cobbold, C. A. (2012). Measuring diversity: the importance of species similarity. Ecology, 93(3), 477–489.