research
Interests and projects
I’m broadly interested in mathematical aspects of information theory, particularly in connection with category theory and geometry (metric geometry, geometric measure theory, …).
My published work can be divided into three different areas:
- Topological characterization of information measures
- Information dimension and measures with geometric structure
- Magnitude and diversity
I’ve included links to videos and slides of relevant presentations that I have given in each area, along with other resources.
I’m currently involved in two research projects in AI:
- Identification of syntactic features within LLMs (with Matilde Marcolli and Aman Burman), using layer-wise relevance propagation and some tools from mechanistic interpretability (probes, activation patching, …).
- Explanation of complex classifiers based on LLMs using an enhanced version of layer-wise relevance propagation (with Markus Marks); the basic propagation rule is sketched below.
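Both projects build on layer-wise relevance propagation (LRP). As a point of reference, here is a minimal sketch of the standard $\epsilon$-stabilized LRP rule for a single linear layer; the function name and toy numbers are illustrative and not taken from the projects above.

```python
import numpy as np

def lrp_epsilon(a, W, b, R_out, eps=1e-6):
    """Redistribute the relevance R_out of a linear layer z = a @ W + b
    onto its inputs, using the epsilon-stabilized LRP rule."""
    z = a @ W + b                # pre-activations of the layer
    z = z + eps * np.sign(z)     # stabilizer: avoids division by ~0
    s = R_out / z                # relevance per unit of pre-activation
    return a * (s @ W.T)         # each input gets its share of each output

# Toy usage: take the logits themselves as the output relevance.
rng = np.random.default_rng(0)
a, W, b = rng.random(4), rng.normal(size=(4, 3)), np.zeros(3)
R_in = lrp_epsilon(a, W, b, R_out=a @ W + b)
print(R_in.sum(), (a @ W + b).sum())  # approximately equal: relevance is conserved
```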
Topological characterization of information measures
In “simple” terms, information topology regards a statistical system as a generalized topological space (a topos) and identifies Shannon entropy, along with other important “measures of information” used in information theory, as an invariant associated to this space.
Toposes, or topoi, are an abstraction of topological spaces in the language of category theory and sheaves, introduced by Grothendieck and his collaborators (Artin, Verdier, …). Toposes allow richer cohomology theories than set-theoretic topological spaces, and some of these theories (e.g. étale cohomology) play a key role in modern algebraic geometry. Moreover, these Grothendieck toposes are particular cases of elementary toposes, which are “nice” categories, with properties analogous to those of the category of sets, that play an important role in logic.
Baudot and Bennequin (Baudot & Bennequin, 2015) first identified Shannon’s discrete entropy as a toposic invariant of certain categories of discrete observables. My Ph.D. thesis (Vigneaux, 2019) and a series of articles extended their results in several directions. Namely, the general homological constructions were abstracted from the concrete setting of discrete variables via information structures (categories that encode the relations of refinement between observables), allowing seamless extensions and generalizations to other settings such as continuous vector-valued observables (Vigneaux, 2020).
When the information structure encodes discrete observables, the classical information functions (Shannon entropy, Tsallis $\alpha$-entropy, Kullback-Leibler divergence) appear as 1-cocycles; the corresponding “coefficients” of the cohomology are probabilistic functionals (i.e. functions of probability laws). There is also a combinatorial version of the theory (coefficients are functions of histograms) where the only 0-cocycle is the exponential function and the 1-cocycles are generalized multinomial coefficients of Fontené-Ward type (Vigneaux, 2023). There is an asymptotic relation between the combinatorial and probabilistic cocycles.
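To give the flavor of these statements: for a pair of observables $X$ and $Y$, the degree-one cocycle condition reads (roughly, in the notation of the papers above) \begin{equation} f[XY] = f[X] + X.f[Y], \end{equation} where $(X.f[Y])(P) = \sum_{x} P(X=x)\, f[Y](P|_{X=x})$. For $f$ equal to Shannon entropy this is exactly the chain rule $H(X,Y) = H(X) + H(Y\mid X)$: the chain rule is what makes entropy a cocycle.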
For information structures that contain continuous vector-valued observables (besides discrete ones), the only new degree-one cocycles are Shannon’s differential entropy and the dimension (of the support of the measure) (Vigneaux, 2021). This constitutes a novel algebraic characterization of differential entropy.
Information cohomology has seen further advances in recent years. Manin and Marcolli (Manin & Marcolli, 2020) related information structures to other homotopy-theoretic and category-theoretic models of neural information networks. Similar perspectives have been developed more recently by Belfiore and Bennequin (Belfiore & Bennequin, 2021) to tackle the problem of interpretability of neural networks. They associate to each neural network a certain category equipped with a Grothendieck topology (determined by the connectivity of the neurons), and study the category of sheaves on it, which is a topos. Every topos has an internal logic, and they link this internal toposic logic with the classification capabilities that emerge in each layer of a trained neural network (capabilities they had previously studied in an experimental companion article).
Presentations:
- “Cohomological Aspects of Information” [video], Topos Institute, 2024: I summarize the main results that I have obtained in this domain.
- “Information Cohomology of Classical Vector-valued Observables” [video], GSI 2021: I provide details on the characterization of the differential entropy and the dimension as the only cohomology classes in degree 1 for systems of vector-valued observables.
- “Entropy under disintegrations” [video], GSI 2021: I explain how every disintegration of a reference measure $\lambda$ induces a chain rule for the generalized differential entropy $S(\rho) = -\int \log (\frac{d\rho}{d\lambda}) d\rho$, which gives a foundation to the extension of information cohomology to more general observables, e.g. with values in locally compact topological groups.
- “Variations on Information Theory: Categories, Cohomology, Entropy” [video], IHES, 2016: an older presentation, aimed at probabilists, where I introduce the notion of (de Rham) cohomology and its analogue in our theory.
Other references:
- “On the Structure of Information Cohomology”, Ph.D. thesis by Hubert Dubé (U. Toronto), which introduces the Mayer-Vietoris long exact sequence, Shapiro’s lemma, and the Hochschild-Serre spectral sequence in the framework of information cohomology, and provides some bounds on the cohomological dimension along with new cohomological computations.
- “Information cohomology and Entropy”, master’s thesis by Luca Castiglioni (University of Milan).
Information dimension and measures with geometric structure
From an analytic perspective, dimension has played an important role in information theory since its inception, mainly in connection with quantization. By partitioning $\mathbb{R}^d$ into cubes with vertices in $\mathbb{Z}^d/n$, one can quantize a continuous probability measure $\rho$ into a measure $\rho_n$ with countable support, whose entropy satisfies \begin{equation}\label{eq:expansion_law} H(\rho_n) = D\ln n + h + o(1), \end{equation} where $D=d$ and $h=h(\rho)$ is the differential entropy of $\rho$ (Kolmogorov & Shiryayev, 1993). Rényi (Rényi, 1959) turned this into a definition: if $\rho$ is now a general law and the expansion \eqref{eq:expansion_law} holds for some constants $D,h\in \mathbb{R}$, one calls $D$ the information dimension of $\rho$ and $h$ its $d$-dimensional entropy. He wondered about the “topological meaning” of the entropic dimension, which might be noninteger.
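As a quick numerical illustration of Rényi’s definition (my own toy sketch, not code from the cited works): for a mixture of an atom and a uniform density on $[0,1]$, the slope of $H(\rho_n)$ against $\ln n$ recovers the information dimension $D = 1/2$.

```python
import numpy as np

rng = np.random.default_rng(1)

# Mixture on [0, 1]: an atom at 0 with probability 1/2, otherwise a
# uniform draw. Its information dimension is D = 1/2.
def sample(n):
    mask = rng.random(n) < 0.5
    return np.where(mask, 0.0, rng.random(n))

def quantized_entropy(x, n):
    # Entropy of the quantization of x to the grid (1/n) * Z.
    _, counts = np.unique(np.floor(x * n), return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log(p)).sum()

x = sample(200_000)
ns = [2 ** k for k in range(4, 12)]
H = [quantized_entropy(x, n) for n in ns]
# By the expansion law, the slope of H(rho_n) versus ln(n) estimates D.
slope = np.polyfit(np.log(ns), H, 1)[0]
print(f"estimated D = {slope:.2f} (theory: 0.50)")
```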
In (Vigneaux, 2023) I introduced an asymptotic equipartition property for discrete-continuous mixtures or, more generally, for convex combinations of rectifiable measures on $\mathbb{R}^d$. In particular, it gives an interpretation of the information dimension $D$ of such a measure $\rho$: the product $(\mathbb{R}^d)^n$ naturally splits into strata of different dimensions, and the typical realizations of $\rho^{\otimes n}$ concentrate on strata of a few dimensions close to $nD$. I also obtained volume estimates (in terms of Hausdorff measures) for the typical realizations in each typical stratum. (A measure $\rho$ is $m$-rectifiable if there exists a set $E$, equal to a countable union of $C^1$ manifolds, such that $\rho$ has a density with respect to the restricted Hausdorff measure $\mathcal H^m|_E$, which is the natural notion of $m$-dimensional volume on $E$.)
Presentations:
- “On the entropy of rectifiable and stratified measures” [slides], GSI 2023, Saint-Malo, France.
- “Typicality for stratified measures” [slides], ETH Zurich, 2023.
Magnitude and diversity
Magnitude (Leinster, 2008) is a common categorical generalization of cardinality and of the Euler characteristic of a simplicial complex. It applies to enriched categories, of which metric spaces are a notable example, and in that case it gives a new isometric invariant of metric spaces (Leinster, 2013). Applied to infinite metric spaces, this invariant, somewhat surprisingly, encodes a lot of nontrivial geometric information, such as Minkowski dimension, volume, and surface area (Meckes, 2015; Barceló & Carbery, 2018; Gimperlein & Goffeng, 2021). Partial differential equations, pseudodifferential operators, and potential theory have played an important role in establishing these results.
In joint work with Stephanie Chen (Chen & Vigneaux, 2023) (SURF program 2022), we gave a new formula for the magnitude of a finite category $\mathcal{A}$ in terms of the pseudoinverse of the matrix \begin{equation} \zeta:\operatorname{Ob}\mathcal{A}\times \operatorname{Ob} \mathcal{A}\to \mathbb{Z}, \quad (a,b)\mapsto |\operatorname{Hom}(a,b)|. \end{equation} This brought the definition closer to the one for posets (Rota, 1964) that had inspired Leinster. Our work also rederived algebraic properties of the magnitude from properties of the pseudoinverse.
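In code, the formula is essentially a one-liner; the following sketch (my own illustration, with toy examples chosen here) computes the magnitude from the Moore-Penrose pseudoinverse of $\zeta$ using numpy.

```python
import numpy as np

def magnitude(zeta):
    """Magnitude of a finite category: the sum of the entries of the
    Moore-Penrose pseudoinverse of zeta(a, b) = |Hom(a, b)|."""
    return np.linalg.pinv(zeta).sum()

# Discrete category on two objects (only identity morphisms):
# magnitude recovers the cardinality of the set of objects.
print(magnitude(np.eye(2)))                # 2.0
# The poset 0 <= 1, viewed as a category: it has a terminal object,
# so its magnitude is 1, like the Euler characteristic of a point.
print(magnitude(np.array([[1., 1.],
                          [0., 1.]])))     # 1.0
```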
In (Vigneaux, 2024) I propose a novel combinatorial interpretation of the inverse or pseudoinverse of $\zeta$, along the lines of (Brualdi & Cvetkovic, 2008). The interpretation generalizes a celebrated theorem by Philip Hall (Rota, 1964): \begin{equation} \zeta^{-1}(a,b)=\sum_{k\geq 0} (-1)^k \, \# \{ \text{nondegenerate paths of length } k \text{ from } a \text{ to } b \} \end{equation} when $a$ and $b$ are elements of a finite poset (in this case $\zeta$ is invertible and its inverse is known as the Möbius function).
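For a finite poset one can check the path-counting formula directly: $N = \zeta - I$ is nilpotent, its powers count nondegenerate paths by length, and the alternating sum reproduces the matrix inverse. A small self-contained check (my own, on the Boolean lattice of a two-element set):

```python
import numpy as np

def mobius_by_paths(zeta):
    """Alternating sum over nondegenerate paths: N = zeta - I is nilpotent
    for a finite poset, and N^k counts the nondegenerate k-step paths."""
    n = len(zeta)
    N = zeta - np.eye(n, dtype=int)
    total = np.zeros((n, n), dtype=int)
    power = np.eye(n, dtype=int)
    for k in range(n):              # N^n = 0, so the sum is finite
        total += (-1) ** k * power
        power = power @ N
    return total

# zeta of the Boolean lattice {} < {1}, {2} < {1,2}, in that order.
zeta = np.array([[1, 1, 1, 1],
                 [0, 1, 0, 1],
                 [0, 0, 1, 1],
                 [0, 0, 0, 1]])
mu = mobius_by_paths(zeta)
assert (mu @ zeta == np.eye(4, dtype=int)).all()   # mu really is zeta^{-1}
print(mu[0, 3])   # 1 = (-1)^2, as inclusion-exclusion predicts
```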
What does this have to do with information? Following Boltzmann’s ideas, entropy can be seen as an extension of cardinality: when all elements of a finite set $X$ are equiprobable, the entropy is $\ln |X|$. In turn, magnitude is a generalization of cardinality, and it is natural to introduce a probabilistic extension of it: a “categorical entropy”. Stephanie and I (Chen & Vigneaux, 2023) defined it on finite categories $\mathcal{A}$ equipped with a probability $p$ on objects and a “kernel” $\theta:\operatorname{Ob}\mathcal{A} \times \operatorname{Ob} \mathcal{A} \to [0,\infty)$ such that $\theta(a,a')=0$ whenever $a\not\to a'$, via the formula \begin{equation}\label{eq:cat_entropy} \mathcal H(A,p,\theta) = - \sum_{a\in \operatorname{Ob} \mathcal A} p(a) \ln \left(\sum_{b\in \operatorname{Ob} \mathcal A} \theta(a,b)p(b) \right). \end{equation} This function shares many “nice” properties with Shannon entropy. In the context of metric spaces equipped with a probability measure, \eqref{eq:cat_entropy} appears as a measure of the diversity of species when $p$ records their relative abundances and $\theta$ their similarity (Leinster & Cobbold, 2012).
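Transcribing \eqref{eq:cat_entropy} into code makes the two extreme cases transparent (the toy numbers below are my own):

```python
import numpy as np

def categorical_entropy(p, theta):
    """H(A, p, theta) = -sum_a p(a) ln( sum_b theta(a,b) p(b) )."""
    return -(p * np.log(theta @ p)).sum()

p = np.array([0.5, 0.25, 0.25])
# theta = identity: objects are totally dissimilar, and H reduces to
# the Shannon entropy of p.
print(categorical_entropy(p, np.eye(3)))        # 1.0397... = H(p)
# theta = all ones: every object resembles every other perfectly,
# and the perceived diversity collapses.
print(categorical_entropy(p, np.ones((3, 3))))  # 0 (numerically -0.0)
```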
Presentations:
- “A Combinatorial Approach to Categorical Möbius Inversion and Magnitude” [video], Applied Algebraic Topology Network, 2024.
- “Categorical Magnitude and Entropy” [slides], GSI 2023, Saint-Malo, France.
Bibliography
- Baudot, P., & Bennequin, D. (2015). The Homological Nature of Entropy. Entropy, 17(5), 3253–3318.
- Vigneaux, J. P. (2019). Topology of Statistical Systems: A Cohomological Approach to Information Theory [PhD thesis]. Université de Paris.
- Vigneaux, J. P. (2020). Information structures and their cohomology. Theory and Applications of Categories, 35(38), 1476–1529.
- Vigneaux, J. P. (2023). A characterization of generalized multinomial coefficients related to the entropic chain rule. Aequationes Mathematicae, 97(2), 231–255.
- Vigneaux, J. P. (2021). Information cohomology of classical vector-valued observables. In F. Nielsen & F. Barbaresco (Eds.), GSI 2021: Geometric Science of Information (Vol. 12829, pp. 537–546). Springer.
- Manin, Y., & Marcolli, M. (2020). Homotopy Theoretic and Categorical Models of Neural Information Networks. ArXiv Preprint ArXiv:2006.15136.
- Belfiore, J.-C., & Bennequin, D. (2021). Topos and stacks of deep neural networks. ArXiv Preprint ArXiv:2106.14587.
- Kolmogorov, A. N., & Shiryayev, A. N. (1993). Selected Works of A. N. Kolmogorov. Volume III: Information Theory and the Theory of Algorithms. Kluwer Academic Publishers.
- Rényi, A. (1959). On the dimension and entropy of probability distributions. Acta Mathematica Academiae Scientiarum Hungarica, 10(1), 193–215.
- Vigneaux, J. P. (2023). Typicality for stratified measures. IEEE Transactions on Information Theory, 69(11), 6922–6940.
- Leinster, T. (2008). The Euler Characteristic of a Category. Documenta Mathematica, 13, 21–49.
- Leinster, T. (2013). The magnitude of metric spaces. Documenta Mathematica, 18, 857–905.
- Meckes, M. W. (2015). Magnitude, diversity, capacities, and dimensions of metric spaces. Potential Analysis, 42, 549–572.
- Barceló, J. A., & Carbery, A. (2018). On the magnitudes of compact sets in Euclidean spaces. American Journal of Mathematics, 140(2), 449–494.
- Gimperlein, H., & Goffeng, M. (2021). On the magnitude function of domains in Euclidean space. American Journal of Mathematics, 143(3), 939–967.
- Chen, S., & Vigneaux, J. P. (2023). A formula for the categorical magnitude in terms of the Moore-Penrose pseudoinverse. Bulletin of the Belgian Mathematical Society - Simon Stevin, 30(3), 341–353.
- Rota, G.-C. (1964). On the foundations of combinatorial theory I. Theory of Möbius functions. Probability Theory and Related Fields, 2(4), 340–368.
- Vigneaux, J. P. (2024). A combinatorial approach to categorical Möbius inversion and pseudoinversion. ArXiv Preprint ArXiv:2407.14647.
- Brualdi, R. A., & Cvetkovic, D. (2008). A Combinatorial Approach to Matrix Theory and Its Applications. CRC Press. https://books.google.com/books?id=pwx6t8QfZU8C
- Leinster, T., & Cobbold, C. A. (2012). Measuring diversity: the importance of species similarity. Ecology, 93(3), 477–489.