Learning emergence from statistics

Published in eLife in 2022
What if we could uncover organizing principles of biology not by cataloguing every gene or molecule, but by learning from the statistical structure of the system itself?
In the first paper from our lab, we showed that the hierarchy of genomic organization, often assumed to be encoded in molecular mechanisms or evolved architectures, can instead be inferred purely from data. Without prior knowledge of biological function, we applied dimensionality reduction and spectral decomposition to all bacterial proteomes in the UniProt database (>7,000 bacteria) to reveal a deep, hierarchical structure of protein interactions. This structure builds up genomic complexity in layers: the bottom layer corresponds to the individual proteins in a genome, and each successively higher layer groups proteins into protein complexes, pathways, and meta-pathways.
This result carried several profound implications. First, it showed that emergent structure leaves a statistical trace, and that this trace can be extracted without knowing anything about the underlying biology. Second, it overturned a decades-old dogma in the analysis of complex data: that only the top principal components are meaningful. Instead, our work showed that meaning often resides in components that were previously dismissed as noise.
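To make the point about lower components concrete, here is a minimal toy sketch. It is not the pipeline used in the paper: the 8-protein correlation matrix, the nesting (pairs inside groups of four), and the values a, b, and c are all invented for illustration. It only shows that when correlations are nested, each band of the spectrum separates proteins at a different scale.

```python
import numpy as np

def hierarchical_correlation(a=0.8, b=0.5, c=0.2):
    """Toy 8x8 correlation matrix with nested structure.

    Hypothetical values: a = correlation within a pair ("complex"),
    b = within a group of four ("pathway"), c = across groups.
    """
    pair = np.array([0, 0, 1, 1, 2, 2, 3, 3])    # pair label per protein
    group = np.array([0, 0, 0, 0, 1, 1, 1, 1])   # group label per protein
    same_pair = pair[:, None] == pair[None, :]
    same_group = group[:, None] == group[None, :]
    corr = np.where(same_pair, a, np.where(same_group, b, c))
    np.fill_diagonal(corr, 1.0)
    return corr

corr = hierarchical_correlation()

# Spectral decomposition; eigh returns ascending eigenvalues, so reverse them.
eigvals, eigvecs = np.linalg.eigh(corr)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Fraction of total variance carried by each component, largest first.
explained = eigvals / eigvals.sum()

for rank, (lam, frac) in enumerate(zip(eigvals, explained), start=1):
    print(f"component {rank}: eigenvalue {lam:.2f}, variance fraction {frac:.3f}")
```

In this toy case the spectrum splits into four bands: one global mode (eigenvalue about 3.6), one mode contrasting the two groups (about 2.0), two modes contrasting the pairs within each group (about 0.8 each), and four modes separating the two members of each pair (about 0.2 each). The smallest eigenvalues are therefore not noise here; they carry the finest layer of the hierarchy.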
More broadly, this work presented one of the first statistical descriptions of emergence in a real biological context. It offered a radically different view of complexity: not as randomness or disorder to be averaged over, but as a consequence of deeply embedded, hierarchical constraints. In doing so, it laid the conceptual foundation for an alternative to the dominant statistical mechanical formalisms of complexity and emergence, such as those proposed by Giorgio Parisi and others, which focus on disorder-generated complexity and ruggedness.
This work has formed the basis for many subsequent research efforts in our laboratory. It reframed how we think about biology, complexity, and the nature of inference itself. If emergence has a structure, this paper describes what that structure could look like statistically.
Relevant links
1. Defining hierarchical protein interaction networks from spectral analysis of bacterial proteomes
2. Systems biologist Arjun Raman looks at the big picture to build microbiomes that adapt
