Domain-combination based pan-genomic Tree of Life
Abstract
Reconstructing the Tree of Life (ToL) remains a central challenge in evolutionary biology. While sequence-based methods have resolved much of life’s diversity, deep evolutionary relationships often remain elusive, particularly across the three superkingdoms of life. Here, we present a complementary approach that uses the genome-wide occurrence of protein domain combinations as phylogenetic signal. We analyzed 5,343 complete proteomes spanning Bacteria, Archaea, and Eukaryota, and inferred a pan-genomic ToL from ∼77,000 domain combinations, including isolated domains and domain pairs. Our reconstruction recovers major clades and captures both vertical inheritance and signals from non-vertical processes such as lateral gene transfer, endosymbiosis, and genome reduction. Ancestral reconstructions reveal functional innovations that mark key evolutionary transitions, such as the emergence of vesicle trafficking in Eukaryota or methanogenesis in Archaea. We found that organisms with reduced genomes, like Chlamydiota, tend to be misplaced in domain combination-based phylogenies due to sparse combinatorial data. Our results demonstrate that domain combinations carry a strong evolutionary signal that is both biologically and functionally informative. While not a substitute for sequence-based phylogenetics, domain combination-based trees offer an alternative perspective, capable of integrating both vertical and lateral processes into a unified evolutionary framework.
Author Summary
Traditional evolutionary trees are typically built from sequences, but this approach struggles to resolve deep branches in the Tree of Life. Here, we use combinations of protein domains, the functional modules that make up proteins, to resolve major relationships in the tree. By analyzing over 5,000 complete proteomes from Bacteria, Archaea, and Eukaryota, we reconstruct a Tree of Life that reflects both vertical inheritance and key evolutionary processes like endosymbiosis and gene transfer. This domain-based approach not only recovers known relationships but also reveals how major biological innovations emerged. Our results show that domain combinations provide a powerful and interpretable signal for studying evolution at a genomic scale.