The information content of species: formal definitions of pangenome complexity track with bacterial lifestyle

Apurva Narechania
Dean Bobo
M Thomas P Gilbert
Shyam Gopalakrishnan


Abstract

Genes and other genomic elements show variable presence-absence patterns across most bacterial species. Pangenome fluidity is often invoked to measure this genome flux. Fluid pangenomes contain genes found only in subsets of a species' strains, whereas tighter pangenomes contain more genes that define a shared core. Species definitions are often tied to this pangenome diversity. In any global comparative framework, pangenomes must be calculated across all known species. But defining pangenomes is fraught with computational and biological challenges, requiring assembly, annotation, alignment, and phylogenetic analysis of millions of orthologs. We offer an alternative view that de-centers the gene and emphasizes the raw information content of sequences. Information is data that reduces uncertainty. Tight pangenomes, with elements repeated across every strain in a species ensemble, contain more complete information. In contrast, fluid pangenomes have more uncertainty, higher complexity, and higher information diversity. Bacterial lifestyle has been shown to drive this information diversity. For example, challenging environments often increase information diversity by encouraging the accrual of auxiliary genes. Here, we employ agile complexity metrics to quantify this increase. Ensembles of free-living, motile, and non-pathogenic species have high genomic complexity, whereas ensemble complexity decreases in species bound to specific hosts. Because we eliminate annotation and alignment, our method is fast enough to evaluate species across all known bacterial genomes. The approach democratizes classification, and our results highlight how broad the term “species” has become.
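
The contrast between tight and fluid ensembles can be illustrated with a toy calculation. The sketch below is not the authors' metric, which the abstract does not specify. It assumes a simple stand-in: break each strain into k-mers, estimate for each k-mer the fraction of strains that contain it, and average the binary Shannon entropy of those fractions. The function names, the choice of k, and the example sequences are all hypothetical.

```python
import math


def kmers(seq, k):
    """Return the set of distinct k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def binary_entropy(p):
    """Shannon entropy (bits) of a presence/absence variable with P(present) = p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def ensemble_uncertainty(strains, k=4):
    """Mean presence/absence entropy over all k-mers seen in the ensemble.

    Illustrative assumption only: a k-mer found in every strain contributes
    0 bits, while a k-mer found in half the strains contributes 1 bit.
    """
    sets = [kmers(s, k) for s in strains]
    union = set().union(*sets)
    if not union:
        return 0.0
    n = len(sets)
    total = 0.0
    for km in union:
        p = sum(km in s for s in sets) / n  # fraction of strains carrying km
        total += binary_entropy(p)
    return total / len(union)


# Hypothetical toy ensembles: identical strains versus divergent strains.
tight = ["ACGTACGTAC"] * 4
fluid = ["ACGTACGTAC", "ACGTTTGCAA", "GGCATACGTA", "ACGTACCCGG"]

print(ensemble_uncertainty(tight))
print(ensemble_uncertainty(fluid))
```

Under these assumptions the identical-strain ensemble scores zero because every k-mer is present in every strain, while the divergent ensemble scores higher. Like the approach the abstract describes, the calculation needs no annotation or alignment, only raw sequence.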

Version published to 10.1101/2025.03.28.645969 on bioRxiv
Apr 2, 2025

Related articles

Shotgun metagenomics: a deep insight into the composition and function of the complex microbial world

This article has 7 authors:
1. Grazia Visci
2. Elisabetta Notario
3. Giuseppe Defazio
4. Mariano Francesco Caratozzolo
5. Bruno Fosso
6. Marinella Marzano
7. Graziano Pesole
This article has no evaluations
Latest version Jan 30, 2026

Divergent Bacteriophages from Wastewater Reveal an Open Pan-Genome with No Shared Gene Families

This article has 4 authors:
1. Malihe Hamidzade
2. Kimia Sharifian
3. Seyed Jalal Kiani
4. Alieza Mohebbi
This article has no evaluations
Latest version Dec 19, 2025

The heterogeneous selection landscape of genome evolution in prokaryotes

This article has 5 authors:
1. Eugene Koonin
2. Sofiya Garushyants
3. Svetlana Karamycheva
4. Nash Rochman
5. Yuri Wolf
This article has no evaluations
Latest version Dec 12, 2025
