Pitfalls of bacterial pan-genome analysis approaches: a case study of Mycobacterium tuberculosis and two less clonal bacterial species

Maximillian G Marin
Natalia Quinones-Olvera
Christoph Wippel
Mahboobeh Behruznia
Brendan M Jeffrey
Michael Harris
Brendon C Mann
Alex Rosenthal
Karen R Jacobson
Robin M Warren
Heng Li
Conor J Meehan
Maha R Farhat


Abstract

Summary

Pan-genome analysis is a fundamental tool for studying bacterial genome evolution; however, the variety of methods used to define and measure the pan-genome makes results harder to interpret and less reliable. Using Mycobacterium tuberculosis, a clonally evolving bacterium with a small accessory genome, as a model system, we systematically evaluated sources of variability in pan-genome estimates. Our analysis revealed that differences in assembly type (short-read versus hybrid), annotation pipeline, and pan-genome software significantly affect predictions of core and accessory genome size. Extending our analysis to two additional bacterial species, Escherichia coli and Staphylococcus aureus, we observed consistent tool-dependent biases but species-specific patterns in pan-genome variability. Our findings highlight the importance of integrating nucleotide- and protein-level analyses to improve the reliability and reproducibility of pan-genome studies across diverse bacterial populations.
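
To make the terms "core" and "accessory" concrete, the following minimal Python sketch splits gene clusters by how many genomes carry them. It is an illustration only and does not reproduce the method of any tool evaluated in the study: the function name `core_and_accessory`, the 99% core threshold, and the toy isolates and gene names are all assumptions made for this example.

```python
from collections import Counter


def core_and_accessory(genomes, core_fraction=0.99):
    """Split gene clusters into core and accessory sets.

    genomes: dict mapping a genome name to the set of gene clusters it carries.
    core_fraction: hypothetical cutoff; a cluster found in at least this
    fraction of genomes is called core (an assumption for this sketch).
    """
    n_genomes = len(genomes)
    # Count, for every gene cluster, how many genomes contain it.
    counts = Counter(gene for genes in genomes.values() for gene in set(genes))
    cutoff = core_fraction * n_genomes
    core = sorted(g for g, c in counts.items() if c >= cutoff)
    accessory = sorted(g for g, c in counts.items() if c < cutoff)
    return core, accessory


# Toy presence/absence data (invented for illustration).
toy = {
    "isolate_A": {"dnaA", "rpoB", "geneX"},
    "isolate_B": {"dnaA", "rpoB"},
    "isolate_C": {"dnaA", "rpoB", "geneY"},
}
core, accessory = core_and_accessory(toy)
print("core:", core)
print("accessory:", accessory)
```

In a sketch like this, a gene missed by one annotation pipeline or split by a fragmented assembly moves from the core list to the accessory list, which is the kind of sensitivity the abstract describes.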

Availability and implementation

Panqc is freely available under an MIT license at https://github.com/maxgmarin/panqc.

Version published to 10.1093/bioinformatics/btaf219: May 1, 2025
Version published to 10.1101/2024.03.21.586149 on bioRxiv: Mar 25, 2024

