CMAPLE 2: Fast and Accurate Phylogenetic Inference for Millions of Pathogen Genomes

Nhan Ly-Trong
Samuel Martin
Nick Goldman
Nicola De Maio
Bui Quang Minh

Abstract

Phylogenetic analysis is essential to genomic epidemiology, for example in tracing the origin and evolution of SARS-CoV-2 variants during the COVID-19 pandemic. We previously introduced CMAPLE, a single-threaded implementation of the MAPLE algorithm designed for large-scale epidemiological genomic datasets. CMAPLE can reconstruct phylogenetic trees from up to one million SARS-CoV-2 genomes. Here, we present CMAPLE 2, a multi-threaded version of CMAPLE with parallel sample placement and subtree pruning and regrafting (SPR) search algorithms. CMAPLE 2 also reduces memory consumption by compressing data structures using multiple references along the tree instead of a single reference genome. It further implements two advanced models of highly site- and nucleotide-specific mutation patterns as observed in pandemic-scale genome data. Additionally, CMAPLE 2 parallelizes SPR-based Tree Assessment (SPRTA), an efficient and interpretable approach for assessing phylogenetic tree uncertainty, and supports ancestral state and mutation inference via mutation-annotated tree (MAT) reconstruction. When inferring a phylogeny from 500,000 SARS-CoV-2 genomes using 48 CPU cores, CMAPLE 2 reduces runtime from 5 days (with sequential CMAPLE) to 9 hours (a 13-fold speedup) while decreasing peak RAM usage from 11.1 GB to 7.3 GB. CMAPLE 2 can now reconstruct a tree of nearly four million SARS-CoV-2 genomes from scratch within 12 days using 41 GB of RAM, a task that the sequential CMAPLE and MAPLE cannot realistically complete. CMAPLE 2 is applicable to many pathogen genome datasets and enhances our preparedness for future pandemics.
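
The benchmark figures quoted in the abstract can be checked with a few lines of arithmetic. The sketch below is only an illustration and is not part of CMAPLE 2: the function and constant names are hypothetical, the inputs are the rounded values from the abstract, and the parallel-efficiency figure is a derived quantity that the abstract does not report. It assumes the sequential run used a single core.

```python
# Rounded figures from the abstract (500,000 SARS-CoV-2 genomes benchmark).
SEQUENTIAL_HOURS = 5 * 24   # 5 days with sequential (single-threaded) CMAPLE
PARALLEL_HOURS = 9          # 9 hours with CMAPLE 2
CORES = 48                  # CPU cores used by CMAPLE 2
RAM_BEFORE_GB = 11.1        # peak RAM, sequential CMAPLE
RAM_AFTER_GB = 7.3          # peak RAM, CMAPLE 2


def speedup(t_before: float, t_after: float) -> float:
    """Ratio of the old runtime to the new runtime."""
    return t_before / t_after


def parallel_efficiency(s: float, cores: int) -> float:
    """Speedup per core; 1.0 would be perfect linear scaling."""
    return s / cores


def relative_reduction(before: float, after: float) -> float:
    """Fraction by which a quantity shrank."""
    return (before - after) / before


s = speedup(SEQUENTIAL_HOURS, PARALLEL_HOURS)
eff = parallel_efficiency(s, CORES)      # derived here, not stated in the abstract
ram_cut = relative_reduction(RAM_BEFORE_GB, RAM_AFTER_GB)

print(f"speedup: {s:.1f}x")
print(f"parallel efficiency on {CORES} cores: {eff:.0%}")
print(f"peak RAM reduction: {ram_cut:.0%}")
```

With these rounded inputs the speedup comes out at about 13.3, consistent with the 13-fold figure in the abstract, and peak RAM falls by roughly a third. Because the runtimes are rounded to whole days and hours, the derived values are approximate.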

Version published to 10.64898/2026.06.15.732229 on bioRxiv
Jun 16, 2026

Related articles

Verticall: A fast and robust tool for recombination detection in large-scale bacterial genomic datasets

This article has 3 authors:
1. Erkison Ewomazino Odih
2. Ryan R. Wick
3. Kathryn E. Holt
This article has no evaluations. Latest version: Apr 24, 2026

Modeling Site-Specific Mutation Patterns in Pandemic-Scale Phylogenetics

This article has 5 authors:
1. Samuel Martin
2. Nhan Ly-Trong
3. Bui Quang Minh
4. Nick Goldman
5. Nicola De Maio
This article has no evaluations. Latest version: May 4, 2026

Rapid phylogenomic analysis for viral surveillance and metagenomic profiling with Omni2Tree

This article has 9 authors:
1. Sina Majidian
2. Adrian Chalco
3. Xinchang Zheng
4. Richard J Webby
5. Andrew S Bowman
6. Rebecca L Poulson
7. Nicole M Nemeth
8. Fritz J Sedlazeck
9. Daniel P Agustinho
Reviewed by Rapid Reviews Infectious Diseases

This article has 3 evaluations. Appears in 1 list. Latest version: May 1, 2026. Latest activity: Jun 11, 2026
