Strainify: Strain-Level Microbiome Profiling for Low-Coverage Short-Read Metagenomic Datasets
Abstract
Motivation
Strain-level microbiome profiling has revealed key insights into microbial community composition and strain dynamics. However, accurate strain-level analysis remains challenging due to limited linkage information, ambiguous read mapping, and complicating factors such as genome similarity, sequencing depth, and community complexity. These challenges are especially pronounced for short-read metagenomic data when estimating the relative abundances of multiple strains, a task critical for genotype-phenotype association studies.
Results
To address these challenges, we present Strainify, which enables accurate strain-level abundance estimation from short-read metagenomes with as little as 1% genome coverage. Specifically, Strainify combines (1) identification of informative variants via core genome alignment, (2) filtering of confounding variants via a window-based test, and (3) maximum likelihood estimation of strain abundances. A Shannon entropy-weighted version of the model further improves robustness in noisy, low-coverage settings by downweighting sites with low information content. Across simulated communities of varying complexity, Strainify consistently outperformed existing approaches. On mock community sequencing data, Strainify’s estimates aligned more closely with reference abundances than those of existing approaches. When applied to a longitudinal gut microbiome dataset, Strainify recapitulated the reported temporal dynamics of Bacteroides ovatus strain groups, demonstrating its ability to recover biologically meaningful patterns from real-world metagenomes. Together, these results establish Strainify as a robust and versatile solution for accurate strain-level abundance estimation in short-read, low-coverage microbiome studies.
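To make step (3) and the entropy weighting concrete, the Python sketch below estimates strain proportions from per-site allele counts by expectation-maximization on a simple mixture model. It is not Strainify's implementation. The function names, the binary genotype matrix, the fixed sequencing error rate `err`, and the choice to weight each site by the binary Shannon entropy of its alternate-allele fraction across the reference strains are all assumptions made for illustration.

```python
import math


def binary_entropy(q):
    """Shannon entropy (in bits) of a Bernoulli(q) variable."""
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -(q * math.log2(q) + (1.0 - q) * math.log2(1.0 - q))


def estimate_abundances(genotypes, alt_counts, ref_counts, err=0.01,
                        entropy_weighted=True, n_iter=500, tol=1e-10):
    """Maximum likelihood strain proportions via EM (illustrative only).

    genotypes[m][k] is 1 if strain k carries the alternate allele at
    variant site m, else 0. alt_counts[m] and ref_counts[m] are the
    numbers of reads supporting each allele at site m. Sites with no
    reads (common at low coverage) simply contribute nothing.
    """
    if not 0.0 < err < 0.5:
        raise ValueError("err must lie strictly between 0 and 0.5")
    n_strains = len(genotypes[0])

    # Assumed weighting: a site where (almost) all strains share the
    # same allele has low entropy and is downweighted.
    weights = [
        binary_entropy(sum(row) / n_strains) if entropy_weighted else 1.0
        for row in genotypes
    ]
    # Probability that a read from strain k shows the alternate allele.
    p_alt = [[1.0 - err if g else err for g in row] for row in genotypes]

    props = [1.0 / n_strains] * n_strains
    for _ in range(n_iter):
        acc = [0.0] * n_strains
        for m, row in enumerate(p_alt):
            observations = (
                (alt_counts[m], row),
                (ref_counts[m], [1.0 - x for x in row]),
            )
            for count, emit in observations:
                if count == 0 or weights[m] == 0.0:
                    continue
                # E-step: responsibility of each strain for these reads.
                joint = [props[k] * emit[k] for k in range(n_strains)]
                total = sum(joint)
                for k in range(n_strains):
                    acc[k] += weights[m] * count * joint[k] / total
        norm = sum(acc)
        if norm == 0.0:  # no informative reads at all
            break
        # M-step: renormalize the weighted responsibilities.
        updated = [a / norm for a in acc]
        shift = max(abs(u - p) for u, p in zip(updated, props))
        props = updated
        if shift < tol:
            break
    return props


if __name__ == "__main__":
    # Toy data for three strains mixed at roughly 60/30/10 percent.
    G = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0], [1, 1, 1]]
    alt = [60, 30, 10, 90, 100]
    ref = [40, 70, 90, 10, 0]
    print(estimate_abundances(G, alt, ref))
```

In this toy input the last site carries the alternate allele in every strain, so its entropy weight is zero and it is ignored. A real pipeline would first need the variant identification and filtering of steps (1) and (2), which this sketch takes as given.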
Availability
The Strainify code and results are available at: https://github.com/treangenlab/Strainify