Delphy: scalable, near-real-time Bayesian phylogenetics for outbreaks
Abstract
Pathogen genomic analysis is central to tracking, understanding, and containing outbreaks, but complexity and high costs of state-of-the-art (SOTA) phylogenetic tools limit global access and impact. We introduce Delphy, an exact reformulation of Bayesian phylogenetics designed to transform its speed, scalability and accessibility while retaining SOTA accuracy. Delphy's central data structure, an Explicit Mutation Annotated Tree, exploits the high sequence similarity in large-scale epidemic datasets for efficient tree exploration and convergence. By reproducing key analyses from recent major epidemics (Ebola, Zika, SARS-CoV-2, mpox, and H5N1), we demonstrate SOTA accuracy with up to 1,000x speedups. Assessing Delphy's scalability, we show that a simulated dataset of 100,000 sequences can be analyzed in under a day—the largest such computation to date. We distribute Delphy as a client-side web application, enabling users worldwide to turn raw data into interactive results within minutes, without the data ever leaving the user's machine. Delphy automatically identifies key viral lineages and mutations, as well as their emergence and prevalence through time, all with quantified uncertainties derived from a solid theoretical foundation. Delphy shows the power of Bayesian phylogenetics as a fast, accessible frontline tool for tackling future outbreaks.
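To make the idea of a mutation-annotated tree concrete, here is a minimal Python sketch. It is not Delphy's implementation, and the abstract does not specify the internals of the Explicit Mutation Annotated Tree. The names `Mutation`, `Node`, `sequence_at`, and `stored_mutations` are hypothetical, and the sketch assumes only the general idea that the root sequence is stored once while each branch records the substitutions that occur along it. When sequences are highly similar, the number of stored mutations is far smaller than the number of tips times the genome length.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Mutation:
    """A single substitution on a branch (hypothetical representation)."""
    site: int  # 0-based position in the genome
    old: str   # state before the mutation
    new: str   # state after the mutation


@dataclass
class Node:
    """A tree node; `mutations` live on the branch from `parent` to this node."""
    name: str
    parent: Optional["Node"] = None
    mutations: List[Mutation] = field(default_factory=list)


def sequence_at(node: Node, root_sequence: str) -> str:
    """Reconstruct the sequence at `node` by replaying mutations from the root."""
    # Collect the path from the node up to the root.
    path = []
    current = node
    while current is not None:
        path.append(current)
        current = current.parent

    seq = list(root_sequence)
    # Apply mutations in root-to-node order.
    for n in reversed(path):
        for m in n.mutations:
            if seq[m.site] != m.old:
                raise ValueError(f"inconsistent mutation at site {m.site} on {n.name}")
            seq[m.site] = m.new
    return "".join(seq)


def stored_mutations(nodes: List[Node]) -> int:
    """Total number of mutations stored across all branches."""
    return sum(len(n.mutations) for n in nodes)


# Toy example: one root sequence, three tips, three mutations in total.
ROOT_SEQUENCE = "ACGTACGT"
root = Node("root")
inner = Node("inner", parent=root, mutations=[Mutation(2, "G", "T")])
tip_b = Node("B", parent=inner, mutations=[Mutation(5, "C", "A")])
tip_c = Node("C", parent=inner)
tip_d = Node("D", parent=root, mutations=[Mutation(0, "A", "G")])
ALL_NODES = [root, inner, tip_b, tip_c, tip_d]
```

In this toy tree, three tip sequences of eight sites each are represented by one root sequence and three mutations. Tips `B` and `C` share the `G` to `T` change at site 2 because it is stored once on their common ancestral branch. A production sampler would also need branch times and efficient tree-rearrangement moves, which this sketch omits.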