Ultrafast and Ultralarge Distance-Based Phylogenetics Using DIPPER
Abstract
Motivation
Distance-based methods are commonly used to reconstruct phylogenies for a variety of applications, owing to their excellent speed, scalability, and theoretical guarantees. However, classical de novo algorithms are hindered by cubic time and quadratic memory complexity, which makes them impractical for emerging datasets containing millions of sequences. Recent placement-based alternatives provide better algorithmic scalability, but they face practical scaling challenges of their own, owing to the high cost of computing evolutionary distances and their significant memory usage. Current tools also do not fully utilize the parallel processing capabilities of modern CPU and GPU architectures.
Results
We present DIPPER, a novel distance-based phylogenetic tool for ultrafast and ultralarge phylogenetic reconstruction on GPUs, designed to maintain high accuracy and a small memory footprint. DIPPER introduces several innovations, including a divide-and-conquer strategy, a placement strategy, and an on-the-fly distance calculator, that greatly improve runtime and memory complexity. These allow DIPPER to achieve runtime and space complexity of O(N log N) and O(N), respectively, for N taxa. With divide-and-conquer, DIPPER also maintains a low memory footprint on the GPU, independent of the number of taxa. DIPPER consistently outperforms existing methods in speed, accuracy, and memory efficiency, and scales to tree sizes 1–2 orders of magnitude beyond the limits of existing tools. Using a single NVIDIA RTX A6000 GPU, DIPPER reconstructs a phylogeny from 10 million unaligned sequences in under 7 hours, making it the only distance-based method to operate at this scale and efficiency.
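To illustrate why computing distances on the fly matters at this scale, here is a minimal sketch. It is not DIPPER's implementation: the k-mer Jaccard distance, the class `OnTheFlyDistances`, and the helper `full_matrix_bytes` are all assumptions chosen for illustration. The sketch keeps one summary per taxon and computes a pairwise distance only when it is requested, instead of storing an N x N matrix.

```python
def kmer_set(seq: str, k: int = 3) -> frozenset:
    """Return the set of k-mers in seq (illustrative; no hashing or subsampling)."""
    return frozenset(seq[i:i + k] for i in range(len(seq) - k + 1))


class OnTheFlyDistances:
    """Hypothetical helper: stores one k-mer set per taxon and computes
    pairwise distances on request rather than holding an N x N matrix."""

    def __init__(self, seqs, k=3):
        # One summary per taxon, so the number of stored items grows linearly in N.
        self.sketches = [kmer_set(s, k) for s in seqs]

    def distance(self, i: int, j: int) -> float:
        """Jaccard distance between the k-mer sets of taxa i and j."""
        a, b = self.sketches[i], self.sketches[j]
        union = len(a | b)
        if union == 0:
            return 0.0
        return 1.0 - len(a & b) / union


def full_matrix_bytes(n: int, bytes_per_entry: int = 4) -> int:
    """Bytes needed to store a dense n x n distance matrix (assumed 4-byte entries)."""
    return n * n * bytes_per_entry


if __name__ == "__main__":
    dists = OnTheFlyDistances(["ACGTACGT", "ACGTACGA", "TTTTTTTT"])
    print(dists.distance(0, 1))
    print(dists.distance(0, 2))
    # Dense matrix for 10 million taxa, in terabytes (1 TB = 10**12 bytes).
    print(full_matrix_bytes(10_000_000) / 10**12)
```

Under the assumed 4 bytes per entry, a dense matrix for 10 million taxa would need about 400 TB, which is why a quadratic-memory approach is impractical at the scale described above.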
Availability
DIPPER’s code is freely available under the MIT license at https://github.com/TurakhiaLab/DIPPER, and the documentation for DIPPER is available at https://turakhia.ucsd.edu/DIPPER. The test datasets and experimental results are available at https://zenodo.org/records/16803048.