Accurate and efficient phylogenetic inference through end-to-end deep learning
Abstract
Accurate phylogenetic inference is crucial for understanding evolutionary relationships among species. Deep learning techniques have been introduced for phylogenetic inference; however, existing deep learning-based approaches either suffer from limited accuracy because they split inference into several disjoint stages, or exhibit low efficiency and can hardly be applied to cases with more than 20 species. Here we present an accurate and efficient approach to phylogenetic inference. Our approach, called NeuralNJ, employs an end-to-end framework that directly constructs phylogenetic trees from the input taxa, thus avoiding the inaccuracy incurred by splitting inference into separate stages. The key innovation of NeuralNJ lies in its learnable neighbor joining mechanism, which iteratively joins neighbors guided by learned priority scores and thereby achieves accurate tree reconstruction. The inference accuracy is further enhanced by incorporating reinforcement learning-based tree search. Using both simulated and empirical data, we demonstrate that NeuralNJ can effectively infer phylogenetic trees with improved computational efficiency and reconstruction accuracy. The study paves the way for accurate and efficient phylogenetic inference for hundreds of taxa in complex evolutionary scenarios.
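
The iterative joining described in the abstract can be illustrated with a minimal sketch. This is not NeuralNJ's implementation: the abstract does not specify the network, the scoring function, or the tree representation. Here the priority score is a hypothetical stand-in (negative mean Hamming distance between toy sequences) where NeuralNJ would use a learned score, the names `greedy_join`, `toy_score`, and `SEQS` are invented for illustration, and the loop is assumed to run until a single rooted binary tree remains.

```python
from itertools import combinations, product
from typing import Callable, Sequence, Union

Tree = Union[str, tuple]  # a leaf name, or a pair (left, right) of subtrees

# Toy aligned sequences, invented for this example only.
SEQS = {"A": "ACGT", "B": "ACGA", "C": "TTGT", "D": "TTGA"}


def leaves(tree: Tree) -> frozenset:
    """Return the set of leaf names under a subtree."""
    if isinstance(tree, str):
        return frozenset([tree])
    left, right = tree
    return leaves(left) | leaves(right)


def toy_score(a: Tree, b: Tree) -> float:
    """Hypothetical stand-in for a learned priority score.

    Higher is better: the negative mean Hamming distance between the
    leaves of the two subtrees.
    """
    pairs = list(product(leaves(a), leaves(b)))
    total = sum(sum(x != y for x, y in zip(SEQS[p], SEQS[q])) for p, q in pairs)
    return -total / len(pairs)


def greedy_join(taxa: Sequence[str], score: Callable[[Tree, Tree], float]) -> Tree:
    """Repeatedly join the pair of subtrees with the highest priority score."""
    nodes = list(taxa)
    while len(nodes) > 1:
        # Score every candidate pair of current subtrees and pick the best one.
        i, j = max(
            combinations(range(len(nodes)), 2),
            key=lambda p: score(nodes[p[0]], nodes[p[1]]),
        )
        joined = (nodes[i], nodes[j])
        # j > i, so remove index j first to keep index i valid.
        nodes.pop(j)
        nodes.pop(i)
        nodes.append(joined)
    return nodes[0]


def to_newick(tree: Tree) -> str:
    """Serialize a nested-tuple tree as a Newick string (topology only)."""
    if isinstance(tree, str):
        return tree
    return "(" + ",".join(to_newick(child) for child in tree) + ")"


tree = greedy_join(["A", "B", "C", "D"], toy_score)
print(to_newick(tree) + ";")  # prints "((A,B),(C,D));"
```

Passing the scorer as a parameter keeps the joining loop independent of how scores are produced, so a trained model could in principle replace `toy_score` without changing the loop. The reinforcement learning-based tree search mentioned in the abstract is not covered by this sketch.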