DeePhy: A Deep Learning Model to Reconstruct Phylogenetic Tree from Unaligned Nucleotide Sequences

Abstract

  • Inferring phylogenetic trees for a set of taxa is one of the primary objectives of evolutionary biology. Numerous approaches reconstruct phylogenetic trees from different kinds of biological data, such as DNA sequences, protein sequences, and protein-protein interaction graphs. However, each method has its own strengths and weaknesses. To date, no existing method is guaranteed to recover the true phylogenetic tree in every case. Various studies have identified distinct branch length configurations under which existing methods are ineffective at inferring the correct tree topology.

  • Here, we propose a novel deep convolutional neural network (CNN)-based model, DeePhy, to reconstruct phylogenetic trees from unaligned sequences. The sequences are represented on a two-dimensional coordinate plane using a biological semantics-based mapping. Additionally, to assess the robustness of a method, we propose a novel bootstrapping technique that generates replicas from the unaligned sequences. We train the model on sequence triplets, where the output is a triplet tree topology.

  • We show that the well-trained DeePhy outperforms state-of-the-art methods in inferring triplet tree topologies. We evaluate DeePhy on data simulated under numerous critical conditions and various branch length configurations, and we use the McNemar test to compare the performance of DeePhy with that of the state-of-the-art methods. The results show that, in most cases, DeePhy is significantly more accurate than the conventional methods in determining triplet tree topologies, and remarkably robust. Various comparison metrics also show that DeePhy outperforms the conventional methods in inferring trees. Finally, to analyze the performance of DeePhy on real biological data, we apply it to a Gadiformes dataset. Reassuringly, DeePhy reconstructs phylogenetic trees from real biological data with known or widely accepted topologies.

  • Although various practical challenges remain to be addressed, the outcomes of our study suggest that deep learning approaches can succeed in inferring accurate phylogenetic trees.
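
The abstract compares DeePhy with other methods using the McNemar test, which looks only at the test cases where two methods disagree. The sketch below shows one common form of that test, the exact two-sided binomial version, applied to paired per-triplet correctness flags. The abstract does not say which variant the authors used, so the choice of the exact version is an assumption. The function name `mcnemar_exact` and the toy data are hypothetical and are not taken from the study.

```python
from math import comb


def mcnemar_exact(correct_a, correct_b):
    """Exact two-sided McNemar test on paired correctness flags.

    correct_a[i] and correct_b[i] say whether method A and method B
    inferred the right topology for the same test triplet i.
    Returns (b, c, p_value), where b counts triplets only A got right
    and c counts triplets only B got right.
    """
    if len(correct_a) != len(correct_b):
        raise ValueError("paired inputs must have equal length")

    # Only discordant pairs (exactly one method correct) carry information.
    b = sum(1 for x, y in zip(correct_a, correct_b) if x and not y)
    c = sum(1 for x, y in zip(correct_a, correct_b) if y and not x)
    n = b + c
    if n == 0:
        return b, c, 1.0  # the methods never disagree

    # Under the null hypothesis, each discordant pair favours either
    # method with probability 1/2, so min(b, c) ~ Binomial(n, 1/2).
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)


# Made-up flags for ten test triplets (illustration only, not real results).
method_a = [True] * 8 + [False] * 2
method_b = [False] * 6 + [True] * 4
b, c, p_value = mcnemar_exact(method_a, method_b)
print(b, c, p_value)
```

A small p-value would indicate that one method is correct on the disagreeing triplets more often than chance alone would explain. With only a handful of discordant pairs, as in this toy input, the test has little power, which is why such comparisons are usually run over large simulated test sets.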
