Integrative taxonomy using traits and genomic data for species delimitation with deep learning

Abstract

Recognizing species boundaries in complex speciation scenarios, including those that involve gene flow and demographic fluctuations, is a challenge that has recently received attention, and the plethora of existing species concepts makes it harder still. Promising recent approaches adopt an integrative taxonomy with multiple sources of evidence (e.g., genetic data, morphology, geographic distributions), each of which can reflect different properties of the dynamics of the speciation continuum. In addition, statistical inference methods for model comparison, such as approximate Bayesian computation, approximate likelihoods, and machine learning, have improved the assessment of species boundaries in such cases. However, most approaches analyze genetic and phenotypic/geographic information separately and then compare the results visually or qualitatively. Methods that integrate genetic information with other sources of evidence have been limited to simple evolutionary models and cannot analyze more than a few hundred loci across at most a few tens of samples. Here, we present a deep learning approach (DeepID) that combines convolutional neural networks and multilayer perceptrons to integrate genomic data (thousands of loci or single nucleotide polymorphisms, SNPs) and trait information into a unified framework. Using simulated and empirical datasets, we evaluate the power and accuracy of our approach for discriminating among competing allopatric speciation scenarios as the number of SNPs and traits varies, and we assess the impact of missing data. We found that the accuracy of our method was lower for datasets that included only trait data, but when we combined genomic and trait data, accuracy was similar to that obtained with genomic data alone. However, when we violated the simple allopatric speciation model by including migration, the trait-based analyses were less affected than analyses that included genomic information. Moreover, combining both sources of data can capture complementary information associated with different stages of the speciation process. Our approach recovered the expected delimitation scenarios in empirical datasets from one plant (Euphorbia balsamifera) and one fish (Lepomis megalotis) species complex. We argue that our method is a flexible and promising approach that allows complex scenarios to be compared using multiple types of data.
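
The abstract describes DeepID's architecture only at a high level: a convolutional branch for genomic data, a multilayer perceptron for traits, and a joint prediction over competing speciation scenarios. The following is a minimal NumPy sketch of that general two-branch idea, not the authors' implementation. It assumes a genotype matrix coded 0/1/2 with individuals as rows and SNPs as columns, a per-individual trait matrix, a single convolution along the SNP axis followed by global average pooling, one dense trait layer, and a softmax over scenarios. All function names (`conv_branch`, `init_params`, `predict`), layer sizes, and the kernel width are hypothetical. The weights are random rather than trained on simulated datasets, so the printed probabilities only illustrate the data flow.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def softmax(z):
    z = z - z.max()                 # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def dense(x, w, b, activation=relu):
    """Fully connected layer: activation(W x + b)."""
    return activation(w @ x + b)

def conv_branch(snps, kernels, bias):
    """Hypothetical genomic branch: 1D convolution along the SNP axis,
    then ReLU and global average pooling over positions.

    snps:    (n_individuals, n_snps) genotypes coded 0/1/2
    kernels: (n_filters, n_individuals, width)
    """
    n_filters, _, width = kernels.shape
    n_pos = snps.shape[1] - width + 1
    fmap = np.empty((n_filters, n_pos))
    for p in range(n_pos):
        window = snps[:, p:p + width]                  # (n_individuals, width)
        fmap[:, p] = np.tensordot(kernels, window, axes=([1, 2], [0, 1])) + bias
    return relu(fmap).mean(axis=1)                      # (n_filters,)

def init_params(n_ind, n_traits, n_scenarios, n_filters=8, width=3, hidden=16):
    """Random, untrained weights; a real model would be fitted on simulations."""
    s = 0.1
    return {
        "kernels": rng.normal(0, s, (n_filters, n_ind, width)),
        "kbias": np.zeros(n_filters),
        "w_trait": rng.normal(0, s, (hidden, n_ind * n_traits)),
        "b_trait": np.zeros(hidden),
        "w_out": rng.normal(0, s, (n_scenarios, n_filters + hidden)),
        "b_out": np.zeros(n_scenarios),
    }

def predict(snps, traits, params):
    """Concatenate genomic and trait features, return scenario probabilities."""
    genomic = conv_branch(snps, params["kernels"], params["kbias"])
    trait = dense(traits.ravel(), params["w_trait"], params["b_trait"])
    joint = np.concatenate([genomic, trait])
    return dense(joint, params["w_out"], params["b_out"], activation=softmax)

# Toy input: 20 individuals, 500 SNPs, 4 traits, 3 candidate scenarios.
snps = rng.integers(0, 3, size=(20, 500)).astype(float)
traits = rng.normal(size=(20, 4))
params = init_params(n_ind=20, n_traits=4, n_scenarios=3)
probs = predict(snps, traits, params)
print(probs.round(3))
```

Dropping either branch in this sketch gives a genomic-only or trait-only classifier, which mirrors the kind of comparison among data types that the abstract reports; the choice of pooling and layer sizes here is arbitrary and only meant to keep the example small.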