The reasonable effectiveness of domain adaptation for inference of introgression
Abstract
Supervised machine learning approaches have proven powerful in population genetics. Such approaches require training data with known inputs and outputs. Because such data are generally unavailable in population genetics, researchers typically train machine learning algorithms on data simulated under the models of interest. Although powerful, this approach depends heavily on the models used to generate the training data. Given the variety and complexity of the processes shaping genetic variation, it is inevitable that some processes important in an empirical system will be omitted when generating training data. This produces a mismatch between the data used to train a machine learning algorithm and the data to which the trained model is ultimately applied (i.e., a domain shift), which can negatively impact inference. Here, we train a convolutional neural network (CNN) to detect introgression between sister populations and demonstrate that it has near-perfect accuracy when applied to data generated under the models used for training. To evaluate the impact of domain shifts on inference, we generated new data with introgression from a third, unsampled population into one of the two focal populations (i.e., ghost introgression), and found that accuracy was substantially reduced on these data. Finally, we used domain adaptation, which aims to train a network that performs well in the presence of a domain shift. Notably, this requires no knowledge of the target or empirical domain. Our domain adaptation network was able to accurately detect introgression, even in the presence of unmodelled ghost introgression. We also applied this approach to empirical data to detect introgression between ABC Island brown bears and other populations of brown bears. Previous work has suggested that introgression between ABC Island bears and polar bears can mislead tests of introgression between populations of brown bears. We found that using domain adaptation reduced support for introgression between geographically isolated populations of brown bears, suggesting that our approach reduces false inferences of introgression due to ghost introgression.
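The abstract does not say which domain adaptation technique was used. One widely used option is domain-adversarial training with a gradient reversal layer (Ganin & Lempitsky), and the minimal sketch below assumes that approach purely for illustration. A shared feature is passed both to a label head (introgression vs. none, trained only on labeled simulated data) and to a domain head (simulated = 0 vs. empirical = 1). The reversal layer is the identity in the forward pass but multiplies the domain-loss gradient by `-lam` on the way back to the feature extractor. This pushes the features toward being uninformative about which domain a sample came from. All function names, the scalar feature, and the logistic heads are hypothetical simplifications, not the authors' CNN.

```python
import math
from typing import Optional, Tuple

# Hypothetical logistic head parameters: (weight, bias)
Head = Tuple[float, float]


def sigmoid(t: float) -> float:
    return 1.0 / (1.0 + math.exp(-t))


def grad_reverse_forward(z: float) -> float:
    # Gradient reversal layer, forward pass: identity.
    return z


def grad_reverse_backward(grad: float, lam: float) -> float:
    # Gradient reversal layer, backward pass: flip the sign and scale by lam.
    return -lam * grad


def bce_grad_wrt_feature(f: float, head: Head, target: float) -> float:
    """d/df of binary cross-entropy for p = sigmoid(weight * f + bias)."""
    weight, bias = head
    return (sigmoid(weight * f + bias) - target) * weight


def feature_gradient(f: float, label_head: Head, domain_head: Head,
                     y: Optional[float], domain: float, lam: float) -> float:
    """Gradient reaching the (hypothetical) feature extractor for one sample.

    y is None for unlabeled empirical (target-domain) samples, which
    contribute only through the domain loss.
    """
    g_label = 0.0 if y is None else bce_grad_wrt_feature(f, label_head, y)
    g_domain = bce_grad_wrt_feature(grad_reverse_forward(f), domain_head, domain)
    # The domain head itself would be updated with +g_domain (not shown);
    # only the feature extractor sees the reversed gradient.
    return g_label + grad_reverse_backward(g_domain, lam)


if __name__ == "__main__":
    # Labeled simulated sample (domain 0) vs. unlabeled empirical sample (domain 1).
    print(feature_gradient(0.0, (1.0, 0.0), (2.0, 0.0), 1.0, 0.0, 0.5))
    print(feature_gradient(0.0, (1.0, 0.0), (2.0, 0.0), None, 1.0, 0.5))
```

Setting `lam = 0` recovers ordinary supervised training on simulations. Larger values trade some source-domain fit for features that transfer better across the simulated/empirical shift.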