Accuracy and Scalability of Machine Learning Methods for Genotype-Phenotype Association Data

Abstract

Many machine learning methods can be applied to predicting phenotypes from genetic data. However, which of these methods works best remains an open question. To address it, we compare how well a variety of approaches predict a simulated non-linear complex trait. Specifically, we evaluate the methods' accuracy and scalability with respect to the amount of training data available, the noise present in the data, and the complexity of the simulated trait functions, as well as their ability to provide insight into the simulated trait. We then compare the best-performing approach with state-of-the-art models on real data, predicting gout in the UK Biobank. We find that transformer encoders outperform all other methods in simulations and perform comparably to the state of the art on real data, with the promise of scaling to significantly larger datasets.
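
The abstract does not say how the simulated trait is constructed. The following is only a rough illustration, in Python with NumPy, of one common way to build this kind of benchmark. It exposes three knobs that match the axes listed above: sample count (amount of training data), heritability `h2` (noise), and number of interacting SNP pairs (trait complexity). The function name `simulate_trait`, its parameters, the additive-plus-pairwise-epistasis model, and the use of `h2` as the noise control are all illustrative assumptions. They are not the simulation used in the study.

```python
import numpy as np


def simulate_trait(n_samples, n_snps, n_causal, n_pairs, h2, seed=0):
    """Hypothetical simulator (illustrative assumption, not the study's design):
    additive plus pairwise (epistatic) SNP effects, mixed with Gaussian noise so
    that roughly a fraction h2 of the phenotypic variance is genetic.
    """
    if not 0.0 <= h2 <= 1.0:
        raise ValueError("h2 must lie in [0, 1]")
    rng = np.random.default_rng(seed)

    # Genotypes as minor-allele counts (0, 1, 2), drawn per SNP from Binomial(2, maf).
    maf = rng.uniform(0.05, 0.5, size=n_snps)
    genotypes = rng.binomial(2, maf, size=(n_samples, n_snps))

    # Additive (linear) component from a random subset of causal SNPs.
    causal = rng.choice(n_snps, size=n_causal, replace=False)
    additive = genotypes[:, causal] @ rng.normal(size=n_causal)

    # Non-linear component: products of random SNP pairs (pairwise epistasis).
    # n_pairs serves as a simple "trait complexity" knob in this sketch.
    pairs = rng.choice(n_snps, size=(n_pairs, 2))
    interactions = genotypes[:, pairs[:, 0]] * genotypes[:, pairs[:, 1]]
    epistatic = interactions @ rng.normal(size=n_pairs)

    # Standardize the genetic signal to mean 0 and variance 1.
    genetic = additive + epistatic
    genetic = (genetic - genetic.mean()) / genetic.std()

    # Mix with unit-variance noise; lower h2 means a noisier phenotype.
    noise = rng.normal(size=n_samples)
    phenotype = np.sqrt(h2) * genetic + np.sqrt(1.0 - h2) * noise
    return genotypes, phenotype
```

For example, `simulate_trait(n_samples=5000, n_snps=1000, n_causal=50, n_pairs=25, h2=0.5)` yields a genotype matrix and a phenotype in which about half the variance is genetic. Sweeping `n_samples`, `h2`, and `n_pairs` independently would give the kind of accuracy and scalability curves the abstract describes. A linear model can capture only the additive part of such a trait, which is one reason non-linear learners are of interest here.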