Inferring genotype-phenotype maps using attention models

Abstract

Predicting phenotype from genotype is a central challenge in genetics. Traditional approaches in quantitative genetics typically analyze this problem using methods based on linear regression. These methods generally assume that the genetic architecture of complex traits can be parameterized in terms of an additive model, where the effects of loci are independent, plus (in some cases) pair-wise epistatic interactions between loci. However, these models struggle to analyze more complex patterns of epistasis or subtle gene-environment interactions. Recent advances in machine learning, particularly attention-based models, offer a promising alternative. Initially developed for natural language processing, attention-based models excel at capturing context-dependent interactions and have shown exceptional performance in predicting protein structure and function. Here, we apply attention-based models to quantitative genetics. We analyze the performance of this attention-based approach in predicting phenotype from genotype using simulated data across a range of models with increasing epistatic complexity, and using experimental data from a recent quantitative trait locus mapping study in budding yeast. We find that our model demonstrates superior out-of-sample predictions in epistatic regimes compared to standard methods. We also explore a more general multi-environment attention-based model to jointly analyze genotype-phenotype maps across multiple environments and show that such architectures can be used for “transfer learning” – predicting phenotypes in novel environments with limited training data.

Article activity feed

  1. ATTENTION-BASED ARCHITECTURE FOR G-P MAPPING

    The model is a stack of attention layers, but I was surprised to see it omit all the typical components that brought attention into the limelight via transformers: multi-head attention, residual connections, layer norm, and position-wise FFNs. These have become standard and widely adopted, largely for good reason, as they have been shown to be very effective across many distinct domains.

    Was there a particular reason this specific custom architecture was preferred over implementing or at least comparing to a standard transformer encoder?
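
    For reference, a minimal sketch of the standard encoder block this comment has in mind, assuming a PyTorch implementation with hypothetical hyperparameters (d_model, n_heads, d_ff); this is not the authors' architecture:

    ```python
    import torch
    import torch.nn as nn

    class EncoderBlock(nn.Module):
        """Standard transformer encoder block: multi-head self-attention
        + residual + LayerNorm, then a position-wise FFN + residual + LayerNorm."""
        def __init__(self, d_model=64, n_heads=4, d_ff=256, dropout=0.1):
            super().__init__()
            self.attn = nn.MultiheadAttention(d_model, n_heads,
                                              dropout=dropout, batch_first=True)
            self.ffn = nn.Sequential(
                nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            self.norm1 = nn.LayerNorm(d_model)
            self.norm2 = nn.LayerNorm(d_model)
            self.drop = nn.Dropout(dropout)

        def forward(self, z):                           # z: (batch, L, d_model)
            a, _ = self.attn(z, z, z)                   # multi-head self-attention
            z = self.norm1(z + self.drop(a))            # residual + layer norm
            z = self.norm2(z + self.drop(self.ffn(z)))  # FFN + residual + layer norm
            return z
    ```

    (PyTorch's built-in nn.TransformerEncoderLayer bundles these same components into a single module.)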

  2. Genotype vectors are converted to one-hot embeddings X^(g) and transformed into d-dimensional embeddings Z^(g)

    Constructing X^(g) is an extremely expensive way to associate an embedding with each locus. You should simply use a lookup table (i.e. nn.Embedding).
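
    A minimal sketch of the suggested change, assuming X^(g) one-hot encodes (locus, allele) pairs and genotypes arrive as integer allele codes per locus (variable names here are hypothetical):

    ```python
    import torch
    import torch.nn as nn

    L, d, n_alleles = 1164, 32, 2            # loci, embedding dim, allele states

    # One-hot route: build X^(g) explicitly and multiply by a weight matrix,
    # paying memory and compute proportional to the one-hot width.
    # Lookup route: nn.Embedding indexes the same weight rows directly.
    embed = nn.Embedding(L * n_alleles, d)    # one row per (locus, allele) pair

    g = torch.randint(0, n_alleles, (8, L))   # batch of 8 integer-coded genotypes
    locus_offset = torch.arange(L) * n_alleles  # distinct rows for each locus
    Z = embed(g + locus_offset)               # (8, L, d), no one-hot tensor built
    ```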

  3. Such stacking of attention layers is commonly used in large language models, including those used to model proteins. We use three layers because they collectively capture both pairwise and higher-order interactions, and empirical tests showed that adding more layers did not improve performance.

    Did you consider the use of (potentially gated) residual skip connections (Savarese & Figueiredo 2017)? This (or a related approach) would likely improve the expressivity of these attention layers and prevent oversmoothing by preserving a more persistent signal from first- and second-order epistatic interactions, potentially enabling the use of additional layers (if necessary).
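
    One possible form of such gating is sketched below, with a learnable scalar gate around each attention layer; this is a generic gated residual for illustration, not necessarily the exact formulation of Savarese & Figueiredo 2017:

    ```python
    import torch
    import torch.nn as nn

    class GatedResidualAttention(nn.Module):
        """Mixes an attention layer's output with its input through a learnable
        gate, so lower-order signal can pass through deeper stacks instead of
        being smoothed away."""
        def __init__(self, d_model=64, n_heads=4):
            super().__init__()
            self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            self.gate = nn.Parameter(torch.tensor(-2.0))  # biased toward identity at init
            self.norm = nn.LayerNorm(d_model)

        def forward(self, z):                           # z: (batch, L, d_model)
            a, _ = self.attn(z, z, z)
            g = torch.sigmoid(self.gate)                # g in (0, 1)
            return self.norm(g * a + (1.0 - g) * z)     # gated skip connection
    ```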

  4. As expected, the performance of the attention-based model, as characterized by R² on the test dataset, is much better than that of the linear model (see Fig. 3)

    It would have been interesting to see how a simpler approach, say a vanilla MLP, would stack up here, to really sell the advantage of attention over other deep learning approaches.
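
    A vanilla MLP baseline of the kind suggested here could be as simple as the following sketch (hypothetical layer sizes; input is the flattened dosage- or one-hot-coded genotype):

    ```python
    import torch.nn as nn

    def mlp_baseline(n_features, hidden=(512, 256, 64)):
        """Plain feed-forward baseline mapping genotype features to phenotype."""
        layers, d_in = [], n_features
        for d_out in hidden:
            layers += [nn.Linear(d_in, d_out), nn.ReLU(), nn.Dropout(0.1)]
            d_in = d_out
        layers.append(nn.Linear(d_in, 1))     # scalar phenotype prediction
        return nn.Sequential(*layers)

    model = mlp_baseline(n_features=1164)     # e.g. L = 1,164 dosage-coded loci
    ```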

  5. Predicting phenotype from genotype is a central challenge in genetics. Traditional approaches in quantitative genetics typically analyze this problem using methods based on linear regression.

    I greatly enjoyed reading this paper. The rigorous and rational approach to testing model performance on simulated data, reasonable model architecture, and smart dataset choice are a much-needed advance beyond haphazardly applying deep learning networks to G-P datasets with minimal performance gain.

  6. With this in mind, we subsample the loci (effectively combining highly correlated loci) to create a representative set of L = 1,164

    Did you experiment with how LD-based pruning affects model performance? For linear genomic prediction models, the relationship between marker number and predictive performance is well characterized: as long as the LD structure is captured well, marker number is not very critical. However, this has not been characterized well for deep learning models in this context. Epistatic interactions in particular depend on products of LD between markers and causal QTLs, which could cause performance degradation if the causal QTLs are not well tagged.
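
    For concreteness, a minimal sketch of the kind of greedy r²-based pruning this comment refers to, assuming a dosage-coded genotype matrix G of shape (individuals, markers); this is an illustration, not the authors' subsampling procedure:

    ```python
    import numpy as np

    def greedy_ld_prune(G, r2_max=0.95):
        """Keep a marker only if its squared correlation with every
        previously kept marker is below r2_max."""
        n, m = G.shape
        Gs = (G - G.mean(0)) / (G.std(0) + 1e-12)   # standardize columns
        kept = []
        for j in range(m):
            if all(np.mean(Gs[:, j] * Gs[:, k]) ** 2 < r2_max for k in kept):
                kept.append(j)
        return kept

    # Example: prune simulated biallelic genotypes, then fit on G[:, kept]
    G = np.random.binomial(1, 0.5, size=(500, 2000)).astype(float)
    kept = greedy_ld_prune(G, r2_max=0.95)
    ```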

  7. higher-order epistatic interactions

    I was curious why you chose to simulate fourth-order epistatic interactions. Statistically, one expects higher-order epistatic interactions to contribute progressively less to genetic variance, so most studies tend to focus on pairwise epistasis.
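
    To make the order-four case concrete, a simulated phenotype with fourth-order epistatic terms might look like the sketch below (hypothetical illustration; not the authors' simulation code):

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    n, L, n_terms = 1000, 100, 50

    G = rng.choice([-1.0, 1.0], size=(n, L))     # centered biallelic genotypes

    # Each term multiplies the states of four distinct loci, so the phenotype
    # depends on the joint configuration of each quadruple, not on any locus alone.
    quads = np.array([rng.choice(L, size=4, replace=False) for _ in range(n_terms)])
    coefs = rng.normal(0, 1, size=n_terms)

    y = sum(c * G[:, q].prod(axis=1) for c, q in zip(coefs, quads))
    y += rng.normal(0, 0.5, size=n)              # environmental noise
    ```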