gRely: Reliability for genome-trained sequence-to-expression models

Abdul Muntakim Rafi
Gokcen Eraslan
Kipper Fletez-Brant

Abstract

Sequence-to-function (S2F) models predict molecular phenotypes from DNA sequence and are increasingly applied to variant effect prediction (VEP), where the goal is to quantify how genetic variants alter gene expression. However, S2F model predictions are not uniformly reliable: accuracy varies substantially across variants, genes, and tissues, and current practice relies on crude magnitude thresholding to enrich for trustworthy predictions, which discards the majority of variants where S2F models could still provide signal. We developed gRely, a meta-modeling framework that estimates the probability that a given Borzoi VEP correctly predicts eQTL direction, using 1,121 features derived from the target variant, gene, and model outputs. On held-out tissues, gRely achieves a mean average precision of 0.885 (random baseline 0.744). Critically, within the low-magnitude regime where thresholding fails entirely, gRely identifies a high-confidence subset with 76% accuracy compared to a 58% baseline, recovering reliable predictions that magnitude filtering would discard. Interpretation via SHAP reveals that in this low-magnitude regime, gene expression level and cross-replicate signal concentration replace VEP magnitude as the primary discriminators of reliability. gRely is the first framework to provide per-prediction confidence scores for S2F model VEPs, and generalizes across architectures, producing consistent improvements on AlphaGenome predictions. By making reliability quantifiable, gRely enables principled filtering rather than blanket thresholding, and marks a step toward trustworthy deployment of S2F models in genomic research and clinical applications.
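The evaluation described in the abstract can be illustrated with a small sketch. Each VEP gets a binary label (1 if the predicted eQTL direction was correct) and a reliability score, and average precision measures how well the scores rank correct predictions first. Under a random ranking, average precision is approximately the fraction of correct predictions, which is presumably what the reported random baseline of 0.744 reflects. Everything below is an assumption for illustration: the function names, the toy labels and scores, and the 0.75 cutoff are hypothetical and are not taken from gRely.

```python
# Illustrative sketch only: toy data, not gRely's features, model, or results.

def average_precision(labels, scores):
    """Mean of precision@k over the ranks k at which a correct prediction occurs."""
    # Rank predictions from highest to lowest reliability score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precisions = []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


def positive_rate(labels):
    """Fraction of correct predictions; the approximate AP of a random ranking."""
    return sum(labels) / len(labels)


def subset_accuracy(labels, scores, threshold):
    """Accuracy among predictions whose score is at least the threshold."""
    kept = [label for label, score in zip(labels, scores) if score >= threshold]
    return sum(kept) / len(kept) if kept else None


# Hypothetical data: 1 = predicted eQTL direction was correct, 0 = incorrect.
direction_correct = [1, 1, 0, 1, 0, 1, 1, 0]
# Hypothetical per-prediction reliability scores from a meta-model.
reliability = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]

ap = average_precision(direction_correct, reliability)
baseline = positive_rate(direction_correct)
confident_acc = subset_accuracy(direction_correct, reliability, threshold=0.75)
print(f"AP={ap:.3f} baseline={baseline:.3f} confident-subset accuracy={confident_acc:.2f}")
```

Filtering on a per-prediction score, rather than on VEP magnitude alone, is what allows a reliable subset to be recovered among low-magnitude variants that a blanket threshold would discard.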

Version published to bioRxiv on May 27, 2026 (DOI: 10.64898/2026.05.23.727431)

Related articles

EVEE: Interpretable variant effect prediction from genomic foundation model embeddings

This article has 22 authors:
1. Michael T. Pearce
2. Thomas Dooms
3. Ryo Yamamoto
4. Joshua Meehl
5. Carl Molnar
6. Mark Bissell
7. Dron Hazra
8. Ching Fang
9. Nam Nguyen
10. Michael Anderson
11. Collin Osborne
12. Patrick Duffy
13. Bridget Toomey
14. Eric Klee
15. Elena Myasoedova
16. Alexander J. Ryu
17. Shant Ayanian
18. Panos Korfiatis
19. Matt Redlon
20. Archa Jain
21. Daniel Balsam
22. Nicholas K. Wang
This article has no evaluations. Latest version: Apr 11, 2026

Evolutionary constraints improve protein large language model predictions for protein stability, binding regions and epistasis

This article has 3 authors:
1. Konstantina Tzavella
2. Catharina Olsen
3. Wim Vranken
This article has no evaluations. Latest version: May 26, 2026

Deep-Plant: a supervised foundation model for plant regulatory genomics

This article has 10 authors:
1. Ahmed Daoud
2. Soumyadip Roy
3. Haoxuan Zeng
4. Xinyu Bao
5. Zhenhao Zhang
6. Jiakang Wang
7. Paul Parodi
8. Anireddy SN Reddy
9. Jie Liu
10. Asa Ben-Hur
This article has no evaluations. Latest version: Apr 9, 2026
