Fine-tuning sequence to function deep learning models on large-scale proteomic data improves the accuracy of variant effect prediction

Eduarda Vaz
Lena Wang
Jake Galvin
Rebecca Keener
Alexis Battle


Abstract

Fine-tuning sequence to function models has shown promise for variant effect prediction, but accuracy and generalization to unseen genes and unseen individuals remain open challenges. We fine-tuned Borzoi on 54,219 individuals and 2,923 circulating plasma proteins from the UK Biobank Plasma Proteomic Project. Across 150 single-gene models, covering genes with a range of cis-heritability, the fine-tuned Borzoi model improved variant effect prediction for 86% of the genes compared to an Elastic Net baseline model. We demonstrated that the improved prediction stems from the increased sample size, which contributes a large number of rare genetic variants (MAF < 0.01) to the training data. Masking rare and uncommon variants eliminated the performance gain of fine-tuned Borzoi, and we showed that fine-tuned Borzoi weights rare variants (MAF < 0.01) highly, while the Elastic Net model weights common variants (MAF > 0.05) highly; these common variants are enriched in regulatory regions. We evaluated generalizability using fine-tuned Borzoi models trained jointly on varying numbers of genes and observed that these models consistently outperform the pre-trained Borzoi model, although the single-gene models yield more accurate results. Together, this work demonstrates the importance of including larger sample sizes and rare variants in sequence to function models for variant effect prediction, and shows that these models are capable of highly accurate variant effect prediction.
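The frequency classes and the masking experiment mentioned in the abstract can be illustrated with a small sketch. This is not the authors' pipeline: the function names, the 0/1/2 dosage encoding, the choice to set masked variants to the reference genotype, and the treatment of "uncommon" as 0.01 ≤ MAF ≤ 0.05 are all assumptions made for illustration. Only the 0.01 and 0.05 thresholds come from the abstract.

```python
from typing import List, Set


def minor_allele_frequency(dosages: List[int]) -> float:
    """MAF of one biallelic variant from 0/1/2 alternate-allele dosages."""
    alt_freq = sum(dosages) / (2 * len(dosages))
    return min(alt_freq, 1.0 - alt_freq)


def maf_bin(maf: float) -> str:
    """Frequency class; the 'uncommon' range is an assumed reading of the abstract."""
    if maf < 0.01:
        return "rare"
    if maf <= 0.05:
        return "uncommon"
    return "common"


def mask_variants(genotypes: List[List[int]], masked_bins: Set[str]) -> List[List[int]]:
    """Return a copy in which every variant whose bin is masked is set to 0.

    Setting masked variants to the reference genotype (0) is an assumption;
    the abstract does not say how masking was implemented.
    """
    n_variants = len(genotypes[0])
    columns = [[row[j] for row in genotypes] for j in range(n_variants)]
    keep = [maf_bin(minor_allele_frequency(col)) not in masked_bins for col in columns]
    return [[g if keep[j] else 0 for j, g in enumerate(row)] for row in genotypes]


# Toy data: 100 individuals (rows) x 3 variants (columns).
# Variant 0 has 1 heterozygote, variant 1 has 6, variant 2 has 40.
genotypes = [
    [1 if i == 0 else 0, 1 if i < 6 else 0, 1 if i < 40 else 0]
    for i in range(100)
]

# Mask rare and uncommon variants, leaving only the common one.
masked = mask_variants(genotypes, {"rare", "uncommon"})
```

In this toy matrix the three variants have MAF 0.005, 0.03, and 0.2, so masking the rare and uncommon classes leaves only the third column carrying any signal. A model trained on `masked` would see only common variation.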

Version published to 10.1101/2025.09.26.678908 on bioRxiv
Sep 27, 2025
