Calibrated Variant Effect Prediction at the Residue Level Using Conditional Score Distributions

Gal Passi
Sapir Amittai
Dina Schneidman-Duhovny

Abstract

Effective clinical use of variant effect prediction (VEP) requires models that are both accurate and well calibrated. Calibration refers to a model’s ability to produce meaningful and reliable probability estimates. Here, we propose a practical path toward robust VEP calibration by calibrating at the residue level rather than using global or per-protein schemes. We identify variant subgroups that benefit from targeted calibration and show that, while VEPs appear well calibrated on average, they remain markedly miscalibrated within these subgroups. Leveraging these insights, we develop RaCoon (Residue-aware Calibration via Conditional distributions), implemented on ESM1b, which provides multicalibrated and interpretable predictions across diverse variant subgroups and significantly improves performance across multiple benchmarks. Targeted residue-level calibration not only improves overall calibration but, for most models, also yields gains in global AUROC. Specifically, RaCoon increases AUROC from 0.912 to 0.924. Our calibration strategy, guided by model-specific feature distributions, is readily transferable to other VEPs.
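The claim that a predictor can look well calibrated on average while being miscalibrated within subgroups can be illustrated with a toy sketch. The code below is not RaCoon and does not reproduce anything from the preprint; it uses the standard binned expected calibration error (ECE) as a stand-in metric, and the function names, the subgroup labels "A" and "B", and the data are all hypothetical.

```python
from collections import defaultdict


def expected_calibration_error(probs, labels, n_bins=10):
    """Binned ECE: weighted mean of |observed positive rate - mean predicted probability|."""
    bins = defaultdict(list)
    for p, y in zip(probs, labels):
        # Assign each prediction to an equal-width bin; p == 1.0 goes to the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))
    total = len(probs)
    ece = 0.0
    for members in bins.values():
        mean_p = sum(p for p, _ in members) / len(members)
        pos_rate = sum(y for _, y in members) / len(members)
        ece += (len(members) / total) * abs(pos_rate - mean_p)
    return ece


def ece_by_subgroup(probs, labels, groups, n_bins=10):
    """Compute ECE separately for each subgroup label."""
    by_group = defaultdict(lambda: ([], []))
    for p, y, g in zip(probs, labels, groups):
        by_group[g][0].append(p)
        by_group[g][1].append(y)
    return {
        g: expected_calibration_error(ps, ys, n_bins)
        for g, (ps, ys) in by_group.items()
    }


# Hypothetical toy data: every variant is scored 0.8 (predicted probability of pathogenicity).
# In subgroup "A" all 10 variants are pathogenic; in subgroup "B" only 6 of 10 are.
probs = [0.8] * 20
labels = [1] * 10 + [1] * 6 + [0] * 4
groups = ["A"] * 10 + ["B"] * 10

global_ece = expected_calibration_error(probs, labels)
subgroup_ece = ece_by_subgroup(probs, labels, groups)
print("global ECE:", round(global_ece, 3))
print("per-subgroup ECE:", {g: round(v, 3) for g, v in subgroup_ece.items()})
```

In this toy setup the pooled positive rate (16 of 20) matches the predicted 0.8, so the global ECE is essentially zero, while each subgroup deviates from 0.8 by 0.2 in opposite directions. Under these assumptions, a single global calibration map could not correct both subgroups at once, which is the kind of gap that subgroup-aware calibration targets.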

Version published to 10.1101/2025.11.24.690189 on bioRxiv
Nov 26, 2025

Related articles

Blind Challenges Let Us See the Path Forward for Predictive Models

This article has 4 authors:
1. John D. Chodera
2. W. Patrick Walters
3. Sriram Kosuri
4. James S. Fraser
This article has no evaluations. Latest version: Jan 27, 2026
Approximating prediction error variances and reliabilities in multiple-trait genomic prediction model using Monte Carlo sampling

This article has 5 authors:
1. Antero Heikkilä
2. Ismo Strandèn
3. Martin Lidauer
4. Klaus Nordhausen
5. Sara Taskinen
This article has no evaluations. Latest version: Dec 15, 2025
