Genomic heterogeneity inflates the performance of variant pathogenicity predictions
Abstract
Recent studies have reported unprecedented accuracy in predicting pathogenic variants across the genome, including in noncoding regions, using large AI models trained on vast genomic data. We present a comprehensive evaluation of these frontier models, showing that performance is inflated by differences in the prevalence of pathogenic variants across genomic contexts. We identify the best-performing models for each variant type and establish a benchmark to guide future progress.
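The inflation effect described above can be illustrated with a toy calculation. The sketch below is not the article's evaluation: the variant counts, the two context names, the choice of AUROC as the metric, and the `context_score` "predictor" are all made-up assumptions. It shows how a score that encodes only the genomic context, and cannot rank variants within a context at all, can still look strong when contexts with different pathogenic prevalence are pooled.

```python
from itertools import product


def auroc(labels, scores):
    """Probability that a random pathogenic variant outscores a random
    benign one, counting ties as 0.5 (the Mann-Whitney form of AUROC)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data as (context, label) pairs; label 1 = pathogenic.
# Prevalence is 80% in "coding" and 5% in "noncoding" (invented numbers).
variants = (
    [("coding", 1)] * 80
    + [("coding", 0)] * 20
    + [("noncoding", 1)] * 5
    + [("noncoding", 0)] * 95
)

# Hypothetical predictor that knows only the context, nothing about the variant.
context_score = {"coding": 1.0, "noncoding": 0.0}

labels = [y for _, y in variants]
scores = [context_score[c] for c, _ in variants]
pooled = auroc(labels, scores)

# Evaluate the same scores separately inside each context.
per_context = {}
for ctx in context_score:
    ctx_labels = [y for c, y in variants if c == ctx]
    ctx_scores = [context_score[c] for c, _ in variants if c == ctx]
    per_context[ctx] = auroc(ctx_labels, ctx_scores)

print(f"pooled AUROC: {pooled:.3f}")
for ctx, value in per_context.items():
    print(f"{ctx} AUROC: {value:.3f}")
```

In this toy setting the pooled AUROC comes out at roughly 0.88 while the AUROC within each context is exactly 0.5, that is, chance level. Stratifying the evaluation by context is one simple way to separate the two effects; whether the article uses this particular stratification is not stated in the abstract.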