Benchmarking of variant pathogenicity prediction methods using a population genetics approach

Mikhail Gudkov
Loïc Thibaut
Steven Monger
Debjani Das
Congenital Heart Disease Synergy Study group
David S. Winlaw
Sally L. Dunwoodie
Eleni Giannoulatou


Abstract

Motivation

Variant pathogenicity predictors are essential for identifying new associations between genetic variants and rare diseases. However, despite the availability of numerous predictors, there is no clear consensus on which methods provide the most reliable results. The common practice of training, testing, and benchmarking these predictors using known variant sets from disease or mutagenesis studies raises concerns about ascertainment bias and data circularity.

Results

We benchmarked commonly used pathogenicity predictors using an orthogonal approach that does not rely on predefined “ground truth” datasets. By leveraging population-level genomic data from gnomAD and the Context-Adjusted Proportion of Singletons (CAPS) metric, we identified CADD and REVEL as the best-performing predictors for distinguishing extremely deleterious variants from moderately deleterious ones. REVEL demonstrated superior calibration. Additionally, we show that CAPS can serve as a meta-analysis tool for interpreting variant annotations and highlight biases in ClinVar-based predictor training.
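To make the idea of a context-adjusted singleton proportion concrete, the following is a minimal sketch only. It is not the authors' CAPS implementation from the linked repository, and the abstract does not give the formula. The sketch assumes that the expected singleton rate for each mutational context is estimated from a putatively neutral reference set (for example, synonymous variants) and that the score is the excess of observed over expected singletons per variant. All function names, context labels, and the toy data are hypothetical.

```python
from collections import defaultdict


def singleton_proportion_by_context(variants):
    """Return the fraction of variants with allele count 1 for each context.

    `variants` is an iterable of (context, allele_count) pairs.
    """
    totals = defaultdict(int)
    singletons = defaultdict(int)
    for context, allele_count in variants:
        totals[context] += 1
        if allele_count == 1:
            singletons[context] += 1
    return {ctx: singletons[ctx] / totals[ctx] for ctx in totals}


def context_adjusted_singleton_excess(target, reference):
    """Observed minus expected singletons in `target`, per variant.

    Expected singletons are derived from per-context singleton proportions
    in `reference`. Assumption: every context in `target` also occurs in
    `reference`; otherwise a KeyError is raised.
    """
    expected_rate = singleton_proportion_by_context(reference)
    observed = sum(1 for _, allele_count in target if allele_count == 1)
    expected = sum(expected_rate[ctx] for ctx, _ in target)
    return (observed - expected) / len(target)


# Hypothetical toy data: (mutational context, allele count in the population).
reference = [
    ("ACG>ATG", 1), ("ACG>ATG", 5),
    ("TAT>TGT", 1), ("TAT>TGT", 1), ("TAT>TGT", 3), ("TAT>TGT", 1),
]
target = [
    ("ACG>ATG", 1), ("ACG>ATG", 1),
    ("TAT>TGT", 1), ("TAT>TGT", 2),
]

score = context_adjusted_singleton_excess(target, reference)
```

Under these assumptions, a positive score means the target set carries more singletons than its mutational contexts alone would predict, which is the kind of signal a population-based benchmark could read as stronger purifying selection. A set of variants scored against itself gives zero by construction.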

Availability and Implementation

CAPS analysis and benchmarking results are available at https://github.com/mgudVCCRI/PopGenVariantFiltering

Contact

e.giannoulatou@victorchang.edu.au

Version published to 10.1101/2025.03.16.643565v1 on bioRxiv
Mar 17, 2025

Related articles

Prevalence of Pathogenic Germline Variants in Cancer Susceptibility Genes using the All of Us Dataset

This article has 6 authors:
1. Gideon Idumah
2. Daphne Newell
3. Madeleine Hadrys
4. Isabella Ribaudo
5. Ying Ni
6. Joshua Arbesman
This article has no evaluations. Latest version: Mar 30, 2025

Evaluating a Standard Benchmark for Gene Prioritization: The InheriNext® Algorithm’s Integration of Genomic and Phenotypic Information

This article has 13 authors:
1. JY Chang
2. KT Li
3. M Kubal
4. YS Tsai
5. A Hamby
6. N Thomson
7. J Sheridan
8. S Barfield
9. R Rutz
10. FS Ong
11. R Felciano
12. S Kahn
13. SM Wu
This article has no evaluations. Latest version: Feb 28, 2025

Archipelago method for variant set association test statistics

This article has 4 authors:
1. Dylan Lawless
2. Ali Saadat
3. Mariam Ait Oumelloul
4. Jacques Fellay
This article has no evaluations. Latest version: Mar 17, 2025
