Benchmarking of variant pathogenicity prediction methods using a population genetics approach
Abstract
Motivation
Variant pathogenicity predictors are essential for identifying new associations between genetic variants and rare diseases. However, despite the availability of numerous predictors, there is no clear consensus on which methods provide the most reliable results. The common practice of training, testing, and benchmarking these predictors using known variant sets from disease or mutagenesis studies raises concerns about ascertainment bias and data circularity.
Results
We benchmarked commonly used pathogenicity predictors using an orthogonal approach that does not rely on predefined “ground truth” datasets. By leveraging population-level genomic data from gnomAD and the Context-Adjusted Proportion of Singletons (CAPS) metric, we identified CADD and REVEL as the best-performing predictors for distinguishing extremely deleterious variants from moderately deleterious ones. REVEL demonstrated superior calibration. Additionally, we show that CAPS can serve as a meta-analysis tool for interpreting variant annotations and highlight biases in ClinVar-based predictor training.
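As a rough illustration of the idea behind a singleton-based metric such as CAPS, the sketch below compares the observed number of singletons in a variant set with the number expected from each variant's sequence context. The abstract does not give the CAPS formula, so the form used here (observed minus expected singletons, divided by the number of variants) is an assumption. The function names, the `(context, allele_count)` input layout, and the use of a presumed-neutral reference set to estimate expectations are also hypothetical and are not taken from the authors' repository.

```python
from collections import defaultdict


def singleton_proportions_by_context(neutral_variants):
    """Estimate the proportion of singletons per sequence context.

    `neutral_variants` is an iterable of (context, allele_count) pairs from a
    presumed-neutral reference set (an assumption of this sketch).
    """
    counts = defaultdict(lambda: [0, 0])  # context -> [singletons, total]
    for context, allele_count in neutral_variants:
        counts[context][1] += 1
        if allele_count == 1:
            counts[context][0] += 1
    return {ctx: s / n for ctx, (s, n) in counts.items()}


def adjusted_singleton_score(variants, expected_by_context):
    """Return (observed singletons - expected singletons) / number of variants.

    This is an assumed, simplified form of a context-adjusted score;
    it is not necessarily the exact CAPS definition.
    """
    variants = list(variants)
    if not variants:
        raise ValueError("variant set is empty")
    observed = sum(1 for _, allele_count in variants if allele_count == 1)
    expected = sum(expected_by_context[context] for context, _ in variants)
    return (observed - expected) / len(variants)


# Toy data, made up purely for illustration.
neutral = [("ACG", 1), ("ACG", 2), ("TTT", 1), ("TTT", 1), ("TTT", 3), ("TTT", 5)]
expected = singleton_proportions_by_context(neutral)
candidate_set = [("ACG", 1), ("TTT", 1), ("TTT", 1), ("ACG", 4)]
score = adjusted_singleton_score(candidate_set, expected)
```

Under this assumed form, a score above zero means the set carries more singletons than its sequence contexts alone would predict, which is the kind of signal that would be read as stronger purifying selection against the variants a predictor ranks as most deleterious.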
Availability and Implementation
CAPS analysis and benchmarking results are available at https://github.com/mgudVCCRI/PopGenVariantFiltering
Contact
e.giannoulatou@victorchang.edu.au