Simplifying causal gene identification in GWAS loci
Abstract
Genome-wide association studies (GWAS) help to identify disease-linked genetic variants, but pinpointing the most likely causal genes in GWAS loci remains challenging. Existing GWAS gene prioritization tools are powerful but often use complex black box models trained on datasets containing unaddressed biases. Here, we use a data-driven approach to construct a truth set of causal genes in 406 GWAS loci. We train a gene prioritization tool, CALDERA, that uses a simple logistic regression model with L1 regularization and corrects for potential confounders. Using three independent benchmarking datasets of resolved GWAS loci, we compare the performance of CALDERA with three other methods (FLAMES, L2G, and cS2G). CALDERA outperforms all these methods in two out of three datasets and ranks second in the remaining dataset. We demonstrate that CALDERA prioritizes genes with expected properties, such as mutation intolerance (OR = 1.751 for pLI > 90%, P = 8.45 × 10⁻³). Overall, CALDERA provides a powerful solution for prioritizing potentially causal genes in GWAS loci and may help identify novel genetics-driven drug targets.
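
To make the modelling idea concrete, the following is a minimal sketch of logistic regression with an L1 penalty, fitted by proximal gradient descent. It is not the CALDERA implementation: the two features (a hypothetical "closest gene to the lead variant" flag and a pure-noise column), the toy labels, and the penalty strength `lam` are all assumptions chosen only to show how the L1 penalty shrinks uninformative coefficients to exactly zero.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def soft_threshold(w, t):
    # Proximal operator of the L1 penalty: shrink w towards zero by t.
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0

def fit_l1_logistic(X, y, lam=0.05, lr=0.5, n_iter=2000):
    """Proximal gradient descent for L1-penalized logistic regression.

    The intercept b is left unpenalized; only the weights w are shrunk.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    b = 0.0
    for _ in range(n_iter):
        grad_w = [0.0] * p
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err / n
            for j in range(p):
                grad_w[j] += err * xi[j] / n
        b -= lr * grad_b
        # Gradient step on the smooth loss, then shrinkage for the L1 term.
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad_w)]
    return w, b

# Toy data (invented for illustration): each row is one candidate gene.
# Column 0: hypothetical "closest gene" flag; column 1: noise (+1 / -1).
# Labels: 1 = causal, 0 = not causal.
pattern = [(1, 1)] * 3 + [(1, 0)] + [(0, 1)] + [(0, 0)] * 3
X, y = [], []
for flag, label in pattern:
    for noise in (1.0, -1.0):
        X.append([float(flag), noise])
        y.append(label)

w, b = fit_l1_logistic(X, y)
print("weights:", w, "intercept:", b)
```

In this toy setting the informative flag keeps a positive weight while the noise column is dropped from the model entirely, which is the property that keeps an L1-regularized model small and interpretable.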