A unified genome constraint, pathogenicity, and pLoF model identifies new genes associated with epilepsy

Abstract

Background

Epilepsy is a highly heterogeneous disorder thought to have strong genetic components. However, identifying these risk factors with whole-exome sequencing studies requires very large sample sizes and a good signal-to-noise ratio in order to assess the association between rare variants in any given gene and disease.

Methods

We present a novel approach for predicting constraint in the human genome, that is, identifying sections of the genome where any mutation can cause a severe disorder. By applying a Hidden Markov Model (HMM) to the Regeneron Genetics Center Million Exome dataset and the All of Us whole-genome sequencing data, we predict, for each position in the genome, the probability of observing no variants across the population. Next, we aggregate the constraint predictions by gene and assess their association with epilepsy. Finally, we extend our analysis model to incorporate pathogenicity predictions from AlphaMissense (AM) and predicted loss-of-function variants (pLoFs), and compare our findings against published results.
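
To make the idea of position-level constraint prediction followed by gene-level aggregation concrete, the following is a minimal sketch and not the authors' model. It assumes a two-state HMM (unconstrained vs. constrained) over a binary track marking whether any variant is observed at each position, and it uses the posterior probability of the constrained state as a stand-in for a constraint score. All transition and emission values, the gene names, and the gene coordinates are hypothetical and chosen only for illustration.

```python
import numpy as np

def constrained_posterior(obs, trans, emit, start):
    """Forward-backward for a small HMM over binary observations.

    obs[i] is 1 if any variant is observed at position i, else 0.
    State 0 = unconstrained, state 1 = constrained (labels are assumptions).
    trans[a, b] = P(next state b | current state a).
    emit[s, o]  = P(observation o | state s).
    Returns P(state = constrained | all observations) for each position.
    """
    obs = np.asarray(obs)
    n, k = len(obs), len(start)
    fwd = np.zeros((n, k))
    bwd = np.ones((n, k))
    scale = np.zeros(n)

    # Forward pass, normalised at every step to avoid numerical underflow.
    fwd[0] = start * emit[:, obs[0]]
    scale[0] = fwd[0].sum()
    fwd[0] /= scale[0]
    for i in range(1, n):
        fwd[i] = (fwd[i - 1] @ trans) * emit[:, obs[i]]
        scale[i] = fwd[i].sum()
        fwd[i] /= scale[i]

    # Backward pass, reusing the forward scaling factors.
    for i in range(n - 2, -1, -1):
        bwd[i] = (trans @ (emit[:, obs[i + 1]] * bwd[i + 1])) / scale[i + 1]

    post = fwd * bwd
    post /= post.sum(axis=1, keepdims=True)
    return post[:, 1]

def gene_scores(posterior, genes):
    """Average the per-position scores over each gene's half-open interval."""
    return {name: float(posterior[lo:hi].mean()) for name, (lo, hi) in genes.items()}

# Illustrative parameters only (not estimated from any dataset).
trans = np.array([[0.9, 0.1],
                  [0.1, 0.9]])
emit = np.array([[0.40, 0.60],   # unconstrained: variants are common
                 [0.95, 0.05]])  # constrained: variants are rare
start = np.array([0.5, 0.5])

# Toy variant-presence track with a variant-free stretch in the middle.
obs = [1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1]
genes = {"GENE_A": (0, 5), "GENE_B": (5, 11)}  # hypothetical genes

posterior = constrained_posterior(obs, trans, emit, start)
scores = gene_scores(posterior, genes)
print(scores)
```

In this toy track, the variant-free stretch covered by the hypothetical GENE_B receives a higher mean score than GENE_A. A gene-level score of this kind could then be used as a covariate or weight in an association test, although the specific test the authors use is not described in this abstract.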

Results

We identified a set of genes (p < 1×10⁻⁴) with stronger signals than in previously published studies, including KDM5B, KCNQ2, CACNA1A, CACNA1B, RYR2, and ATP2B2. Our models allow us to jointly evaluate the contributions of constraint, protein-structure-based pathogenicity prediction from AM, and pLoFs.

Conclusion

We showed that relatively simple sequence-dependent constraint prediction models can complement structure-based missense variant pathogenicity predictions and pLoFs for population cohort studies that require additional statistical power to identify gene-based signals for neurogenetic and psychiatric disorders.
