Functional dissection of complex and molecular trait variants at single nucleotide resolution

Abstract

Identifying the causal variants and mechanisms that drive complex traits and diseases remains a core problem in human genetics. The majority of these variants have individually weak effects and lie in non-coding gene-regulatory elements where we lack a complete understanding of how single nucleotide alterations modulate transcriptional processes to affect human phenotypes. To address this, we used a Massively Parallel Reporter Assay (MPRA) to measure the activity of 221,412 statistically fine-mapped, trait-associated variants in 5 diverse cell-types. We show that MPRA is able to discriminate between likely causal variants and controls, identifying 12,025 regulatory variants with high precision. Although the effects of these variants largely agree with orthogonal measures of function, only 69% can plausibly be explained by the disruption of a known transcription factor (TF) binding motif. We dissect the mechanisms of 136 variants using saturation mutagenesis and assign impacted TFs for 91% of variants without a clear canonical mechanism. Finally, we provide evidence that epistasis is prevalent for variants in close proximity and identify multiple functional variants on the same haplotype at a small, but important, subset of trait-associated loci. Overall, our study provides a systematic functional characterization of likely causal common variants underlying complex and molecular human traits, enabling new insights into the regulatory grammar underlying disease risk.

Article activity feed

  1. Using Enformer [87], a transformer-based neural network, we created a combined score of variant effects on chromatin accessibility and TF occupancy

    Besides chromatin accessibility and interactions with specific transcription factors, what other mechanisms might mediate the effects of cis-regulatory elements on gene expression?

  2. Ultimately, our results provide support for a model where trait-associated variants reside in CREs that can be either cell-type specific or cell-type agnostic, the latter being an underappreciated mechanism of causal variants

    I think it’s exciting that you can identify cell-type agnostic as well as cell-type specific transcriptional changes. While we often think about cell-type differences in gene expression in the context of disease, I’m looking forward to seeing how this work will improve our ability to consider system-wide changes that are nonetheless more or less likely to manifest pathology in specific tissues.

  3. the magnitude of cell-type agnostic transcriptional activation in our assay correlates well with quantitative measures (maximum) of the chromatin accessibility of DNA across 438 cell-types

    I’m curious about this. Why does chromatin accessibility correlate with transcriptional activation for a non-integrated plasmid?

  4. with the majority enhancing rather than repressing transcription (96.6%), reflecting the design of our reporter assay

    Why does the design of the reporter assay make it more likely to identify enhancers? Have you considered an assay design that enables discovery of more suppressors? Are enhancers and suppressors equally likely in human polymorphisms, or is there a biological reason why more of one or the other would be able to persist in the population?

  5. Libraries of MPRA constructs were transfected into four diverse cell-types

    This is a very smart approach to start cataloguing the effects of different non-coding variants, but the use of plasmid DNA necessarily removes the sequence of interest from its true genomic context. Is there a way to evaluate or predict the impact of this loss of context, either computationally or in a high-throughput experiment? Are you thinking at all about how version 2.0 of this technology might be able to close this gap?