Identifying deleterious noncoding variation through gain and loss of CTCF binding activity

Read the full article See related articles

Listed in

This article is not in any list yet, why not save it to one of your lists.
Log in to save this article

Abstract

Noncoding single nucleotide variants are the predominant class of genetic variation in whole genome sequencing and are key drivers of phenotypic variation. However, their functional annotation remains challenging. To address this, we develop a hypothesis-driven functional annotation scheme for CTCF binding sites given CTCF’s critical roles in gene regulation and extensive profiling in regulatory datasets. We synthesize CTCF’s binding patterns at 1,063,879 genomic loci across 214 biological contexts into a summary metric, which we refer to as binding activity. We find that binding activity is significantly enriched for both conserved nucleotides (Pearson R = 0.31, p < 2.2 x 10 -16 ) and sequences that contain high-quality CTCF binding motifs (Pearson R = 0.63, p = 2.9 x 10 -12 ). We then integrate binding activity with high confidence change in precision weight matrix scores. By applying this framework to 1,253,330 SNVs in gnomAD, we explore signatures of selection acting against the disruption of CTCF binding. We find a strong, positive relationship between the mutability adjusted proportion of singletons (MAPS) metric and the loss of CTCF binding at loci with high in vitro activity (Pearson R = 0.67, p = 1.5 x 10 -14 ). To contextualize these findings, we apply MAPS to other functional classes of variation and find that a subset of 198,149 loss of CTCF binding variants are observed as infrequently as missense variants. This work implicates these thousands of rare, noncoding variants that disrupt CTCF binding for further functional studies while providing a blueprint for the interpretable annotation of noncoding variants.

Article activity feed