Massively parallel reporter assay for mapping gene-specific regulatory regions at single nucleotide resolution
Abstract
Precise gene regulation is essential for the development and function of complex tissues, yet comprehensive mapping of cis-regulatory modules (CRMs) remains challenging due to limitations in throughput, resolution, and the ability to assay within specific cell types. Here, we introduce two complementary approaches, a locus-specific massively parallel reporter assay (LS-MPRA) and a degenerate MPRA (d-MPRA), designed to address some of these shortcomings. LS-MPRA leverages bacterial artificial chromosomes (BACs) to generate high-complexity libraries spanning large genomic regions, enabling unbiased interrogation of millions of DNA fragments potentially relevant for the regulation of a specific gene or set of genes. The d-MPRA employs systematic mutagenesis to resolve the functional architecture of CRMs at nucleotide resolution, thereby nominating critical nucleotides as potential transcription factor (TF) binding sites or as having other regulatory roles.
We applied these methods to retinal genes that are stably expressed in differentiated cells of the retina (rod photoreceptors and subsets of bipolar interneurons), using both in vivo and ex vivo preparations of mouse tissue. LS-MPRA recapitulated some of the known CRMs for these genes, such as the proximal promoter region of Rho, and identified potentially novel CRMs, including those located within neighboring genes. The method was then applied to Olig2, a gene that is dynamically expressed in subsets of retinal progenitor cells, where it identified three distinct CRM regions (Olig2-NR1, NR2, and NR3). The d-MPRA and subsequent motif analyses nominated critical TF binding sites within these regions. CUT&RUN experiments confirmed direct binding of these candidates. Moreover, extending LS-MPRA to chick retina and spinal cord demonstrated the applicability of these methods across species and tissues.
Together, the integrated LS-MPRA and d-MPRA strategies provide a robust, high-resolution platform for discovery of the cis-regulatory code underlying tissue-specific gene expression. The platform does not require prior knowledge of potential CRMs, and it is rapid and straightforward to deploy using typical molecular biology methods. The fragment size can be scaled to create short CRMs, e.g. for cell type-specific expression within viral vectors. The platform should enable CRM discovery at a scale and cost accessible to laboratories wishing to focus on a particular locus or set of loci.