Quantitative prediction of variant effects on alternative splicing using endogenous pre-messenger RNA structure probing


Splicing is a highly regulated process that depends on numerous factors. It is particularly challenging to quantitatively predict how a mutation will affect precursor messenger RNA (mRNA) structure and the subsequent functional consequences. Here we use a novel Mutational Profiling (-MaP) methodology to obtain highly reproducible endogenous precursor and mature mRNA structural probing data in vivo. We use these data to estimate Boltzmann suboptimal ensembles and to predict the structural consequences of mutations on precursor mRNA structure. Together with a structural analysis of recent cryo-EM spliceosome structures at different stages of the splicing cycle, we determined that the footprint of the B^act complex on precursor mRNA is best able to predict splicing outcomes for exon 10 inclusion of the alternatively spliced MAPT gene. However, structure alone only achieves 74% accuracy. We therefore developed a β-regression weighting framework that incorporates splice site strength, structure, and exonic/intronic splicing regulatory elements, which together achieve 90% accuracy for 47 known and six newly discovered splice-altering variants. This combined experimental/computational framework represents a path forward for accurate prediction of splicing-related disease-causing variants.
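As a rough illustration of what weighting a suboptimal ensemble involves, the sketch below converts a set of structure free energies into normalized Boltzmann probabilities. It is not the authors' pipeline: the function name, the example energies, and the choice to normalize over the listed structures only (rather than over the full partition function) are assumptions made for illustration.

```python
import math

# Gas constant in kcal/(mol*K) and 37 degrees C in kelvin.
R_KCAL = 0.0019872
TEMP_K = 310.15


def boltzmann_weights(free_energies, temperature=TEMP_K):
    """Return normalized Boltzmann probabilities for free energies in kcal/mol.

    Assumption: normalization runs over the supplied structures only,
    not over the full partition function of the sequence.
    """
    if not free_energies:
        raise ValueError("at least one free energy is required")
    rt = R_KCAL * temperature
    # Shift by the minimum energy so the exponentials stay well scaled.
    e_min = min(free_energies)
    factors = [math.exp(-(e - e_min) / rt) for e in free_energies]
    z = sum(factors)
    return [f / z for f in factors]


# Hypothetical free energies for three suboptimal structures (kcal/mol).
example_energies = [-12.4, -11.9, -10.2]
weights = boltzmann_weights(example_energies)
```

Lower free energies receive larger weights, so a mutation that shifts the energies of competing structures also shifts how the ensemble is distributed among them.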

Article activity feed

  1. Evaluation Summary:

    This manuscript will be of interest to biologists who study RNA structure-function relationships in a broad range of systems, splicing researchers, and RNA structure bioinformaticians. An integrative analysis of RNA structure probing, model-based RNA folding energetics, cryo-EM data, and protein binding sequence motifs serves as the basis for a comprehensive, accurate, and robust framework for predictive models of splicing dynamics in a well-studied system. The modeling is leveraged by in silico mutagenesis that reveals novel insights into the mechanisms and tradeoffs that underlie the impact of disease-associated mutations on alternative splicing.

    (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. The reviewers remained anonymous to the authors.)

  2. Reviewer #1 (Public Review):

    Kumar et al. combined RNA structure probing methods with computational approaches to generate a prediction model for splicing. The framework is novel, and the prediction model is interesting. The weakness is the interpretation of the DMS reactivity profile, which may be affected by interacting proteins. Overall, the authors achieved their aims, but the results require careful interpretation. The framework and the model established in this manuscript are very useful to the field. The ability to predict splicing outcomes will enhance our knowledge of disease-related mutations.

  3. Reviewer #2 (Public Review):

    Kumar et al. developed an intricate framework to predict how a given mutation will affect splicing at exon-intron junctions. They leveraged the power of Mutational Profiling to obtain in vivo secondary and tertiary structure of RNA molecules and the spliceosomal footprints at these junctions. The structural data alone were only able to predict splicing outcome with 72% accuracy. The authors then explored how well the strength of splice sites and splicing regulatory elements predicted splicing outcome. These two features alone predicted splicing outcomes with an accuracy of only R² = 50. However, combining splice site strength, splicing regulatory element strength, and structural data yielded the highest prediction accuracy, R² = 89. The authors demonstrate the importance of considering structure together with splice site and regulatory element strength when assessing splice-altering mutations. The data add to the ongoing studies on how to predict, manipulate, and measure splicing.

    The framework used to create the predictive models is thoughtful and rigorous in the context of the MAPT gene. After training the model, the authors ran 55 variants of unknown significance (VUSs) through it to predict which of the 3 splicing outcomes (3R, 4R, WT) would occur. They then used a splicing assay to experimentally confirm the computationally predicted outcomes of 6 of these VUSs.
    The techniques are, in principle, widely applicable across cell types and any genomic sequences of interest. This framework could be leveraged to determine how known mutations affect splicing, or how gene editing can be used to manipulate splicing events at exon-intron junctions of interest.
    As a clinically relevant exon-intron junction, the exon10-intron10 junction of the MAPT gene serves as an ideal model for developing the framework.

    The authors place emphasis on the MAPT gene and its relation to neurodegenerative diseases in the introduction and first figure, but then make broad claims about the applicability of the framework. While known and unknown variants within and surrounding the exon10-intron10 junction in MAPT were extensively studied and add to the validity of the models, applying the framework to a second exon-intron junction within a different gene would have strengthened the claim that this framework can be used as a general predictive model of splicing.

  4. Reviewer #3 (Public Review):

    The manuscript by Kumar et al. describes a modeling study of a well-studied system, namely, the MAPT Exon 10-Intron 10 junction, whereby splicing disruption has been associated with multiple neurodegenerative diseases. It thus addresses an important, fundamental problem, which is of interest to a broad readership. Despite having been studied extensively, splicing is still poorly understood, and in particular, the role of RNA structure is difficult to systematically or quantitatively highlight. The authors developed a comprehensive and highly accurate predictive model of MAPT splicing, which integrates both structural and regulatory sequence motifs. They not only used it to predict measured PSI (percent spliced in index) values, but more interestingly, they used it to dissect the contributions of structure and SRE RBP binding sites to splicing dysregulation and to reveal their counteracting effects. This, in turn, sheds light on how disease variants affect splicing outcomes. Additionally, they experimentally validated select model predictions.

    Overall, this is an insightful manuscript that introduces several novelties. On the experimental side, the authors performed the first experimental structure determination in vivo of MAPT's two relevant isoforms and its pre-mRNA (as opposed to previous in vitro or computational studies). There is also innovative use of the unique features of the MaP structure probing strategy (as opposed to truncation-based probing) to determine structure at the isoform level while obviating the need to deal with read mapping ambiguities. On the computational side, I liked the novel use of t-SNE and density contour plots to visualize shifts in ensemble composition and how the authors integrated cryo-EM data with computational ensemble-based structure predictions. Both innovations can be useful in a range of other structure-function studies. I also find the final model's accuracy and its success in making robust predictions for all mutation types impressive. Although additional DMS-MaP experiments that could validate some of the predicted mutation-induced structural changes would be ideal, because changes take place at the pre-mRNA level, it is much more complicated to perform such in vivo assays.
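
For readers unfamiliar with the PSI (percent spliced in) index that Reviewer #3 refers to, the sketch below shows the standard definition: the fraction of transcripts that include the exon. For MAPT exon 10, inclusion corresponds to the 4R isoform and exclusion to the 3R isoform. The function name and the read counts are hypothetical and are not taken from the manuscript.

```python
def percent_spliced_in(inclusion_reads, exclusion_reads):
    """Return PSI as a fraction between 0 and 1.

    inclusion_reads: reads supporting exon inclusion (4R for MAPT exon 10).
    exclusion_reads: reads supporting exon skipping (3R for MAPT exon 10).
    """
    if inclusion_reads < 0 or exclusion_reads < 0:
        raise ValueError("read counts must be non-negative")
    total = inclusion_reads + exclusion_reads
    if total == 0:
        raise ValueError("PSI is undefined when there are no reads")
    return inclusion_reads / total


# Hypothetical counts: 60 inclusion reads and 40 exclusion reads.
example_psi = percent_spliced_in(60, 40)
```

Multiplying the returned fraction by 100 gives PSI as a percentage.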