Computational prediction of human deep intronic variation


Abstract

The adoption of whole genome sequencing in genetic screens has facilitated the detection of genetic variation in the intronic regions of genes, far from annotated splice sites. However, selecting an appropriate computational tool to differentiate functionally relevant genetic variants from those with no effect is challenging, particularly for deep intronic regions where independent benchmarks are scarce.

In this study, we have provided an overview of the computational methods available and the extent to which they can be used to analyze deep intronic variation. We leveraged diverse datasets to extensively evaluate tool performance across different intronic regions, distinguishing between variants that are expected to disrupt splicing through different molecular mechanisms. Notably, we compared the performance of SpliceAI, a widely used sequence-based deep learning model, with that of more recent methods that extend its original implementation. We observed considerable differences in tool performance depending on the region considered, with variants generating cryptic splice sites being better predicted than those that affect splicing regulatory elements or the branchpoint region. Finally, we devised a novel quantitative assessment of tool interpretability and found that tools providing mechanistic explanations of their predictions are often correct with respect to the ground truth information, but the use of these tools results in decreased predictive power when compared to black box methods.

Our findings translate into practical recommendations for tool usage and provide a reference framework for applying prediction tools in deep intronic regions, enabling more informed decision-making by practitioners.

Article activity feed

  1. This work has been peer reviewed in GigaScience (see https://doi.org/10.1093/gigascience/giad085 ), which carries out open, named peer-review. The review is published under a CC-BY 4.0 license:

    Reviewer name: Raphael Leman

    Summary: In this work, Barbosa et al. present a benchmark of several splicing predictors for human intronic variants. Overall, the results of this study show that deep learning-based tools such as SpliceAI outperform the other splicing predictors at detecting splicing-disturbing, and hence pathogenic, variants.

    The authors also detailed the performance of these tools on several subsets of data according to the collection of origin of the variants and according to their genomic localization. This work is one of the first large, independent studies of splicing prediction performance for intronic variants, in particular deep intronic variants, in the context of molecular diagnosis. This work also highlights the need for reliable prediction tools for these variants and that their splicing impact is often underestimated. However, I consider that major points should be resolved before the article can be considered for publication.

    **Major points**

    1. The most important point is that the authors show results in the main text but claim in the following paragraphs that these results are biased. In addition, the results that take these biases into account are only shown in supplementary data, so readers must make the correction themselves to obtain the "true" results. Indeed, the interpretation of the biased results and of the "true" results differs drastically. The two main biases were: i) the use of ClinVar data already used for the training of CAPICE (see my comment n°2), and ii) the intronic tags of variants and the relative distances to the nearest splice site were wrong (see my comment n°5). Consequently, the authors should remove these biased results and only show results after bias correction.

    2. Importantly, several tools used ClinVar variants or published data to train and/or validate their models. Therefore, to perform a benchmark on a truly independent collection of variants, the authors should ensure there is no overlap between the variants used for tool development and those in the present study.

    3. As the authors showed by comparing the ClinVar classification (N = 54,117 variants) with the impact on RNA from in vitro studies (N = 162 variants), there were discrepancies between these two sources of information (N = 13/74 common variants, 18%). Consequently, using the ClinVar classification to assay the performance of splicing prediction tools is not optimal. To partially address this point, it could be interesting to further study (e.g., minor allele frequency, availability of in vitro RNA studies, …) the intronic variants with positive splicing predictions from two or more tools but a ClinVar classification of benign or likely benign and, conversely, the intronic variants with negative splicing predictions from two or more tools but a ClinVar classification of pathogenic or likely pathogenic.

    4. The authors used pre-computed databases for 19 tools, but most of these databases do not include small indels and thus artificially introduce missing data to the disfavor of the tool, even though the same tool could score these indel variants de novo.

    5. The authors said that "We hypothesized that variability in transcript structures could be the reason [increase in performance in the deepest intronic bins]: despite these variants being assigned as occurring very deep within introns (> 500bp from the splice site of the canonical isoform) in the reference isoform, they may be exonic or near-splice site variants of other isoforms of the associated gene". To account for this transcript structure variability, firstly, the authors could use a weighted relative distance as follows: ||Pos_nearest splice site - Pos_variant| - Intron_Size| / Intron_Size (a minimal sketch of this calculation is given below). Secondly, the ClinVar data contain the RefSeq transcript ID on which the variant was annotated (except for large duplications/deletions), so the authors should establish the correspondence between these RefSeq transcript IDs and the transcripts used to perform splicing predictions.
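
    As an illustration, a minimal sketch of this proposed weighted relative distance, assuming integer genomic positions, a known intron size, and a variant lying within that intron:

    ```python
    def weighted_relative_distance(pos_variant: int,
                                   pos_nearest_splice_site: int,
                                   intron_size: int) -> float:
        """Weighted relative distance proposed above:
        ||Pos_nearest splice site - Pos_variant| - Intron_Size| / Intron_Size
        It equals 1 when the variant sits at the nearest splice site and
        approaches 0 as the distance grows towards the full intron size.
        """
        distance_to_nearest = abs(pos_nearest_splice_site - pos_variant)
        return abs(distance_to_nearest - intron_size) / intron_size

    # Hypothetical example: a variant 500 bp from the nearest splice site
    # of a 10,000 bp intron.
    print(weighted_relative_distance(101_500, 101_000, 10_000))  # 0.95
    ```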

    6. With respect to the six categories of splice-altering variants, it is unclear how the authors handled cases in which a variant alters physiological splice motifs (e.g., natural 3'SS/5'SS consensus sequences, branch point, or ESR) but, instead of causing exon skipping, the spliceosome recruits another distant splice site that is partially affected or not affected by the variant.

    7. In Table 1 listing the tools considered for this study, please specify for each tool on which data collections (ClinVar or splicing-altering variants) and for which genomic regions the benchmark was performed. This information will facilitate the reading of the article.

    8. As per my comment n°3, not all spliceogenic variants are necessarily pathogenic. The mutant allele could produce aberrant transcripts without a frameshift and without impacting the functional domains of the protein. In addition, transcription could also yield a mix of aberrant and full-length transcripts. As a result, the main goal of splicing prediction tools is to detect splicing-altering variants. Considering variants with a positive splicing prediction as pathogenic is a dangerous shortcut, and only an in vitro RNA study can confirm the pathogenicity of a variant. The discussion section should be updated accordingly.

    9. The authors claimed that: "The models [SQUIRLS and SPiP] were frequently able to correctly identify the type of splicing alteration, yet they still fail to propose higher-order mechanistic hypotheses for such predictions." I think that the authors over-interpreted the results (see my comment n°21).

    10. The authors recommended prioritizing intronic variants using CAPICE. Is this still true once the bias is corrected (see my comment n°1)?

    **Minor points**

    11. In the introduction, the authors could clearly define the canonical splice site regions (AG/GT dinucleotides; 3'SS: -1/-2 and 5'SS: +1/+2) to distinguish them from the consensus splice sites, commonly defined as 3'SS: -12 (or -18)/+2 and 5'SS: -3/+6.

    12. In the introduction, please also add that splice site activation can also result from the disruption of a silencer motif.

    13. In ref [17], the authors did not say that the enrichment of splicing-related variants within splice site regions was linked to exon and splice site sequencing. They showed that whole genome sequencing increased the diagnostic rate of rare genetic disease; in fact, they did not focus on splicing variants. This enrichment was more probably induced by the fact that geneticists mainly studied variants with positive splicing predictions.

    14. In the paragraph 'The prediction tools studied are diverse in methodology and objectives', please add that most prediction tools target consensus splice sites (e.g., MES, SSF, SPiCE, HSF, Adaboost, …).

    15. In the paragraph 'The prediction tools studied are diverse in methodology and objectives', the authors claimed that 'sequence-based deep learning models such as SpliceAI, which do not accept genetic variants as input', but this is incorrect, as SpliceAI can accept a VCF file as input.

    16. In the paragraph 'Pathogenic splicing-affecting variants are captured well by deep learning based methods', this is further explained in the methods section, but I think a sentence explaining that the 243 variants comprise 81 variants described in ref [19] and 162 variants from a new collection would clarify the reading of the article.

    17. In the paragraph 'Pathogenic splicing-affecting variants are captured well by deep learning based methods', among the 13 variants incorrectly classified, please detail how many were classified as benign and how many as VUS.

    18. Due to the blue gradient, Fig 1C is hard to read.

    19. In the paragraph 'Branchpoint-associated variants', the variants reported in ref [79] were studied in a tumoral context, so the observed impact may not be the same in healthy tissue.

    20. In the paragraph 'Exonic-like variants', the authors changed the parameters of the SpliceAI predictions, relative to the original parameters used for the precomputed scores, to take into account variants located deep inside the pseudoexon. Please check whether other prediction tools also have user-defined parameters that can be optimized to account for these variants.

    21. In the paragraph 'Assessing interpretability', the authors observed that non-informative SPiP annotations presented a high score level. This could be explained by the tool reporting a positive prediction without annotation simply because the model score was high, without any relation to a particular splicing mechanism.

    22. In the paragraph 'Assessing interpretability', the authors could compare the SpliceAI annotations regarding the abolition/creation of splice sites, and their positions relative to the variants, with the observed effect on RNA.

    23. In the paragraph 'Predicting splicing changes across tissues', by my count the analysis of AbSplice-DNA predictions was done on 89 variants (154 - 65 = 89); if true, please state this clearly in the text.

    24. In the method section, paragraph "ClinVar": for the 13 variants with discordance between the classification and the observed splicing impact, how many confidence stars did they have?

    25. In the method section, paragraph "Disease-causing intronic variants affecting RNA splicing", the authors filtered out variants within 10 bp of the nearest splice site; please explain why.

    26. In the method section, paragraph "Disease-causing intronic variants affecting RNA splicing", the authors used gnomAD variants as a control set; however, their variant frequency threshold (1%) is too low. Indeed, some pathogenic variants involved in recessive genetic disorders have a high frequency in the population. A threshold of 5% would be more appropriate.

    27. In the method section, paragraph "Variants that affect RNA splicing", the authors should describe how they handled variants leading to multiple aberrant transcripts and variants with a partial effect (i.e., the mutant allele still producing full-length transcript).

    28. In the method section, paragraph "Variants that affect RNA splicing", regarding the six categories defined by the authors: how were indel variants annotated if they overlapped several categories? The new splice donor/acceptor categories included only variants creating new AG/GT dinucleotides or variants occurring within the consensus sequences of cryptic splice sites. Within the Donor-downstream category, please distinguish between variants located between +3 and +6 bp (i.e., the consensus sequence) and variants beyond +6 bp. The exonic-like variants could be variants that do not impact ESR motifs (see my comment n°6).

    29. In the method section, paragraph "Variants that affect RNA splicing", the authors selected, for the control datasets, variants generating the CAGGT and GGTAAG motifs. However, this approach leads to an over-enrichment of false positives. Moreover, it would also be interesting, among the variants creating new splice sites or pseudoexons, to identify the presence of GC donor motifs or U12 minor spliceosome motifs (AT/AC) and to assess how the different splicing tools detect them (see the motif-check sketch after these points).

    30. In Fig S3C, scale the gnomAD population frequency as -log10(P) to make the figure more readable.

    31. I saw double spaces several times in the text; please correct them. English is not my native language so I am not the best judge, but some sentences seem syntactically incorrect (e.g.: "The splicing tools with the smallest and largest performance drop between the splice site bin ("1-2") and the "11-40" bin were Pangolin and TraP, with weighted F1 scores decreasing by 0.334 and 0.793, respectively"). Please have the article proofread by someone who is fluent in English.
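
    As referenced in comment 29, a minimal sketch of how such a motif check could be performed on the local sequence context of a variant (the window sequences are hypothetical):

    ```python
    DONOR_LIKE_MOTIFS = ("CAGGT", "GGTAAG")

    def creates_motif(ref_window: str, alt_window: str,
                      motifs: tuple = DONOR_LIKE_MOTIFS) -> bool:
        """Return True if any motif appears in the variant (alt) sequence window
        but is absent from the matching reference window."""
        ref_window, alt_window = ref_window.upper(), alt_window.upper()
        return any(m in alt_window and m not in ref_window for m in motifs)

    # Hypothetical SNV turning ...CAGAT... into ...CAGGT...
    print(creates_motif("TTCAGATAA", "TTCAGGTAA"))  # True
    ```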

  2. This work has been peer reviewed in GigaScience (see https://doi.org/10.1093/gigascience/giad085 ), which carries out open, named peer-review. The review is published under a CC-BY 4.0 license:

    **Reviewer name: Jean-Madeleine de Sainte Agathe**

    This manuscript presents an important and very exhaustive benchmark of intronic variant splicing predictors. The focus on deep intronic variants is highly appreciated, as it addresses a crucial challenge in today's genetics. The authors present the different tools in a very clear and pedagogical way. I should add that this manuscript is pleasant to read. The authors use the average precision score, allowing a refined comparison between tools.

    They give practical recommendations. They emphasize the use of SpliceAI and Pangolin for intronic variants. For branchpoint regions, they recommend Pangolin and LaBranchoR. It should be noted that this study is, to my knowledge, the first independent benchmark of Pangolin, CISpliceAI, ConSpliceML, AbSplice-DNA, SQUIRLS, BPHunter, LaBranchoR and SPiP together. Overall, this study is important as it will be very helpful for the interpretation of intronic variants. I hence fully and strongly support its publication. I have several comments that (I think) should be addressed before publication, especially the first point:

    1. I admit that the curation of such large datasets is challenging; however, I failed to find some of the Table S6 variants in the referenced work. Please, could you kindly point me to the referenced variation for the following variants?

    - The variant "1 hg38_156872925 C T NTRK1 ENST00000524377.1:c.851-708C>T pseudoexon_inclusion keegan_2022" is classified as 'affects_splicing'. However, I did not find it in Keegan 2022 (reference 20). In Keegan, Table S1 mentions NTRK1 variants but not c.851-708C>T. For these NTRK1 variants, Keegan et al. refer to another publication, Geng et al. 2018 (PMC6009080), where I cannot find the ENST00000524377.1:c.851-708C>T variant either.
    - Same for "COL4A3 ENST00000396578.3:c.4462+443A>G 2:g.228173078A>G".
    - Same for "ABCA4 ENST00000370225.3:c.1937+435C>G 1:g.94527698G>C".
    - Same for "FECH ENST00000382873.3:c.332+668A>C 18:g.55239810T>G".
    - Concerning "MYBPC3 ENST00000545968.1:c.1224-52G>A 11:g.47364865C>T", I did not find it in pbarbosa as stated, but in another reference which, I think, should be mentioned in this manuscript: https://pubmed.ncbi.nlm.nih.gov/33657327/
    - "BRCA2 ENST00000544455.1:c.8332-13T>G 13:g.32944526T>G" is classified as splicing-neutral based on moles-fernández_2021, but it has previously been shown to alter splicing (https://pubmed.ncbi.nlm.nih.gov/31343793/); please clarify.

    If these variants were somehow erroneously included, the authors should reprocess their results with the corrected datasets.

    2. Although it has been done before, the usage of gnomAD variants as a source of splicing-neutral variants is questionable. Indeed, it is theoretically possible that such variants truly alter splicing. For example, genuine splicing alterations can result in mild in-frame consequences on the gene products, or splicing alterations can damage non-essential genes. I suggest that the authors either select another gnomAD variant list located in disease-associated genes, where benign splicing alterations seem less plausible, or discuss this putative limitation in their results.

    3. Table S8: "Variants above 0.05, the optimized SpliceAI threshold for non-canonical intronic splicing variation". Is that a recommendation of this work, or was it found elsewhere? Please clarify. More generally, this manuscript uses average precision scores, but the authors should explain to their non-statistician readers how these relate to the delta scores of each tool (Fig 3C). Indeed, any indication (or even recommendation, though not necessarily) concerning the use of cut-off values would be greatly appreciated by the geneticist community.
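
    For non-statistician readers, the relationship between average precision and a fixed delta-score cut-off could be illustrated roughly as follows (the labels and scores here are hypothetical):

    ```python
    import numpy as np
    from sklearn.metrics import average_precision_score, precision_score, recall_score

    # Hypothetical data: 1 = splice-altering, 0 = neutral; delta scores from any tool.
    labels = np.array([1, 0, 1, 1, 0, 0, 1, 0])
    delta_scores = np.array([0.92, 0.03, 0.40, 0.07, 0.01, 0.20, 0.65, 0.04])

    # Average precision summarises the whole precision-recall curve,
    # i.e. performance across all possible cut-offs, not a single one.
    ap = average_precision_score(labels, delta_scores)

    # A fixed cut-off (e.g. the 0.05 mentioned in Table S8) corresponds to
    # a single point on that curve.
    threshold = 0.05
    calls = (delta_scores >= threshold).astype(int)
    precision_at_t = precision_score(labels, calls)
    recall_at_t = recall_score(labels, calls)

    print(f"average precision={ap:.2f}, "
          f"precision@{threshold}={precision_at_t:.2f}, recall@{threshold}={recall_at_t:.2f}")
    ```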

    4. p.3 "If the model is run twice, once with the reference and once with the mutated sequence, it is possible to measure splice site alterations caused by genetic variants." This study only makes use of the delta scores, which have previously been shown to be misleading in some rare cases (PMID 36765386). The authors would be wise to mention this. For example, in Table S3, "ENST00000267622.4:c.5457+81T>A 14(hg19):g.92441435A>T" is predicted by SpliceAI with DG = 0.16, but as the reference prediction is already at 0.84, this 0.16 is the maximum delta score possible, yielding a donor score of 1.
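
    To make this saturation effect explicit, a minimal numeric sketch of the example above (simplified to scalar donor probabilities):

    ```python
    def max_possible_delta_gain(ref_score: float) -> float:
        """Upper bound of a gain-type delta score when probabilities are capped at 1."""
        return 1.0 - ref_score

    # Example above: the reference donor probability is already 0.84, so the
    # observed delta of 0.16 is the largest value attainable; the alternate
    # allele saturates the donor score at 1.0.
    ref_donor = 0.84
    observed_delta = 0.16
    print(max_possible_delta_gain(ref_donor))   # 0.16
    print(ref_donor + observed_delta)           # 1.0
    ```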

    5. p.12 "Among the tools that predict across whole introns, SQUIRLS and SPiP are the only ones designed to provide some interpretation of the outcome." Concerning the nature of the mis-splicing event, I think the authors should mention SpliceVault, which has been specifically built for this task (PMID 36747048).

    6. p.14: "SpliceAI and Pangolin […]. If usability is a concern and users do not have a large number of predictions to make, SpliceAI is preferred since the Broad Institute has made available a web app for the task" The Broad Institute web app now includes Pangolin (at least for hg38 variants). Please rephrase or delete this sentence.

    7. Concerning complex delins, which are not annotated with the current version of SpliceAI, the authors should give recommendations. For example, the complex delins from Table S9 "hg19_chr7 5354081 GC AT" is correctly predicted by CI-SpliceAI and SpliceAI-visual, both of which allow the annotation of complex delins with the SpliceAI model.

    8. p.8 "Unfortunately, BPHunter only reported the variants predicted to disrupt the BP, rendering the Precision-Recall Curves (PR Curves) analysis impossible." I agree with the authors. However, I think it is sometimes assumed (wrongly?) that all variants unannotated by BPHunter have a BPH_score of 0. Maybe the authors could make this explicit, for example by stating that the lack of a prediction cannot be safely equated with a negative prediction.
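
    For example, when merging BPHunter output with a benchmark table, missing predictions could be kept as missing values rather than implicit zeros (a minimal sketch with hypothetical data):

    ```python
    import pandas as pd

    # Hypothetical benchmark variants (1 = branchpoint-disrupting, 0 = neutral).
    benchmark = pd.DataFrame({"variant": ["v1", "v2", "v3"], "label": [1, 0, 1]})
    # BPHunter only reports variants predicted to disrupt the branchpoint.
    bphunter = pd.DataFrame({"variant": ["v1"], "BPHunter_score": [7]})

    # A left join keeps unreported variants as NaN instead of silently
    # treating "not reported" as a negative prediction (score 0).
    merged = benchmark.merge(bphunter, on="variant", how="left")
    print(merged["BPHunter_score"].isna().sum(), "variants without a BPHunter prediction")
    ```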