Robust and annotation-free analysis of alternative splicing across diverse cell types in mice

Curation statements for this article:
  • Curated by eLife

    Evaluation Summary:

    The manuscript presents a potentially interesting new method to study alternative splicing at the single-cell level in the mouse. With further testing and benchmarking, this method would be of interest to researchers working with single-cell data and/or interested in alternative splicing.

    (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. The reviewers remained anonymous to the authors.)

Abstract

Although alternative splicing is a fundamental and pervasive aspect of gene expression in higher eukaryotes, it is often omitted from single-cell studies due to quantification challenges inherent to commonly used short-read sequencing technologies. Here, we undertake the analysis of alternative splicing across numerous diverse murine cell types from two large-scale single-cell datasets (the Tabula Muris and BRAIN Initiative Cell Census Network) while accounting for understudied technical artifacts and unannotated events. We find strong and general cell-type-specific alternative splicing, complementary to total gene expression but of similar discriminatory value, and identify a large volume of novel splicing events. We specifically highlight splicing variation across different cell types in primary motor cortex neurons, bone marrow B cells, and various epithelial cells, and we show that the implicated transcripts include many genes which do not display total expression differences. To elucidate the regulation of alternative splicing, we build a custom predictive model based on splicing factor activity, recovering several known interactions while generating new hypotheses, including potential regulatory roles for novel alternative splicing events in critical genes like Khdrbs3 and Rbfox1. We make our results available using public interactive browsers to spur further exploration by the community.

Article activity feed

  1. Author Response:

    Reviewer #1:

    The authors have developed a method (scQuint) for analyzing alternative splicing using scRNA-Seq. The method performs visualization, clustering, and differential analysis, although the differential analysis modeling is not novel: it has been adopted from a bulk RNA-seq method (LeafCutter) and applied to pseudobulked data by grouping reads from cells within a cell type. Therefore, the method is not able to capture the true splicing variation at the single-cell level. Also, the authors have only applied the method to Smart-seq2 data, and therefore it is not clear whether their method is applicable to 10x data, which has much higher throughput than Smart-seq2 and is able to capture rare cell types but is more challenging for splicing analysis due to its 3' bias and lower coverage.

    The authors have applied their method to two mouse scRNA-Seq datasets, Tabula Muris (from multiple tissues) and BICCN (from brain), and provided a comprehensive analysis of alternative splicing in mouse cell types. They have found that cell-type-specific splicing is ubiquitous in mouse cell types and that splicing variation augments the total gene expression variation, as there is little overlap between the top differentially spliced and differentially expressed genes. They also found that a considerable fraction of cell-type-specific splicing events involve novel transcripts. They applied predictive machine learning models to show that cell types can be well distinguished by splicing information and to identify relationships between the splicing changes in known splicing factors and the splicing changes in their target genes.

    The authors provide several biological findings regarding alternative splicing at the cell-type level and have shown how scRNA-Seq (despite being underutilized for splicing analysis so far) can expand our understanding of splicing mechanisms in single cells. Additionally, the authors have made their data publicly available through interactive data browsers that can serve as a resource for future studies.

    We thank Reviewer 1 for their thoughtful consideration of our manuscript and for their comments and recommendations. We wish to briefly address three remarks made in the reviewer’s summary. First, although our differential splicing model is based on that of LeafCutter, we make several substantive changes for the setting of single-cell experiments which affect its scalability and statistical performance. Second, we do not use pseudobulked data, with the exception of our splicing factor regression analysis at the end of the results section. These facts have been made clearer in the manuscript. Finally, we discuss the challenges of working with 10x data in our response to comment 5.

  2. Reviewer #1 (Public Review):

    The authors have developed a method (scQuint) for analyzing alternative splicing using scRNA-Seq. The method performs visualization, clustering, and differential analysis, although the differential analysis modeling is not novel: it has been adopted from a bulk RNA-seq method (LeafCutter) and applied to pseudobulked data by grouping reads from cells within a cell type. Therefore, the method is not able to capture the true splicing variation at the single-cell level. Also, the authors have only applied the method to Smart-seq2 data, and therefore it is not clear whether their method is applicable to 10x data, which has much higher throughput than Smart-seq2 and is able to capture rare cell types but is more challenging for splicing analysis due to its 3' bias and lower coverage.

    The authors have applied their method to two mouse scRNA-Seq datasets, Tabula Muris (from multiple tissues) and BICCN (from brain), and provided a comprehensive analysis of alternative splicing in mouse cell types. They have found that cell-type-specific splicing is ubiquitous in mouse cell types and that splicing variation augments the total gene expression variation, as there is little overlap between the top differentially spliced and differentially expressed genes. They also found that a considerable fraction of cell-type-specific splicing events involve novel transcripts. They applied predictive machine learning models to show that cell types can be well distinguished by splicing information and to identify relationships between the splicing changes in known splicing factors and the splicing changes in their target genes.

    The authors provide several biological findings regarding alternative splicing at the cell-type level and have shown how scRNA-Seq (despite being underutilized for splicing analysis so far) can expand our understanding of splicing mechanisms in single cells. Additionally, the authors have made their data publicly available through interactive data browsers that can serve as a resource for future studies.

  3. Reviewer #2 (Public Review):

    This paper proposes novel methods to study alternative splicing at the single-cell level. The approach addresses the problem of non-uniform read coverage of transcript sequences in Smart-seq2 data and proposes a metric that evaluates differential intron expression by looking at intron groups, or groups of alternative splicing (AS) events that share the same 3′ splice site. In this way, coverage biases are canceled out within the evaluated intron group. Using this metric, the authors propose a VAE method for dimension reduction and show that it clusters cell types better than PCA. They also propose a method for differential splicing analysis, which is an adaptation of a previous approach used for bulk RNA-seq. The methods are applied to a variety of datasets. Finally, an analysis of splicing factor regulation is proposed.
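
    The intron-group metric summarized above can be illustrated with a minimal sketch. It assumes junction read counts keyed by a hypothetical tuple (chromosome, strand, 5′ splice site, 3′ splice site); the function name, the data layout, and the rule of skipping groups with a single intron are illustrative assumptions, not the authors' implementation. Within each group, PSI is an intron's read count divided by the total for all introns sharing its 3′ splice site.

```python
from collections import defaultdict


def intron_group_psi(junction_counts, min_introns=2):
    """Return PSI per intron, computed within groups sharing a 3' splice site.

    junction_counts maps (chrom, strand, donor, acceptor) -> read count,
    where donor is the 5' splice site and acceptor the 3' splice site.
    Groups with fewer than min_introns introns carry no alternative
    splicing signal and are skipped (an assumption of this sketch).
    """
    groups = defaultdict(list)
    for key in junction_counts:
        chrom, strand, _donor, acceptor = key
        groups[(chrom, strand, acceptor)].append(key)

    psi = {}
    for members in groups.values():
        total = sum(junction_counts[k] for k in members)
        if len(members) < min_introns or total == 0:
            continue
        for k in members:
            psi[k] = junction_counts[k] / total
    return psi


# Toy counts (made up): two introns share the acceptor at 500, one is alone.
counts = {
    ("chr1", "+", 100, 500): 30,
    ("chr1", "+", 300, 500): 10,
    ("chr1", "+", 700, 900): 5,
}
psi = intron_group_psi(counts)
# The two introns ending at 500 get PSI 0.75 and 0.25; the singleton is dropped.
```

    Because every intron in a group ends at the same position, the ratio compares reads drawn from the same region of the transcript, which is the intuition behind the claim that coverage biases cancel within a group.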

    Strengths:

    • The concept of using intron groups to control for coverage biases is interesting, and the results indicate that this is an effective way to capture the AS patterns that discriminate between cell types.
    • The authors use a diversity of datasets (neural cell types and Tabula Muris) to illustrate the applicability of their methods and show multiple cases of cell-type-specific alternative exon/intron and transcription start site (TSS) usage.

    Weaknesses:

    • Generally, I find that the statistical methods are insufficiently benchmarked, as no controlled datasets are used, only indirect demonstrations on experimental data. This is particularly true for the differential splicing methods as well as the splicing factor regulatory models. A more formal demonstration of the performance of the approach is needed.
    • I am not sure of the novelty and significance of the biological insights presented in this paper. Some general messages are already known thanks to other approaches based on either short or long reads: cases of splicing dissociated from expression, the fact that splicing contributes to cell identity, and the fact that many unannotated splicing events can still be found in neural tissues.

    Some specific remarks are:

    1. The authors state that most methods do not account for non-uniform read coverage along the transcript sequence. This is not true: some popular RNA-seq methods, such as RSEM, do model the RNA degradation pattern, and hence non-uniform coverage.

    2. The authors state that their method allows for novel isoform discovery since it does not rely on transcriptome annotations. However, this is not accurate: they might be able to detect new splice junctions or introns, but not isoforms, which are defined by the concatenation of exons. This sentence should be amended.

    3. Some aspects of the methods are poorly described. For example, the procedure to obtain and use a pseudocount vector of PSI values to apply PCA for cell clustering is not clear.

    4. The authors implement a multinomial GLM for differential splicing proposed by LeafCutter. It is unclear how they deal with sparsity: they mention this problem for PCA analysis but not for differential splicing. At some point, the authors indicate that they only test introns detected in at least 50 cells in each condition, but this is an arbitrary value, and it is unclear how this cutoff impacts the accuracy of their analysis. A more formal explanation of the treatment of sparsity should be provided. Finally, in Figure 3, they show that their p-values are better calibrated than those of LeafCutter, but it is not described how the calibration was performed. Moreover, this analysis does not reveal how well their method performs in terms of sensitivity, specificity, or FDR. In general, it is unclear how the methods are benchmarked. I would suggest using synthetic data, in which different intron PSI values across cell types are modeled for single cells, and then using established performance metrics to evaluate the method.

    • It is not clear to me what the novelty of this paper is in terms of the discovered biology. The authors found that cell types can be separated by splicing patterns and highlight several genes with cell-type-specific alternative introns, which is consistent with previous knowledge. Also, the fact that neural cell types have extensive AS and contain many unannotated splicing events has already been described. It would be interesting to know what novel biology is discovered by this approach that cannot be identified by other approaches.

    5. The dendrogram comparison for the Tabula Muris dataset is interesting, but it is unclear what the biological significance of the finding is.

    6. I find the splicing factor analysis rather speculative, as an association between the splicing patterns of splicing factors and genes does not necessarily imply a regulatory role, especially given that information about splicing factor binding sites (SFBS) was not used. The fact that the authors recover one known association is not sufficient to validate the approach. For this analysis to be reliable, additional evidence of regulation should be provided. For example, is there an enrichment of SFBS for those associations where the logistic model was significant?
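
    The synthetic benchmark suggested in remark 4 could look roughly like the following sketch. It is a deliberately simplified stand-in, not the authors' Dirichlet-multinomial model or LeafCutter: it assumes an intron group with only two introns, pure binomial sampling of reads in each cell, pooled counts per cell type, and a likelihood-ratio (G) test on the resulting 2x2 table. All function names and parameter values (50 cells, 4 reads per cell, PSI of 0.3 versus 0.5) are illustrative assumptions.

```python
import math
import random


def g_test_pvalue(a1, b1, a2, b2):
    """Likelihood-ratio (G) test on a 2x2 table of intron read counts.

    Rows are cell types, columns are the two introns of one group.
    With 1 degree of freedom the chi-square tail is erfc(sqrt(G / 2)).
    """
    table = [[a1, b1], [a2, b2]]
    n = a1 + b1 + a2 + b2
    row = [a1 + b1, a2 + b2]
    col = [a1 + a2, b1 + b2]
    g = 0.0
    for i in range(2):
        for j in range(2):
            obs = table[i][j]
            if obs > 0:
                g += 2.0 * obs * math.log(obs * n / (row[i] * col[j]))
    return math.erfc(math.sqrt(max(g, 0.0) / 2.0))


def simulate_cell_type(rng, n_cells, reads_per_cell, psi):
    """Simulate single cells and pool their reads for one cell type."""
    intron_a = 0
    for _ in range(n_cells):
        # Each read in a cell supports intron A with probability psi.
        intron_a += sum(rng.random() < psi for _ in range(reads_per_cell))
    return intron_a, n_cells * reads_per_cell - intron_a


def rejection_rate(psi1, psi2, n_sim=2000, n_cells=50,
                   reads_per_cell=4, alpha=0.05, seed=0):
    """Fraction of simulated comparisons with p-value below alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sim):
        a1, b1 = simulate_cell_type(rng, n_cells, reads_per_cell, psi1)
        a2, b2 = simulate_cell_type(rng, n_cells, reads_per_cell, psi2)
        if g_test_pvalue(a1, b1, a2, b2) < alpha:
            hits += 1
    return hits / n_sim


# Same PSI in both cell types: the rejection rate estimates the
# false positive rate and should sit near alpha if p-values are calibrated.
fpr = rejection_rate(0.3, 0.3)
# Different PSI: the rejection rate estimates sensitivity (power).
power = rejection_rate(0.3, 0.5)
```

    A fuller benchmark along these lines would presumably also draw each cell's PSI from a distribution, to introduce cell-to-cell overdispersion, and would vary the number of cells and reads to probe how a detection cutoff such as 50 cells affects error rates.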

    In summary, the paper presents a promising concept for alternative splicing analysis in single cells, but the method requires more thorough benchmarking and a better explanation of the novel biology discovered by this approach in comparison to existing approaches. Without this assessment, it is unclear whether the work can have a real impact on the community.