Direct Comparative Analyses of 10X Genomics Chromium and Smart-Seq2
Abstract
Single-cell RNA sequencing (scRNA-seq) is generally used for profiling the transcriptome of individual cells. The droplet-based 10X Genomics Chromium (10X) approach and the plate-based Smart-seq2 full-length method are two frequently used scRNA-seq platforms, yet there are only a few thorough and systematic comparisons of their advantages and limitations. Here, by directly comparing the scRNA-seq data generated by these two platforms from the same samples of CD45− cells, we systematically evaluated their features using a wide spectrum of analyses. Smart-seq2 detected more genes in a cell, especially low abundance transcripts as well as alternatively spliced transcripts, but captured a higher proportion of mitochondrial genes. The composite of Smart-seq2 data also resembled bulk RNA-seq data more closely. For 10X-based data, we observed higher noise for mRNAs with low expression levels. Approximately 10%−30% of all detected transcripts by both platforms were from non-coding genes, with long non-coding RNAs (lncRNAs) accounting for a higher proportion in 10X. 10X-based data displayed a more severe dropout problem, especially for genes with lower expression levels. However, 10X-data can detect rare cell types given its ability to cover a large number of cells. In addition, each platform detected distinct groups of differentially expressed genes between cell clusters, indicating the different characteristics of these technologies. Our study promotes better understanding of these two platforms and offers the basis for an informed choice of these widely used technologies.
Article activity feed
This Zenodo record is a permanently preserved version of a PREreview. You can view the complete PREreview at https://prereview.org/reviews/17856063.
This manuscript presents a head-to-head comparison of Smart-seq2 and 10X Genomics Chromium using CD45⁻ cells from four tumor-related liver and rectal cancer samples, with matched bulk RNA-seq as a reference. The authors systematically examine read-level QC, gene detection sensitivity, mitochondrial and ribosomal content, non-coding and lncRNA detection, highly variable gene selection, clustering and cell-type annotation, differential expression and KEGG pathway enrichment, dropout behavior, and gene structure information including splice junctions. A key message is that Smart-seq2 provides higher sensitivity, better detection of lowly expressed and alternatively spliced transcripts, and closer similarity to bulk RNA-seq, while 10X offers higher throughput and better power to identify rare cell types, but also higher noise and more severe dropout, especially for low-abundance genes. The comparison is timely and will be of interest to researchers designing scRNA-seq experiments in cancer biology, immunology, and other fields where platform choice and study design need to be carefully balanced. At the same time, several aspects of the analysis (especially around sequencing depth, normalization, clustering and DEG strategies, and the statistical treatment of dropout and variability) would benefit from clarification or additional benchmarking to sharpen the conclusions and make the recommendations more robust.
Major issues
In "10X had higher dropout ratio than Smart-seq2", the comparison of dropout and noise is confounded by very different sequencing depths between platforms. Because Smart-seq2 cells receive ~10-fold more uniquely mapped reads than 10X cells, it is difficult to disentangle platform-specific chemistry effects from simple coverage differences. Could you perform depth-matched analyses (e.g., downsampling Smart-seq2 to 10X-like coverage, or using saturation curves and model-based extrapolation) to show how much of the observed difference in dropout ratio, CV distribution, and low-expression bimodality remains at comparable read depths? Please also specify in more detail which metrics (dropout vs expression, mitochondrial and ribosomal proportions, HVG selection) were tested for robustness to sequencing depth and which were not.
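One way to picture the depth-matched comparison requested here is the minimal sketch below. It assumes binomial thinning of per-gene counts as the downsampling scheme and defines the dropout ratio of a gene as the fraction of cells with zero counts for it; the function names, the keep fraction of 0.1 (chosen to mirror the ~10-fold depth gap), and the toy counts are illustrative assumptions, not taken from the manuscript.

```python
import random


def downsample_counts(counts, keep_fraction, rng):
    """Binomially thin per-gene counts for one cell.

    Each counted read is kept independently with probability keep_fraction.
    """
    if not 0.0 <= keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in [0, 1]")
    return [sum(1 for _ in range(c) if rng.random() < keep_fraction)
            for c in counts]


def dropout_ratio(cells, gene_index):
    """Fraction of cells in which the given gene has zero counts
    (an assumed definition of 'dropout ratio')."""
    zeros = sum(1 for cell in cells if cell[gene_index] == 0)
    return zeros / len(cells)


# Toy data (hypothetical): 4 deeply sequenced cells x 3 genes.
rng = random.Random(0)
deep_cells = [[200, 10, 1], [180, 12, 0], [220, 8, 2], [210, 9, 1]]

# Thin to roughly one tenth of the original depth.
shallow_cells = [downsample_counts(cell, 0.1, rng) for cell in deep_cells]

for gene in range(3):
    print(gene, dropout_ratio(deep_cells, gene), dropout_ratio(shallow_cells, gene))
```

Comparing the dropout ratio of the thinned Smart-seq2 data with that of the 10X data would indicate how much of the gap is attributable to depth alone. On real data, downsampling raw reads before quantification would likely be a closer match to a shallower sequencing run than thinning a finished count matrix.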
In "Smart-seq2 detected more genes and 10X identified more cell clusters", the conclusions about clustering power and DEGs are strongly influenced by the very different cell numbers per platform. For the main claims about the number of clusters, detection of rare populations, and the limited overlap of sample-wise and cell-type–wise DEGs and KEGG pathways, could you add analyses based on subsampling 10X cells to similar cell numbers as Smart-seq2 and/or pooling Smart-seq2 cells to equalize per-cluster cell counts? Please also describe more explicitly how the clustering parameters (choice of HVGs, CCs/PCs, resolution, UMAP settings) and DEG thresholds were chosen, whether alternative parameter settings or clustering methods were tested, and whether the key platform-specific findings (e.g., CAF and myofibroblast subdivisions) remain stable under such sensitivity analyses.
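The subsampling suggested here could look roughly like the sketch below, which draws several random subsets of 10X cells matched to the Smart-seq2 cell number so that clustering can be rerun on each draw. The cell counts (5000 and 300), the cell ID format, and the function names are hypothetical placeholders, not values from the manuscript.

```python
import random


def subsample_cells(cell_ids, n_target, seed):
    """Draw n_target cell IDs without replacement, reproducibly per seed."""
    cell_ids = list(cell_ids)
    if n_target > len(cell_ids):
        raise ValueError("n_target exceeds the number of available cells")
    return random.Random(seed).sample(cell_ids, n_target)


def repeated_subsamples(cell_ids, n_target, n_repeats):
    """One subsample per seed 0..n_repeats-1, so that downstream clustering
    can be repeated and cluster numbers compared across draws."""
    return [subsample_cells(cell_ids, n_target, seed)
            for seed in range(n_repeats)]


# Hypothetical sizes, for illustration only.
tenx_cells = [f"10x_cell_{i}" for i in range(5000)]
n_smartseq2 = 300

draws = repeated_subsamples(tenx_cells, n_smartseq2, n_repeats=5)
print([len(d) for d in draws])
```

Repeating the draw with several seeds matters because a single subsample may by chance include or miss a rare population; the spread of cluster numbers across draws gives a sense of how stable the platform comparison is at matched cell numbers.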
Minor issues
In the "Quality control for scRNA" section, the QC thresholds (Smart-seq2: <800 genes or >50% mitochondrial reads; 10X: <500 genes, <900 or >8000 UMI, >20% mitochondrial UMI) are quite specific but not fully justified. Could you briefly explain how these cutoffs were selected (e.g., visual inspection of distributions, prior publications, pilot tests), whether alternative thresholds were evaluated, and whether the main conclusions (especially mitochondrial content comparisons and the identification of the "mitochondria-high" fibroblast cluster 10) are robust to relaxing or tightening these QC filters?
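A threshold sensitivity check of the kind asked for here can be sketched as follows. The default cutoffs are the ones quoted in this comment; the CellQC record, the function names, and the toy cells are assumptions made for illustration and do not reflect the authors' actual pipeline.

```python
from dataclasses import dataclass


@dataclass
class CellQC:
    n_genes: int          # number of detected genes
    n_counts: int         # reads (Smart-seq2) or UMIs (10X)
    mito_fraction: float  # mitochondrial share of reads/UMIs, 0..1


def passes_smartseq2(cell, min_genes=800, max_mito=0.50):
    """Keep cells with >= min_genes genes and <= max_mito mitochondrial reads."""
    return cell.n_genes >= min_genes and cell.mito_fraction <= max_mito


def passes_10x(cell, min_genes=500, min_umi=900, max_umi=8000, max_mito=0.20):
    """Keep cells inside the gene, UMI, and mitochondrial bounds."""
    return (cell.n_genes >= min_genes
            and min_umi <= cell.n_counts <= max_umi
            and cell.mito_fraction <= max_mito)


def retained_fraction(cells, passes, **thresholds):
    """Share of cells kept under a given filter and threshold overrides."""
    kept = sum(1 for cell in cells if passes(cell, **thresholds))
    return kept / len(cells)


# Toy 10X cells (hypothetical values).
toy_10x = [
    CellQC(600, 1000, 0.05),   # passes all default filters
    CellQC(400, 1000, 0.05),   # too few genes
    CellQC(600, 9000, 0.05),   # too many UMIs
    CellQC(600, 1000, 0.25),   # mitochondrial fraction above 20%
]

# Sweep the mitochondrial cutoff to see how retention changes.
for max_mito in (0.10, 0.20, 0.30):
    print(max_mito, retained_fraction(toy_10x, passes_10x, max_mito=max_mito))
```

Rerunning the downstream comparisons at each setting of such a sweep would show whether findings like the "mitochondria-high" cluster depend on one particular cutoff.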
In "10X detected a higher proportion of lncRNA and Smart-seq2 identified more lncRNA as highly variable genes", the enrichment and interpretation of lncRNA- versus protein-coding–driven HVGs could be clearer. It would help if you could provide more detail on the exact background gene set used for KEGG enrichment, the multiple-testing correction procedure and cutoffs, and whether you considered lncRNA-specific annotation resources or compared your lncRNA HVGs with published single-cell lncRNA catalogs, so that readers can better understand how much of the pathway-level difference arises from biology versus annotation coverage and detection bias.
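To make the multiple-testing question concrete, the sketch below implements the Benjamini-Hochberg adjustment, one commonly used correction for enrichment p-values. It is shown only as an example of the kind of procedure the manuscript should name; the manuscript may have used a different method, and the p-values below are invented for illustration.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    For the p-value of rank r (1 = smallest) among m tests, the adjusted
    value is min over ranks k >= r of p_(k) * m / k, capped at 1.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Invented pathway p-values, for illustration only.
raw = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(raw))
```

Because the adjusted values scale with the number of tests m, the choice of background gene set and the number of pathways tested directly affect which pathways pass a given cutoff, which is why both details are worth reporting.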
In the "Cell clustering" description, the integration and batch correction approach using CCA (Seurat v2.3) is only briefly mentioned. Given that patients and tissues differ, could you expand on how you assessed the success of batch correction (e.g., mixing of patients within shared cell types, silhouette scores, or differential expression before vs after correction), whether you tried alternative integration methods (e.g., Seurat v3/v4 anchors, MNN-based approaches) or parameter settings, and whether platform-specific cell-type differences or cluster numbers are robust to different integration strategies?
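As an illustration of the silhouette-based check mentioned here, the sketch below computes a mean silhouette score over patient (batch) labels in a low-dimensional embedding. It is a plain-Python toy under stated assumptions: Euclidean distance, invented 2D coordinates, and hypothetical patient labels. On real data the same idea would be applied to the cells of one shared cell type in the corrected embedding.

```python
import math


def mean_silhouette(points, labels):
    """Mean silhouette score of points grouped by labels (Euclidean distance).

    Per point: a = mean distance to other points with the same label,
    b = smallest mean distance to the points of any other label,
    s = (b - a) / max(a, b). Points without same-label neighbours, or with
    no other label present, contribute 0.
    """
    n = len(points)
    scores = []
    for i in range(n):
        dists = {}
        for j in range(n):
            if i != j:
                d = math.dist(points[i], points[j])
                dists.setdefault(labels[j], []).append(d)
        own = dists.pop(labels[i], None)
        if not own or not dists:
            scores.append(0.0)
            continue
        a = sum(own) / len(own)
        b = min(sum(d) / len(d) for d in dists.values())
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / n


# Invented embedding coordinates for four cells of one cell type.
embedding = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]

separated = ["patient_A", "patient_A", "patient_B", "patient_B"]
mixed = ["patient_A", "patient_B", "patient_A", "patient_B"]

print(mean_silhouette(embedding, separated))  # high: patients form separate groups
print(mean_silhouette(embedding, mixed))      # negative: patients are interleaved
```

When the labels are patients rather than cell types, a score near or below zero within a shared cell type is the desirable outcome, since it suggests patients are well mixed after correction, whereas a score close to 1 points to residual batch structure.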
In the "Discussion" section, the remarks on cost and experimental design ("cost is still prohibitive", "now standard practice to investigate tens of thousands of cells") are conceptually reasonable but somewhat qualitative. It might strengthen the practical value of the paper if you could add approximate per-cell or per-sample cost ranges for Smart-seq2 versus 10X (even in relative terms), and, if possible, link these to your own dataset (e.g., hypothetical costs for your current design vs an alternative design with different cell numbers), so that readers can better translate your conceptual trade-offs into concrete study design decisions.
In several places (e.g., the sections on mitochondrial vs ribosomal genes, ambient RNA in 10X, and the "Ribosome" pathway enrichment), the interpretation of these QC and functional signatures could be more nuanced. Could you clarify how you distinguished technical artifacts (such as ambient RNA, partial lysis, or rDNA alignment issues) from true biological differences in translation activity or metabolic state, and consider adding a short paragraph in the Discussion that explicitly acknowledges alternative explanations and suggests additional analyses (e.g., ambient RNA correction tools, removal of rRNA/rDNA, or comparison to external reference datasets) that readers may want to apply in similar comparative studies?
Competing interests
The author declares that they have no competing interests.
Use of Artificial Intelligence (AI)
The author declares that they did not use generative AI to come up with new ideas for their review.