Validation of scRNA-seq by scRT-ddPCR using the example of ErbB2 in MCF7 cells

Abstract

Single-cell RNA sequencing (scRNA-seq) can unmask transcriptional heterogeneity, facilitating the detection of rare subpopulations at unprecedented resolution. In addressing the challenges related to coverage and quantity in transcriptome analysis, the lack of unbiased and absolutely quantitative validation methods hampers further improvements. Digital PCR (dPCR) represents such a method, as we could show that the inherent partitioning enhances molecular detection by increasing effective mRNA concentrations. We developed a scRT-ddPCR method and validated it using two breast cancer cell lines, MCF7 and BT-474, and bulk methods. ErbB2, a low-abundant transcript in MCF7 cells, suffers from dropouts in scRNA-seq, and thus calculated fold changes are biased. Using our scRT-ddPCR, we could improve the detection of ErbB2 and, based on the absolute counts obtained, validate the scRNA-seq fold change. We think this workflow is a valuable addition to the single-cell transcriptomic research toolbox and could even become a new standard in fold change validation because of its reliability, ease of use and increased sensitivity.

Article activity feed

  1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.


    Reply to the reviewers

    Manuscript number: RC-2022-01488

    Corresponding author(s): Tobias Lange, Csaba Jeney

    1. General Statements

    First of all, we would like to thank the reviewers for their valuable comments on our manuscript. We appreciate your questions and comments and have tried to answer all points raised thoroughly.

    Consistent with the comments, we would like to shift the focus of this paper to the comparison of scRT-ddPCR with scRNA-seq signal distributions, taking this scRNA-seq method as an exemplary but experimentally matching control. This is reflected in the new title “Validation of scRNA-seq by scRT-ddPCR using the example of ErbB2 in MCF7 cells”.

    A major point in the comments was the use of ERCC spike-ins. We carefully considered ERCC spike-ins during the design of the experiments but finally omitted them, as ERCC spike-ins were designed for relative quantification (fold changes) and not for absolute counts. Furthermore, it was shown that these controls have high variability and high dropout rates (Risso et al. 2014, Vallejos et al. 2017). We were also aware that ERCC spike-in analysis can be biased by the apparent Poisson distribution and could thus complicate the (absolute) quantitative analysis. However, we are reconsidering including these controls, after external validation, in a subsequent publication.

    In addition, it was suggested that we split the manuscript into two publications to improve it. We seriously considered this, but the validation of scRNA-seq data by scRT-ddPCR is now the major conclusion, and the other improvements presented will require separate publications. Please find the answers below.

    We hope that we were able to answer all criticism sufficiently well and we are open to further discussion.

    2. Description of the planned revisions

    Reviewer 1

    Major concerns:

    The authors did not compare their results with standard SMART-seq2 in detection sensitivity (comparison on UMAP clustering is really trivial, and cannot serve the purpose)

    To further validate our workflow, we are considering comparing the signal distributions of ErbB2 and ACTB in MCF7 cells from Isakova et al. (2021) (GSE151334) with the distributions obtained from our approach (see Fig 4 a and b).

    Reviewer 2

    Minor concerns:

    Figure 1. B and C figure's axes are not easy to read even at highest zoom. At least the 400 bp in the x axis could be represented using a bigger font.

    Yes, we will increase the font size of Fig 1b and c.

    Figure S4. 'for in in range' needs some attention.

    It might become clearer using the expression “cycle 1000x” or “repeat 1000x”.

    Reviewer 3

    Major concerns:

    Single-cell SMART-Seq with SMART-Seq from "bulk" and "cl", which authors include in scRT-dPCR but not in scRNA-Seq

    The controls “bulk” and “cl” are designed to validate the lysis of a single cell after processing by F.SIGHT and I.DOT (see 3.2 Validation of scRT-ddPCR using bulk methods). Our results indicate that, independent of the method, we quantify the same absolute amount of ACTB and ErbB2 mRNA in single cells (Fig 3). To avoid confusion, we will thus remove the signal distribution of “bulk” and “cl” in Fig 4 a and b.

    Authors analysed scRNA-Seq data using "pseudo-bulk" differential expression analysis with DESeq2. Authors did not include more details about the processing of the data, i.e. whether they used the standard DESeq2 protocol or the modified protocol recommended for scRNA-Seq data. It is hard to conclude whether the chosen method is optimal; however, I recommend using a method that is standard for scRNA-Seq nowadays, such as Seurat with SCTransform.

    We have followed published settings of DESeq2, which can be specifically applied to single cell data, also described here. Indeed, after we checked our settings, we realized that we did not use the correct parameters and we have implemented the changes. However, this will not substantially affect our results and conclusions. The scripts of data processing are added to the supplementary material.

    Reviewer 3

    Minor concerns:

    Commercial kit SMART-Seq from Takara is not same as Smart-Seq2 protocol (line 198). Please do not use same name for commercial and academical protocols.

    We will adjust the nomenclature to account for the differences.

    Can you include more details for processing of data?

    We will revise the manuscript accordingly. Briefly, the aligners were wrapped into bash scripts and alignment was performed on each FASTQ file separately. As recommended in the documentation for kallisto and salmon, the mean read length and its standard deviation were calculated for each file. For alignments with STAR, a genome index was created (as recommended). After alignment, the read table was created with featureCounts. We will also share the scripts and settings for data processing as indicated above.
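The per-file read length statistics mentioned above can be sketched as follows. This is a minimal illustration and not the authors' script: the function name `read_length_stats` and the three-read example are hypothetical, and how the resulting values were passed to kallisto and salmon is not shown in the source.

```python
import statistics
from typing import Iterable, Tuple


def read_length_stats(fastq_lines: Iterable[str]) -> Tuple[float, float]:
    """Return (mean, sample standard deviation) of read lengths in a FASTQ stream."""
    lengths = []
    for i, line in enumerate(fastq_lines):
        # A FASTQ record spans four lines; the sequence is the second one.
        if i % 4 == 1:
            lengths.append(len(line.strip()))
    if len(lengths) < 2:
        raise ValueError("need at least two reads to estimate a standard deviation")
    return statistics.mean(lengths), statistics.stdev(lengths)


# Hypothetical three-read example (not data from the study).
example = [
    "@read1", "ACGTACGT", "+", "IIIIIIII",
    "@read2", "ACGTACGTAC", "+", "IIIIIIIIII",
    "@read3", "ACGTACGTACGT", "+", "IIIIIIIIIIII",
]
mean_len, sd_len = read_length_stats(example)
```

For a real file, the same function could be applied to an open file handle, one FASTQ file at a time.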

    Can you share whole script for data processing?

    Yes, we will share the scripts for data processing in the supplementary material.

    Why you didn't show any other cell type specific markers, which differ between chosen cell lines (lines 328/329)?

    The chosen cell lines are highly related; in breast cancer research, they are frequently used to control each other. This choice has advantages and drawbacks, as they differ principally in their ErbB2 expression. MCF7 and BT-474 serve as good controls in this study. An expansion to other differentially expressed genes is limited. However, we evaluated KRT8 and TFF1 (two marker genes for MCF7 cells (Isakova et al. 2021)) in scRNA-seq and ErbB2 in both scRNA-seq and scRT-ddPCR as key markers. We could highlight more marker genes between MCF7 and BT-474 cells on the basis of scRNA-seq data.

    I don't understand "missing normalization of counts" for comparison between different aligners. Especially, because counts are normalized during analysis using DESeq2 (line 396).

    We apologize for the unclear wording. The signal distributions are constructed from the values in Fig 2b (and not from DESeq2 output); thus, the unit of the salmon and kallisto distributions is TPM, while the unit of the STAR distributions is raw counts. This discrepancy in the normalization of counts might contribute to the difference in distributions between STAR and kallisto as well as between STAR and salmon.
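To illustrate why raw counts and TPM values are not directly comparable, the following sketch converts raw counts to TPM. It is a simplified illustration: the function name `counts_to_tpm` and the numbers are hypothetical, and it uses plain transcript lengths, whereas salmon and kallisto use effective lengths.

```python
def counts_to_tpm(counts, lengths_bp):
    """Convert raw read counts to transcripts per million (TPM)."""
    # Reads per kilobase: normalize each count by transcript length.
    rpk = [c / (length / 1000) for c, length in zip(counts, lengths_bp)]
    total = sum(rpk)
    if total == 0:
        return [0.0 for _ in rpk]
    # Scale so that all values in one sample sum to one million.
    return [r / total * 1e6 for r in rpk]


# Hypothetical example: raw counts differ 3-fold, but only because of length.
raw_counts = [10, 20, 30]
lengths = [1000, 2000, 3000]
tpm = counts_to_tpm(raw_counts, lengths)
```

In this toy case the three transcripts have identical TPM values although their raw counts differ, which is the kind of unit effect that can shift a distribution built from raw counts relative to one built from TPM.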

    Authors should change name "integrated workflow" into something else, because there is no integration of scRNA-Seq data with scRT-dPCR. They only compare results from these two methods.

    We are considering renaming the publication to, for instance, “Validation of scRNA-seq by scRT-ddPCR using the example of ErbB2 in MCF7 cells”.

    There is no demonstration of needs of validation (line 416).

    Yes, we agree, the need for validation is only mentioned in the introduction (lines 69 to 75 and lines 82 to 85) but should be taken up here again.

    Are the differences in the log2FC real problem for single-cell experiments? Authors used different cells and different number of cells for comparison. Can it be source of different log2FC?

    Indeed, the number of cells differed between scRNA-seq and scRT-ddPCR, and we understand that this might introduce subsampling errors. To address this question, we will bootstrap and down-sample the scRNA-seq group so that the same number of cells is compared between scRNA-seq and scRT-ddPCR, and we will revise this part accordingly.
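One way to set up such a bootstrap with down-sampling is sketched below. This is an assumption-laden illustration, not the authors' analysis: the function names, the default of 1000 iterations, the fixed seed, and the use of mean expression to compute the fold change are all hypothetical choices. The direction of the fold change (MCF7 over BT-474) follows the description given elsewhere in this reply.

```python
import math
import random
import statistics


def log2_fold_change(mcf7_counts, bt474_counts):
    """log2 of mean MCF7 expression over mean BT-474 expression."""
    return math.log2(statistics.mean(mcf7_counts) / statistics.mean(bt474_counts))


def bootstrap_log2fc(mcf7_counts, bt474_counts, n_cells, n_iter=1000, seed=0):
    """Resample n_cells per group with replacement and collect log2 fold changes."""
    rng = random.Random(seed)
    fold_changes = []
    for _ in range(n_iter):
        mcf7_sample = rng.choices(mcf7_counts, k=n_cells)
        bt474_sample = rng.choices(bt474_counts, k=n_cells)
        # A resample consisting only of dropouts (sum 0) has no defined log2FC.
        if sum(mcf7_sample) == 0 or sum(bt474_sample) == 0:
            continue
        fold_changes.append(log2_fold_change(mcf7_sample, bt474_sample))
    return fold_changes
```

Setting `n_cells` to the number of cells measured by scRT-ddPCR would compare both methods at the same sample size; the spread of the returned values then indicates how much of a fold change difference subsampling alone could explain.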

    3. Description of the revisions that have already been incorporated in the transferred manuscript

    Reviewer 1

    Minor concerns:

    Fig.1a,b, in ROI, there are overlap between "printed cells" and "detected particles"? How to distinguish between the two?

    Each dot in the 2D scatter plot is a particle detected during the dispensation process of the F.SIGHT (see representative images in Fig 1a). The particles can be of various origins: cell debris, cell aggregates, corpuscular materials from cell culture medium or cells. The ROI defines the morphological criteria (diameter and roundness) by which we define a particle as a cell. The overlap between detected particles, which can thus also be cells, and the printed cells arises because some cells are detected but not evaluated as single cells.

    Fig2d, what is the difference between DEG and "different genes"? The no. different genes is not specified for STAR?

    1. DEGs are significantly different genes, with abs(log2FC)>1 and padj<0.05. "Different genes" have abs(log2FC)>1 but padj>0.05. These genes are different but not significantly.
    2. For STAR, we did not obtain any genes of the latter category; all different genes are thus DEGs.
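The two categories can be expressed as a small classification rule. This sketch assumes that DEGs require abs(log2FC)>1 together with padj<0.05, and that "different genes" share the fold change cutoff but have padj>0.05; the function name and the label "unchanged" for all remaining genes are hypothetical.

```python
def classify_gene(log2fc, padj, lfc_cutoff=1.0, alpha=0.05):
    """Label a gene as 'DEG', 'different' or 'unchanged' (assumed thresholds)."""
    if abs(log2fc) > lfc_cutoff:
        # Large fold change: significance decides between the two categories.
        # A padj exactly equal to alpha is treated as not significant here.
        return "DEG" if padj < alpha else "different"
    return "unchanged"


# Hypothetical values for illustration only.
labels = [classify_gene(2.3, 0.001), classify_gene(-1.5, 0.2), classify_gene(0.4, 0.001)]
```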

    It is not clear how the bulk samples(Fig.3,4) were prepared.

    Thank you for pointing this out. We revised the manuscript accordingly. Briefly, total RNA was isolated from 1E6 cells, diluted and analyzed by dPCR (as described in 2.2 Total RNA isolation and bulk cell lysis). The absolute mRNA counts per single cell were calculated by dividing the detected number of transcripts by the number of cells.

    Reviewer 2

    Minor concerns:

    P5, line 148 is not clear to me.

    1E6 cells were lysed using 500 µl of Actome’s proprietary lysis buffer (PICO-000010, Actome). This results in 2000 cells/µl of lysate. After addition of 49.5 µl DPBS (100X dilution), the cell concentration is 20 cells/µl. Thus, dispensing 50 nl using the I.DOT delivers material equivalent to a single cell.
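The arithmetic behind this single-cell equivalent can be checked with a few lines. The sketch below only restates the numbers given in the reply; the 0.5 µl lysate aliquot is an assumption inferred from the 49.5 µl of DPBS and the stated 100X dilution, and the function name is hypothetical.

```python
def cell_equivalents(n_cells, lysis_volume_ul, lysate_aliquot_ul, diluent_ul, dispensed_nl):
    """Number of cell equivalents contained in the dispensed volume."""
    stock_cells_per_ul = n_cells / lysis_volume_ul                        # 2000 cells/µl
    dilution_factor = (lysate_aliquot_ul + diluent_ul) / lysate_aliquot_ul  # 100-fold
    diluted_cells_per_ul = stock_cells_per_ul / dilution_factor           # 20 cells/µl
    return diluted_cells_per_ul * dispensed_nl / 1000                     # nl -> µl


# Values from the reply; the 0.5 µl aliquot is an inferred assumption.
equivalents = cell_equivalents(
    n_cells=1e6,
    lysis_volume_ul=500,
    lysate_aliquot_ul=0.5,
    diluent_ul=49.5,
    dispensed_nl=50,
)
```

With these inputs the dispensed 50 nl corresponds to one cell equivalent, matching the statement above.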

    Reviewer 3

    Major concerns:

    The conclusion from whole paper is confusing, because it is bringing several new information and methods, which would be better if they would presented separately. Mainly down-scaled SMART-Seq using i.DOT and F.SIGHT - it is novel and important. Single-cell dPCR combined with F.SIGHT, which can be presented separately without down-scaled SMART-Seq.

    We discovered that down-scaling does not significantly enhance the detection of low-abundant transcripts such as ErbB2 in MCF7 cells (Fig 4a), contrary to theoretical considerations (lines 77 to 82). To assess this bias, scRT-ddPCR was used to validate the representability of low-abundant transcripts in scRNA-seq, ultimately revealing a better resolution of the expression in single MCF7 cells (Fig 4a). We see scRT-ddPCR as a potential improvement in validating scRNA-seq data regardless of the scRNA-seq method used. For the sake of this paper, however, we used a unique combination of methods. We understand the value of the components themselves, and the validation of miniaturized scRNA-seq deserves a subsequent paper. The combination of methods used, albeit unique, was designed to reduce the technical and biological variability to a minimum; the cells originate from the same population and the same instrumentation was used. This is better suited to support our claims regarding representability.

    Also of note, scRT-ddPCR is a ground-truth method that literally counts molecules. The only mathematical concept applied is Poisson statistics (Basu 2017); no further data processing that could influence data evaluation is necessary, which supports its generality.
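The Poisson step referred to here is the standard dPCR correction for partitions that contain more than one molecule. The sketch below shows the general formula and is not taken from the authors' software; the function name and the partition numbers in the example are hypothetical.

```python
import math


def copies_from_partitions(n_positive, n_total):
    """Estimate the number of target copies loaded from dPCR partition counts.

    With p = n_positive / n_total, the mean number of copies per partition
    under Poisson loading is lambda = -ln(1 - p).
    """
    if not 0 <= n_positive < n_total:
        raise ValueError("n_positive must satisfy 0 <= n_positive < n_total")
    fraction_positive = n_positive / n_total
    copies_per_partition = -math.log(1.0 - fraction_positive)
    return copies_per_partition * n_total


# Hypothetical run: 100 positive partitions out of 20000.
estimated_copies = copies_from_partitions(100, 20000)
```

At low occupancy the estimate is only slightly above the number of positive partitions, because few partitions are expected to hold more than one molecule.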

    It is hard to say, "what is important message of this manuscript".

    Low-abundant transcripts are often referred to as highly interesting and difficult to analyze, especially regarding reproducibility (Fortunel et al. 2003, Schwender et al. 2014, Petrova et al. 2017, Taylor et al. 2017). Our data support previous findings that dropouts in scRNA-seq are frequent (Luecken et al. 2019). Down-scaling of SMART-Seq2 does not significantly increase detection efficiency and reliability (Fig 4) despite the considerable assumptions described in lines 77 to 82. This part of the paper supports the previous findings. Additionally, we see the scRT-ddPCR method as a potential improvement in validating scRNA-seq data regardless of the scRNA-seq method used.

    I don't understand, why authors present comparison of two RNA isolation protocol in RT-dPCR results.

    As described above, the “bulk” controls are needed to validate the full lysis of a single cell after processing by F.SIGHT and I.DOT (see 3.2 Validation of scRT-ddPCR using bulk methods). We assume that through total RNA isolation by commercially available and widely accepted kits, all mRNAs are released from the cells and are efficiently amplified. This serves as a reference value for the scRT-ddPCR method (Fig. 3b). To ensure the reliability of our reference value, we used two different commercial methods for total RNA isolation with unequivocal results (Fig S2d). These two methods, however, differ in sample preparation (DNase I digest vs no digest, enzymatic lysate homogenization vs mechanical lysate homogenization), in buffers and handling in general.

    More, the whole conclusion and results are made only from one experiment from two separated measurements. Authors should repeat experiment and check if differences in log2FC between scRNA-Seq and scRT-dPCR are same all the time.

    Single-cell experiments are inherently biological replicates. We consider them to be a high number of parallels per experiment. They are processed in parallel but separately, albeit using the same batch of chemicals and the same instrumentation. We purchased cells and chemicals from commercial sources, assuming minimal possibility of error. The instruments were validated before use according to standard procedures.

    Reviewer 3

    Minor concerns:

    What is LBTW (line 154)?

    LBTW is a proprietary lysis buffer of Actome GmbH (line 146 and 147) (PICO-000010, Actome).

    Have you process/sort cells for scRT-dPCR and scRNA-Seq same day?

    Care has been taken to reduce the biological variability, so they were processed with minimal delay from the same dispensation cartridge and originate from the same cell culture flask.

    How you dilute RNA for ddPCR (line 180)?

    Total RNA from MCF7 cells was diluted 1:20, 1:50, 1:100 and 1:1000 with PBS, and ACTB and ErbB2 mRNAs were quantified in triplicate. Total RNA from BT-474 cells was diluted 1:50, 1:100, 1:1000 and 1:10000 with PBS, and ACTB and ErbB2 mRNAs were quantified in triplicate.

    And for scRT-dPCR?

    Single cells were not diluted. A single cell was dispensed directly into 0.5 µl lysis buffer, master mix was added and the scRT-ddPCR was performed.

    How many cells per condition you have sorted for SMART-Seq?

    84 cells were isolated for each cell line (Tab S1).

    How much time you need for collection of cells?

    The F.SIGHT requires a maximum of 8 min for the dispensation of 84 cells.

    How much time you need for pipetting of solution? Can it be problem for neutralization of tagmentation?

    The transposome activity was quenched by the addition of 0.5 µl of Neutralization buffer using the I.DOT. The I.DOT can dispense into 96 wells per minute (Klinger et al. 2020).

    Have you use robot or have you manually cleaned cDNA with AMPure beads?

    The clean-up procedure was performed manually.

    Which magnetic separator you have used for clean-up?

    We used conventional neodymium magnets for the separation of liquid and beads.

    384-well plate design and clean-up of 20 ul volume is not something standard, please specify it in the protocol.

    The procedure is described in the methods section lines 212 to 219.

    Have you pooled libraries before clean-up (line 238)?

    The cDNA libraries of single MCF7 and BT-474 cells were pooled separately.

    If yes, what was final volume?

    The final volume for each pooled library was ~420 µl (84 × ~5 µl).

    How much AMPure beads you have used for clean-up?

    After cDNA amplification, we used 9 µl of AMPure bead suspension for clean-up. After library amplification, we used 0.6- to 1-fold the pooled library volume.

    Is pooling the reason for losing 30% of the cells from the dataset?

    We isolated 84 single cells using the F.SIGHT. During quality control, we excluded ~30% of the cells from down-stream analyses (Tab S1). We applied the quality criteria as mentioned in lines 254 to 260. We think these are common criteria for filtering.

    Why you choose different cell types as you used for sequencing?

    In scRNA-seq and scRT-ddPCR (and in the corresponding controls), we used MCF7 and BT-474 cell lines. We chose these cell lines because of their well-described difference in ErbB2 expression (Durst et al. 2019) (lines 106 to 108 and lines 349 to 353).

    Why it is important that tagmented cDNA was 459/432bp long (Line 310)? Is it specific for down-scaled, or classical SMART-Seq? How to use this information?

    Jaeger et al. (2020) show examples of good-quality tagmented cDNA libraries for down-scaled SMART-Seq2. Additionally, they mention that the peak should be within the range of 300 bp to 800 bp. Our tagmented cDNA library distribution (Fig 1c) exhibits remarkable similarity to the one shown by Jaeger et al. (2020) and the peak falls within the mentioned range. Thus, the tagmented cDNA obtained by our approach meets the criteria for a good-quality tagmented cDNA library.

    Comparison in lines 354-355 is confusing.

    Using scRT-ddPCR, we could not detect a statistical difference in ACTB expression between the cell lines (Mann-Whitney test). MCF7 cells express 66±29 ACTB mRNAs per single cell and BT-474 cells express 114±80 ACTB mRNAs per single cell (Fig 3b).
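For readers unfamiliar with the test, the Mann-Whitney U statistic underlying this comparison can be computed as sketched below. This is a generic pure-Python illustration with hypothetical inputs, not the authors' analysis; it returns only the U statistics, and a p-value would in practice come from a statistics library such as `scipy.stats.mannwhitneyu`.

```python
def mann_whitney_u(x, y):
    """Return (U_x, U_y) for two samples, using average ranks for ties."""
    pooled = sorted([(value, 0) for value in x] + [(value, 1) for value in y])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    n_x, n_y = len(x), len(y)
    u_x = rank_sum_x - n_x * (n_x + 1) / 2
    u_y = n_x * n_y - u_x
    return u_x, u_y


# Hypothetical per-cell counts for illustration only.
u_a, u_b = mann_whitney_u([1, 2, 3], [4, 5, 6])
```

Because the test is rank-based, it does not assume normally distributed per-cell counts, which is relevant for distributions with large spread such as the ACTB counts reported here.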

    I don't know, what you wanted to say by showing number of copies of different genes in different cell lines.

    Absolute numbers of transcripts are very reliable, especially when they are generated by a method of ground truth relying on molecular counting (dPCR). Still some transcripts might not be transcribed into cDNA but partitioning increases their effective concentration (Basu 2017). Based on this, relative quantities can still be calculated. Additionally, absolute quantification eases data comparison as no standard is needed and dPCR has further advantages over qPCR (lines 97 to 103).

    I'm not sure that scRNA-Seq and scRT-ddPCR are truly orthogonal methods. Both methods are PCR based. (lines 381-382).

    While PCR is applied in both methods, we see the orthogonality of the methods in the independence of the detection events. dPCR compartmentalizes single molecules, unlike scRNA-seq, where all mRNAs are transcribed competitively and simultaneously into cDNA. This results in multiple competing reactions and thus increases the propensity for dropouts. dPCR avoids that and directly provides molecular counts.

    At the end, it is not important if the gene has two times or three times higher expression. Important is preservation of the trend.

    In our view, trends are less informative measures than absolute counts, which are direct derivatives of the chemical concentrations driving the chemical reactions in cells. Trends, however, can be reconstructed from absolute counts.

    Authors are analyzing relative expression of Actb and ErbB2 between two lines. Could be used scRT-qPCR instead of scRT-ddPCR? It could solve problem in the number of genes, which could be analyzed (line 438).

    Indeed, qPCR instruments usually offer a higher degree of multiplexing but we think that qPCR cannot deliver the sensitivity needed for the detection of low-abundant transcripts (see lines 97 to 103 and above). dPCR ensures the detection of single molecules, while qPCR has a variable sensitivity and would not be an orthogonal method according to our statement above. Furthermore, qPCR needs external standards for absolute quantification, while dPCR can absolutely quantify by molecular counting.

    Are the differences in the log2FC real problem for single-cell experiments? Authors used different cells and different number of cells for comparison. Can it be source of different log2FC?

    1. Difference in log2FCs might not be an exclusive problem for single-cell experiments (Rajkumar et al. 2015, von der Heyde et al. 2015, Everaert et al. 2017). However, we believe that in scRNA-seq differences are much more pronounced, especially regarding low-abundant transcripts, because of elevated amounts of technical noise and thus increased propensities for dropouts (Luecken et al. 2019).
    2. For the comparison of fold changes, we used two different cell lines, MCF7 and BT-474, but compared fold changes from expression of gene x in MCF7 cells versus expression of gene x in BT-474 cells. Fold changes from scRNA-seq and scRT-ddPCR were calculated this way and eventually compared.

    Why we need absolute numbers of copies of transcripts (lines 458-459)? I'm OK with relative quantity using RT-qPCR.

    Absolute numbers of transcripts are very reliable, especially when they are generated by a method of ground truth relying on molecular counting (dPCR). Based on this, relative quantities such as fold changes can still be calculated. As described above, dPCR has several advantages over qPCR. Furthermore, absolute amounts of mRNA per cell determine their chemical activity in a cell (Tang et al. 2011).

    Authors presented two novel application, which both separately can be important for single-cell transcriptomic analysis. One is down-scaled SMART-Seq, which save a money and brings full-length scRNA-Seq to more researchers. Second is scRT-ddPCR, which can ultimately increase sensitivity of single-cell methods. However, combination of both methods in one paper, without comparison of other technologies decrease impact and importance both of them. I.e. Pokhilko et. al (2021) presented targeted single-cell RNA-Seq, which increase sensitivity of Smart-Seq2 too.

    Pokhilko et al. (2021) also present a down-scaled version of SMART-Seq2 just as many other publications (Mora-Castilla et al. 2016, Jaeger et al. 2020, Isakova et al. 2021, Hahaut et al. 2022, Hagemann-Jensen et al. 2022). Pokhilko et al. (2021) use scRNA-seq data from Volpato et al. (2018), who use a manual, non-high-throughput method for single cell isolation. The F.SIGHT can gently isolate hundreds of single cells in a short period of time (see above) and records in parallel morphological characteristics, which can later be used to judge the cell’s integrity by neural networks (Riba et al. 2020) (see above: regarding the main conclusion of the paper and the planned follow-up paper, we highlight here that the focus was intended to be on the scRT-ddPCR method and its validation, and the miniaturized scRNA-seq was used to reduce the technical divergence of the methods).

    In my opinion, separated publication of both methods will be better. While down-scaled Smart-Seq2 is often discussed in Core Facilities to bring scRNA-Seq to more biologist and clinician, scRT-dPCR is very interesting but specific method.

    Although scRNA-seq is widely used, we could support recent findings (Luecken et al. 2019) that the detection of low-abundant transcripts is still challenging. Furthermore, we provide a proof-of-principle on how to validate the lack of representability of low-abundant transcripts in scRNA-seq: scRT-ddPCR.

    I'm focused in the RNA-Sequencing from sample preparation to data analysis. I'm helping people with optimization of the design to get as much information as possible. I wasn't able to say, if used statistical methods are correct.

    We carefully chose our statistical methods according to the suggestions in the literature. However, we are open to specific scrutiny.

    How you used DESeq2, BBKNN?

    1. The counts and transcript abundances were imported using the tximeta and tximport packages for data aligned with salmon and kallisto, respectively. Differential testing was carried out on the resulting count matrices with DESeq2 using LRT testing and other parameters set according to the recommendations of the DESeq2 vignette for testing single-cell data.
    2. For BBKNN clustering, a custom data set was constructed combining our data (salmon aligner) with a published data set containing MCF7 cells, fibroblasts and HEK293T cells (Isakova et al. 2021). For the analysis, the data set was imported in SCANPY. Cells with fewer than 200 genes expressed and genes expressed in less than three cells were excluded from the analysis. Counts per cell were normalized with SCANPY’s built-in normalization method. The data was log-transformed, scaled and a PCA was carried out, according to the standard workflow recommended in the SCANPY documentation. BBKNN was similarly carried out with the respective SCANPY method, the final plot was created after dimensionality reduction with UMAP.
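The clustering steps described in point 2 map onto SCANPY calls roughly as sketched below. This is an approximation under stated assumptions, not the authors' script: the function name, the `batch_key` column name `"dataset"`, the choice of `normalize_total` with `target_sum=1e4` as the built-in normalization, and the `max_value` and `svd_solver` settings are assumptions; the filter thresholds (200 genes, three cells) come from the reply. It requires the `scanpy` and `bbknn` packages and an AnnData object that already contains both data sets.

```python
def run_bbknn_workflow(adata, batch_key="dataset"):
    """Filter, normalize, and embed a combined AnnData object with BBKNN and UMAP."""
    # Imported lazily so the function can be defined without scanpy installed.
    import scanpy as sc

    # Exclude cells with fewer than 200 expressed genes
    # and genes expressed in fewer than three cells.
    sc.pp.filter_cells(adata, min_genes=200)
    sc.pp.filter_genes(adata, min_cells=3)

    # Per-cell normalization (assumed method), log transform, scaling, PCA.
    sc.pp.normalize_total(adata, target_sum=1e4)
    sc.pp.log1p(adata)
    sc.pp.scale(adata, max_value=10)
    sc.tl.pca(adata, svd_solver="arpack")

    # BBKNN builds a batch-balanced neighbor graph in place of sc.pp.neighbors.
    sc.external.pp.bbknn(adata, batch_key=batch_key)
    sc.tl.umap(adata)
    return adata
```

The resulting embedding could then be plotted with `sc.pl.umap`, colored by the batch column, to inspect whether MCF7 cells from both data sets overlap.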

    How you have processed published data?

    External data was concatenated with our data into a single AnnData object and analyzed according to the recommendations of the SCANPY documentation.

    4. Description of analyses that authors prefer not to carry out

    Reviewer 1

    Major concerns:

    The authors did not compare their results with standard SMART-seq2 in detection sensitivity (comparison on UMAP clustering is really trivial, and cannot serve the purpose)

    Miniaturization of SMART-seq2 and related protocols is frequently applied (Mora-Castilla et al. 2016, Jaeger et al. 2020, Isakova et al. 2021, Hahaut et al. 2022, Hagemann-Jensen et al. 2022) ensuring high quality data and reducing costs per cell. Therefore, we think that there is sufficient evidence that miniaturized/down-scaled protocols deliver the same results compared with standard protocols.

    Fig3b, there are a total of four groups of comparison, two genes X two cell lines. In one of the four, i.e. ACTB in MCF7, the quantification among the three methods differ significantly. Given no ground truth here, it is hardly to judge the quality of their method. The author should add ERCC spike-in to control their experiments as stated in their Discussion.

    1. Yes, we are aware of this difference, as described in lines 371 to 375, and attribute it to a different passage number (Tab S4), as it was already shown that housekeeping genes are subject to fluctuations, too (Kozera et al. 2013). However, the difference in absolute counts does not have a significant impact on the fold changes (Fig 4c).
    2. Risso et al. (2014) showed that ERCC control signals have a high variability, and Vallejos et al. (2017) found that only half of the spiked-in molecules are detected. The literature is not conclusive about ErbB2 expression in MCF7 cells (Subik et al. 2010, Cui et al. 2012, Durst et al. 2019), so we applied scRT-ddPCR (a method of ground truth) to single MCF7 cells to reveal ErbB2 expression at the highest available resolution. Given these considerations, ERCC controls might have no impact on the dPCR results.
    3. However, we understand that the ERCC controls, comprising a set of polyadenylated transcripts that are added to the scRNA-seq experiment during single-cell isolation, can replicate the effect of low-abundant transcripts. Single cells have very low transcript counts, and it is questionable whether this effect can be recapitulated quantitatively. The apparent Poisson distribution of the ERCC counts at that low level can complicate the quantitative analysis of the results, while the single-cell analysis also has its inherent heterogeneity. Given that ERCC spike-ins are not conclusively quantitative (see also above), internal transcripts can also serve this aim of the study. However, in a subsequent paper, we plan to compare the two methods. To our knowledge, such a comparison between scRNA-seq and scRT-ddPCR has never been performed before, so we could not follow previous implementations here. Our findings support the hypothesis that scRNA-seq suffers from detection deficits at the lower detection end (Luecken et al. 2019).

    Fig4b, ACTB in BT-474, it seems that the scDDPCR resulted in more cells in the first bin than scRNA-seq. This is in contrast to their claim of higher detection sensitivity of the former.

    There are more cells in the first bins of the scRT-ddPCR histogram but on a statistical basis, the distributions do not significantly differ (Tab S6).

    To assess the performance of their methods in a more systematic manner, the authors should perform the single cell measurements with ERCC spike-in, and check at least 5-10 endogenous genes at different expression level, in addition to the spike-in RNAs. They should choose cell lines for which the absolute no. of RNA for some house-keep genes has been measured using imaging based methods.

    We thought we addressed these issues thoroughly in our discussion (ERCC spike-ins: lines 459 to 461 and more endogenous genes: lines 438 to 443 and see also above). In our view image-based methods suffer more technical ambiguities; however, they could serve as possible validation as they are orthogonal. Additionally, spatial resolution would be preserved but absolute quantification is not possible. Our scRT-ddPCR method was validated against bulk RNA isolation methods, which serve as established references regarding the RNA isolation. We accepted the RT-PCR as a reference as it has been thoroughly validated as a method providing precise nucleic acid counts.

    The two methods described in the manuscript represent little technical advance. In addition, the conclusion stated in the manuscript is also not sufficiently convincing. As such, it would be of little interest to limited group of audience.

    The two most frequently used methods for scRNA-seq are Chromium from 10X Genomics and Smart-Seq2-based protocols. In a direct comparison, Wang et al. (2021) showed that Smart-Seq2 is better suited for the detection of low abundant transcripts. We wanted to further enhance the sensitivity of SMART-seq2 by down-scaling; it was hypothesized that this increases the detection efficiency (Mora-Castilla et al. 2016). However, we were still not able to detect low-abundant transcripts such as ErbB2 in MCF7 cells (Fig 4a). Low-abundant transcripts are often referred to as highly interesting and difficult to analyze especially regarding reproducibility (Fortunel et al. 2003, Schwender et al. 2014, Petrova et al. 2017, Taylor et al. 2017). Our proposed scRT-ddPCR can reliably and absolutely quantify low-abundant transcripts offering a solution for the detection of such targets. The majority of similar workflows use scRT-qPCR (lines 82 to 86), although dPCR is much more sensitive and can detect fold changes down to 1.16-fold (Basu 2017).

    Reviewer 3

    Major concerns:

    Down-scaled SMART-Seq with standard SMART-Seq

    We compared our down-scaled Smart-seq2 workflow to a validated, down-scaled Smart-seq2 workflow (Isakova et al. 2021) using UMAP clustering. Furthermore, miniaturization of Smart-seq2 and related protocols is common practice (Mora-Castilla et al. 2016, Jaeger et al. 2020, Isakova et al. 2021, Hahaut et al. 2022, Hagemann-Jensen et al. 2022). Therefore, we think that a UMAP comparison sufficiently demonstrates that our down-scaled protocols deliver reliable results. However, comparing the distributions of gene expression could improve this further (see above).

    Single-cell SMART-Seq with SMART-Seq from "bulk" and "cl", which authors include in scRT-dPCR but not in scRNA-Seq

    Smart-seq2 was designed to profile the transcriptome of single cells (Picelli et al. 2013, Picelli et al. 2014). The other methods serve purely for comparison and validation and were not intended as technological advancements.

    scRT-dPCR with scRT-qPCR

    qPCR is often used to validate fold changes from RNA-seq (Zucha et al. 2021). The differences between qPCR and dPCR are extensively described, for instance, in Basu (2017). In several comparisons between qPCR and dPCR or even RT-qPCR and RT-dPCR, the latter showed increased precision, reproducibility, higher sensitivity and high tolerance towards inhibitors (Alikian et al. 2017, Taylor et al. 2017). Thus, we assume that qPCR is not the method of choice for the detection of low-abundant transcripts such as ErbB2 in MCF7 cells (lines 97 to 103).
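
    The absolute quantification that distinguishes dPCR from qPCR rests on counting positive partitions and applying a Poisson correction for partitions that hold more than one molecule. The following is a minimal Python sketch of that correction, assuming molecules are distributed over droplets at random; the function name `poisson_copies` and the droplet numbers are hypothetical and are not taken from the manuscript.

```python
import math


def poisson_copies(positive: int, total: int) -> float:
    """Estimate target copies across all partitions from the positive fraction.

    Assumes molecules are distributed over partitions at random (Poisson),
    so some positive partitions contain more than one copy.
    """
    if total <= 0 or not 0 <= positive <= total:
        raise ValueError("need total > 0 and 0 <= positive <= total")
    if positive == total:
        raise ValueError("all partitions positive: sample too concentrated")
    # Mean copies per partition: lambda = -ln(1 - p), with p = positive / total.
    lam = -math.log(1.0 - positive / total)
    return lam * total


# Illustrative droplet counts only (hypothetical, not data from the manuscript).
estimate = poisson_copies(positive=120, total=15000)
print(f"{estimate:.1f} copies estimated from 120 positive droplets")
```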

  2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



    Referee #3

    Evidence, reproducibility and clarity

    Summary:

    The authors compared the differential expression of Actb and ErbB2 between cells from two cell lines, MCF7 and BT-474. They used two new/optimized methods: down-scaled SMART-Seq and single-cell RT-dPCR. They demonstrated that scRNA-Seq is not sensitive enough to properly quantify low-abundant transcripts and that an additional method is needed for this.

    Major comments:

    The authors compare results from two novel methods, down-scaled SMART-Seq and scRT-dPCR, without including a standard protocol. I am missing some additional experiments/comparisons. The conclusion that one method is correct and the other incorrect is too strong, given the many variables. The only clear conclusion concerns dropout in scRNA-Seq in comparison with scRT-dPCR.

    • Down-scaled SMART-Seq with standard SMART-Seq
    • Single-cell SMART-Seq with SMART-Seq from "bulk" and "cl", which authors include in scRT-dPCR but not in scRNA-Seq
    • scRT-dPCR with scRT-qPCR

    The authors analysed the scRNA-Seq data with a "pseudo-bulk" differential expression analysis using DESeq2. They did not include further details about the data processing, such as whether they used the standard DESeq2 protocol or the modified protocol recommended for scRNA-Seq data. It is hard to conclude whether the chosen method is optimal; however, I recommend using a method that is standard for scRNA-Seq nowadays, such as Seurat with SCTransform.

    The conclusion of the whole paper is confusing because it introduces several new findings and methods that would be better presented separately: mainly down-scaled SMART-Seq using i.DOT and F.SIGHT, which is novel and important; single-cell dPCR combined with F.SIGHT, which can be presented separately without down-scaled SMART-Seq; and the comparison of different aligners for scRNA-Seq data analysis. It is hard to say what the important message of this manuscript is. I don't understand why the authors present a comparison of two RNA isolation protocols in the RT-dPCR results.

    Moreover, all conclusions and results are drawn from only one experiment with two separate measurements. The authors should repeat the experiment and check whether the differences in log2FC between scRNA-Seq and scRT-dPCR are the same every time.
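
    For readers unfamiliar with the terms used in this comment, the idea of a "pseudo-bulk" log2FC can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the authors' pipeline and not DESeq2: it only sums raw counts per cell line and omits size-factor normalization and dispersion modelling. The function names, cell identifiers, pseudocount and all counts are hypothetical.

```python
import math
from collections import defaultdict


def pseudo_bulk(counts_per_cell, group_of_cell):
    """Sum raw counts of all cells in a group into one profile per group."""
    bulk = defaultdict(lambda: defaultdict(int))
    for cell, gene_counts in counts_per_cell.items():
        group = group_of_cell[cell]
        for gene, n in gene_counts.items():
            bulk[group][gene] += n
    return {group: dict(genes) for group, genes in bulk.items()}


def log2_fold_change(bulk, gene, numerator, denominator, pseudocount=1.0):
    """log2 ratio of summed counts; the pseudocount avoids division by zero."""
    a = bulk[numerator].get(gene, 0) + pseudocount
    b = bulk[denominator].get(gene, 0) + pseudocount
    return math.log2(a / b)


# Hypothetical toy counts; zeros stand in for dropouts of a low-abundant gene.
cells = {
    "c1": {"ACTB": 100, "ErbB2": 0},
    "c2": {"ACTB": 80, "ErbB2": 1},
    "c3": {"ACTB": 90, "ErbB2": 30},
    "c4": {"ACTB": 110, "ErbB2": 33},
}
groups = {"c1": "MCF7", "c2": "MCF7", "c3": "BT-474", "c4": "BT-474"}

bulk = pseudo_bulk(cells, groups)
lfc = log2_fold_change(bulk, "ErbB2", "BT-474", "MCF7")
print(f"ErbB2 log2FC (BT-474 vs MCF7): {lfc:.2f}")
```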

    Minor comments:

    Authors are commenting sensitivity. The methods section needs additional information.

    What is LBTW (line 154)?

    Did you process/sort the cells for scRT-dPCR and scRNA-Seq on the same day?

    How did you dilute the RNA for ddPCR (line 180)? And for scRT-dPCR?

    The commercial SMART-Seq kit from Takara is not the same as the Smart-Seq2 protocol (line 198). Please do not use the same name for commercial and academic protocols.

    How many cells per condition did you sort for SMART-Seq?

    How much time do you need to collect the cells?

    How much time do you need to pipette the solution? Can this be a problem for the neutralization of tagmentation? Did you use a robot, or did you clean up the cDNA manually with AMPure beads?

    Which magnetic separator did you use for clean-up? A 384-well plate design and clean-up of a 20 µl volume are not standard; please specify this in the protocol.

    Did you pool the libraries before clean-up (line 238)? If yes, what was the final volume? What volume of AMPure beads did you use for clean-up? Is pooling the reason for losing 30% of the cells from the dataset?

    Can you include more details on the data processing? How did you use DESeq2 and BBKNN? How did you process the published data? Why did you choose cell types different from those you used for sequencing? Can you share the whole script for data processing?

    Why is it important that the tagmented cDNA was 459/432 bp long (line 310)? Is this specific to down-scaled or to classical SMART-Seq? How should this information be used?

    Why didn't you show any other cell-type-specific markers that differ between the chosen cell lines (lines 328/329)?

    The comparison in lines 354-355 is confusing. I don't know what you wanted to say by showing the number of copies of different genes in different cell lines.

    I'm not sure that scRNA-Seq and scRT-ddPCR are truly orthogonal methods. Both methods are PCR-based (lines 381-382).

    I don't understand the "missing normalization of counts" in the comparison between different aligners, especially because counts are normalized during the analysis using DESeq2 (line 396).

    The authors should change the name "integrated workflow" to something else, because there is no integration of scRNA-Seq data with scRT-dPCR. They only compare results from these two methods.

    The need for validation is not demonstrated (line 416). In the end, it is not important whether the gene has two or three times higher expression. What matters is that the trend is preserved.

    The authors analyze the relative expression of Actb and ErbB2 between two cell lines. Could scRT-qPCR be used instead of scRT-ddPCR? It could solve the limitation in the number of genes that can be analyzed (line 438).

    Are the differences in log2FC a real problem for single-cell experiments? The authors used different cells and different numbers of cells for the comparison. Could this be a source of the different log2FC values?

    Why do we need absolute numbers of transcript copies (lines 458-459)? I'm OK with relative quantities using RT-qPCR.

    Significance

    The authors presented two novel applications, each of which can be important on its own for single-cell transcriptomic analysis. One is down-scaled SMART-Seq, which saves money and brings full-length scRNA-Seq to more researchers. The second is scRT-ddPCR, which can ultimately increase the sensitivity of single-cell methods. However, combining both methods in one paper, without comparison to other technologies, decreases the impact and importance of both. For example, Pokhilko et al. (2021) presented targeted single-cell RNA-Seq, which also increases the sensitivity of Smart-Seq2.

    In my opinion, publishing the two methods separately would be better. While down-scaled Smart-Seq2 is often discussed in core facilities as a way to bring scRNA-Seq to more biologists and clinicians, scRT-dPCR is a very interesting but specific method.

    I focus on RNA sequencing, from sample preparation to data analysis. I help people optimize their experimental design to get as much information as possible. I was not able to say whether the statistical methods used are correct.

  3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



    Referee #2

    Evidence, reproducibility and clarity

    This study aimed to validate the lack of representability of lowly expressed genes by using an integrated workflow of down-scaled Smart-seq2 and absolute quantitative, single-cell digital PCR. The authors addressed the bias/mismatch in data for lowly expressed genes when comparing scRNA-seq and RT-qPCR, which arises from dropouts of lowly expressed genes in scRNA-seq. They leveraged the sensitivity of scRT-ddPCR to address this issue.

    The team made a great effort to address the issues related to coverage and quantity of transcriptome analysis by combining down-scaled scRNA-seq and scRT-ddPCR. They harnessed the inherent partitioning of dPCR, which effectively provides the sensitivity that scRNA-seq lacks for low-abundant mRNAs. They developed a novel, integrated workflow combining down-scaled, single-cell Smart-seq2 and absolute quantitative, single-cell digital PCR. They further validated the workflow by comparative clustering against published data sets, and validated their scRT-ddPCR datasets by contrasting absolute mRNA counts with bulk methods.

    The key conclusions of the study are satisfying and supported by the experimental design and robust experiments. Data and methods are well presented and reproducible. The manuscript is articulate and well written; the data provided are of a high standard and help the reader understand more easily, especially the graphical abstract.

    I have no major comments, but a few minor changes are encouraged.

    1. Figure 1: the axes of panels B and C are not easy to read even at the highest zoom. At least the 400 bp label on the x axis could be shown in a bigger font.
    2. Figure S4. 'for in in range' needs some attention.
    3. P5, line 148 is not clear to me.

    Significance

    scRNA-seq is a great tool for characterizing cells. However, the loss of lowly expressed genes due to dropouts and the variation in fold change between the bulk methods and ddPCR are among its challenges. The authors took a nice strategy to address these issues through their effective workflow. They performed a thorough comparison between the data from scRNA-seq and ddPCR, and their workflow proved very effective in addressing the issue of biased conclusions, which substantiates their findings. Furthermore, investigating the workflow in two different cell lines convincingly corroborates their results.

    This manuscript is well-written, experiments are thoroughly performed, the findings are convincing and it clearly is an important contribution to the scientific community. Great piece of work and I wish the authors all the best.

  4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



    Referee #1

    Evidence, reproducibility and clarity

    The manuscript by Lange et al. describes two methods, down-scaled sc-SMART-seq2 and single-cell droplet-based digital PCR, for the quantification of gene expression at the single-cell level. Based on a plate-based analysis of two cell lines, MCF-7 and BT-474, the authors claim that their methods achieve high sensitivity and accuracy in single-cell gene expression quantification, in particular the digital PCR strategy. In my opinion, this major conclusion is not sufficiently convincing, given that:

    1. The authors did not compare their results with standard SMART-seq2 in terms of detection sensitivity (a comparison of UMAP clustering is trivial and cannot serve this purpose).
    2. Fig. 3b: there are four comparison groups in total (two genes × two cell lines). In one of the four, i.e. ACTB in MCF7, the quantification differs significantly among the three methods. Without a ground truth, it is hard to judge the quality of their method. The authors should add ERCC spike-ins to control their experiments, as stated in their Discussion.
    3. Fig. 4b, ACTB in BT-474: it seems that scDDPCR resulted in more cells in the first bin than scRNA-seq. This contradicts their claim of higher detection sensitivity for the former.
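
    Point 3 turns on how many cells fall into the lowest bin for each method. One way to make such a comparison explicit is to compute, per method, the fraction of cells in which the transcript is detected at all. The sketch below is a minimal Python illustration; the function name `detection_rate`, the threshold and both count lists are hypothetical and do not reproduce the data behind Fig. 4b.

```python
def detection_rate(counts, threshold=0):
    """Fraction of cells whose count for one gene exceeds `threshold`."""
    if not counts:
        raise ValueError("counts must contain at least one cell")
    detected = sum(1 for c in counts if c > threshold)
    return detected / len(counts)


# Hypothetical per-cell counts for one gene, one list per method.
seq_counts = [0, 0, 3, 0, 5, 0, 0, 2]
ddpcr_counts = [4, 0, 6, 3, 5, 2, 0, 7]

seq_rate = detection_rate(seq_counts)
ddpcr_rate = detection_rate(ddpcr_counts)

# The dropout rate is the complement of the detection rate.
print(f"scRNA-seq:  detected in {seq_rate:.0%} of cells")
print(f"scRT-ddPCR: detected in {ddpcr_rate:.0%} of cells")
```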

    To assess the performance of their methods in a more systematic manner, the authors should perform the single-cell measurements with ERCC spike-ins and check at least 5-10 endogenous genes at different expression levels, in addition to the spike-in RNAs. They should choose cell lines for which the absolute number of RNA molecules for some housekeeping genes has been measured using imaging-based methods.

    Minor concern

    1. Fig. 1a,b: in the ROI, there is overlap between "printed cells" and "detected particles". How can the two be distinguished?
    2. Fig. 2d: what is the difference between DEG and "different genes"? Is the number of different genes not specified for STAR?
    3. It is not clear how the bulk samples (Fig. 3, 4) were prepared.

    Significance

    The two methods described in the manuscript represent little technical advance. In addition, the conclusion stated in the manuscript is also not sufficiently convincing. As such, it would be of little interest to a limited audience.

    I have been working in the field of genomics, in particular transcriptomics, for the last 20 years. In the last few years, my lab has been developing single-cell omics-related methods.