Transcriptomic entropy benchmarks stem cell-derived cardiomyocyte maturation against endogenous tissue at single cell level
Abstract
The immaturity of pluripotent stem cell (PSC)-derived tissues has emerged as a universal problem for their biomedical applications. While efforts have been made to generate adult-like cells from PSCs, direct benchmarking of PSC-derived tissues against in vivo development has not been established. Thus, maturation status is often assessed on an ad-hoc basis. Single cell RNA-sequencing (scRNA-seq) offers a promising solution, though cross-study comparison is limited by dataset-specific batch effects. Here, we developed a novel approach to quantify PSC-derived cardiomyocyte (CM) maturation through transcriptomic entropy. Transcriptomic entropy is robust across datasets regardless of differences in isolation protocols, library preparation, and other potential batch effects. With this new model, we analyzed over 45 scRNA-seq datasets and over 52,000 CMs, and established a cross-study, cross-species CM maturation reference. This reference enabled us to directly compare PSC-CMs with the in vivo developmental trajectory and thereby to quantify PSC-CM maturation status. We further found that our entropy-based approach can be used for other cell types, including pancreatic beta cells and hepatocytes. Our study presents a biologically relevant and interpretable metric for quantifying PSC-derived tissue maturation, and is extensible to numerous tissue engineering contexts.
There is significant interest in generating mature cardiomyocytes from pluripotent stem cells. However, there are currently few effective metrics to quantify the maturation status of a single cardiomyocyte. We developed a new metric for measuring cardiomyocyte maturation using single cell RNA-sequencing data. This metric, called entropy score, uses the gene distribution to estimate maturation at the single cell level. Entropy score enables comparing pluripotent stem cell-derived cardiomyocytes directly against endogenously-isolated cardiomyocytes. Thus, entropy score can better assist in development of approaches to improve the maturation of pluripotent stem cell-derived cardiomyocytes.
Article activity feed
-
Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.
Learn more at Review Commons
Reply to the reviewers
Our response to reviewers has been provided as a formatted, typeset PDF file. This includes the original review comments (bolded) and our responses. In particular, our responses include several figures. Our intention is to include the full set of reviews and responses as supplementary information in our manuscript once it is published in a journal; we would also be happy to have this document uploaded to bioRxiv for readers.
-
Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.
Learn more at Review Commons
Referee #3
Evidence, reproducibility and clarity
Summary:
Kannan et al. start with the good idea of using Shannon entropy to temporally classify the development of cells, quantifying their maturation status by applying it to single cell gene expression as measured by scRNAseq. The underlying idea is that as cells develop, genes are silenced and hence the overall GeX entropy goes down. This approach would provide a robust way to compare heterogeneous datasets, an important task that current scRNAseq analysis methods based on dimensionality reduction (such as Monocle) are unable to perform robustly. Unfortunately, the analysis and calculation of the entropy, as well as the results obtained, do not provide convincing evidence that entropy is actually a good metric for comparing development in diverse datasets/cell types.
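For reference, the standard Shannon entropy that this summary refers to can be written as below. This is the textbook definition applied to one cell's expression profile, not necessarily the manuscript's exact score, which is described elsewhere on this page as a modification of the Shannon entropy. The symbols $c_i$ (count for gene $i$ in the cell) and $N$ (number of genes considered) are notation assumed here for illustration, with the usual convention $0 \log 0 = 0$.

```latex
p_i = \frac{c_i}{\sum_{j=1}^{N} c_j}, \qquad
H = -\sum_{i=1}^{N} p_i \log p_i, \qquad
0 \le H \le \log N .
```

$H$ equals $\log N$ when all $N$ genes are equally expressed and approaches 0 as expression concentrates in fewer genes, which is the sense in which gene silencing would be expected to lower the entropy.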
Major Comments:
- The calculation of the entropy is not clear enough (or not performed correctly). Shouldn't Pi be the GeX distribution of gene i across all cells? The authors seem to have calculated Pi as the probability of expression in one cell and then summed across genes. Unless I am wrong, this does not make sense and invalidates all the analysis.
- Entropy score correlated only moderately with pseudotimes for the three methods. This is a major problem that needs to be explained. One would expect entropy to give a higher correlation if it is a robust measure of development.
- One of the main purposes of the approach is to classify maturation of in vitro datasets, but essentially no entropy changes are found; the changes in Fig 5c are minimal. Relatedly, the developmental times of the datasets, as shown by the color codes, do not match the changes in entropy (see Figs 4b, 5a/b).
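The two readings of Pi raised in the first major comment can be contrasted with a minimal sketch. It assumes a plain cells-by-genes count matrix and uses the unmodified Shannon entropy; the function names `per_cell_entropy` and `per_gene_entropy` and the toy counts are hypothetical and are not taken from the manuscript, whose actual score involves additional filtering steps.

```python
import numpy as np

def per_cell_entropy(counts):
    """Shannon entropy of each row of a (n_cells, n_genes) count matrix.

    Each row is normalised to sum to 1, so p[c, i] is the fraction of
    cell c's counts that come from gene i (one entropy value per cell).
    """
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    if np.any(totals <= 0):
        raise ValueError("every row must contain at least one count")
    p = counts / totals
    # Treat 0 * log(0) as 0: only take logs of strictly positive entries.
    logp = np.zeros_like(p)
    np.log(p, out=logp, where=p > 0)
    return -(p * logp).sum(axis=1)

def per_gene_entropy(counts):
    """Alternative reading: each gene's distribution across cells.

    Returns one value per gene, so it cannot by itself score a single cell.
    """
    return per_cell_entropy(np.asarray(counts).T)

# Toy data (made up): 3 cells x 4 genes.
toy = np.array([
    [10, 10, 10, 10],  # expression spread evenly over genes
    [37,  1,  1,  1],  # expression dominated by one gene
    [40,  0,  0,  0],  # expression confined to one gene
])
print(per_cell_entropy(toy))   # one value per cell
print(per_gene_entropy(toy))   # one value per gene
```

The per-cell version yields a score for each cell that does not depend on which other cells are in the dataset, whereas the per-gene version describes how a gene is distributed over a particular set of cells.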
Minor Comments:
- Also, Pi being a probability, how was the normalization performed so that the probabilities sum to 1? Given the variability in gene expression, scRNAseq platforms and number of cells, it would be good to have a metric estimating the quality of the distribution.
- Why is the entropy not compared between the Kannan dataset and Wang and Yao? This would prove that entropy is indeed a good measure, as opposed to UMAP+Monocle.
- Fig 3 should be in the supplement.
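The normalization question in the first minor comment can be made concrete with a short sketch. It assumes raw integer counts for a single cell, normalizes them to sum to 1, and, as one possible sensitivity check that is an assumption of this example rather than something taken from the manuscript or the review, recomputes the entropy after downsampling the cell to a lower depth. All function names and the example counts are hypothetical.

```python
import numpy as np

def to_probabilities(cell_counts):
    """Normalise one cell's counts so that they sum to 1."""
    cell_counts = np.asarray(cell_counts, dtype=float)
    total = cell_counts.sum()
    if total <= 0:
        raise ValueError("cell has no counts")
    return cell_counts / total

def shannon_entropy(p):
    """Shannon entropy (natural log) of a probability vector."""
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]  # skip zeros, i.e. treat 0 * log(0) as 0
    return float(-(nz * np.log(nz)).sum())

def downsample(cell_counts, depth, rng):
    """Draw `depth` counts without replacement from one cell's counts."""
    cell_counts = np.asarray(cell_counts, dtype=np.int64)
    return rng.multivariate_hypergeometric(cell_counts, depth)

rng = np.random.default_rng(0)
deep = np.array([500, 300, 150, 40, 8, 2])  # made-up counts for one cell
shallow = downsample(deep, 100, rng)        # same cell at a lower depth

p_deep = to_probabilities(deep)
p_shallow = to_probabilities(shallow)
print(p_deep.sum(), p_shallow.sum())        # both sum to 1 up to rounding
print(shannon_entropy(p_deep), shannon_entropy(p_shallow))
```

Comparing the two entropy values gives a rough sense of how much of a difference between datasets could come from sequencing depth alone, since lowly expressed genes tend to drop out at lower depth.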
Significance
The idea behind this study is of potential significance, as the authors state well, but the implementation of these ideas lacks scientific rigor. The entropy analysis needs to be repeated or clarified and better explained.
Referees cross-commenting
After reading the other reviewers' comments showing the relevance of the approach developed by the authors, I do feel that, with some clarification and discussion of the technical questions about the analysis that resolves the doubts I expressed, the manuscript could be of interest.
-
Referee #2
Evidence, reproducibility and clarity
The manuscript does a fairly exhaustive job of comparing and benchmarking different single cell/nucleus RNA-seq datasets on in vivo cardiomyocytes and in vitro cardiomyocyte differentiation protocols. The analysis is clearly described.
Minor comments, questions and clarifications sought:
It may be useful to emphasize that matching the entropy score of in vivo cardiomyocytes (or a given CM developmental state) is not a sufficient indication of matching the expression patterns of the in vivo counterpart.
Compare entropy scores from cardiomyocytes from snRNA-seq on post mortem tissue (Litviňuková et al., Nature 588, 466-472 (2020)). There are differences in cardiomyocytes obtained from different regions of the human heart (atrial vs. ventricular, left vs. right, etc.). It would be informative to compare the many in vitro differentiation datasets (and protocols) that may result in atrial-like or ventricular-like CMs to their in vivo counterparts.
This question pertains to in vitro CM differentiation: Is the entropy score sensitive to cells that differentiate into alternative lineages during in vitro differentiation (an issue of purity)? Different cell lineages may have different maturation rates and, if they are not excluded, the non-cardiomyocyte cells could contribute to noisy measurements. If the entropy score is calculated after a first round of clustering, on identified CMs among the population (as opposed to cardiac progenitor cells, for example), I would be more confident of the entropy score.
This also pertains to in vitro CM differentiation: Even within the cardiomyocyte lineage, there may be different rates of development that ultimately lead to the same end point. Therefore there may be a need to coarse-grain the developmental time-points to account for the precocious ones and the 'late bloomers'. It may be useful to anchor the developmental trajectory based on entropy score to biological milestones (such as when the CMs start beating in plates). Can the authors comment on this, please?
CMs are interesting in that they are post-mitotic and, as such, will attain a level of maturity at the end of the maturation process. I can imagine this not being the case for cells that continue to cycle and divide. It would be interesting to compare the change in entropy score for such cells. How about cells that differentiate when activated by an external stimulus (e.g., immune cells)? As long as a cell has high transcriptional variability or is transcriptionally active (e.g., as a stress response), it may still show a high entropy score. How would one interpret entropy scores in such situations?
The authors note "higher mtGENE in differentiated cells and later time points" (Fig 2a). Could this be related to difficulty in dissociation, as part of a stress response?
The authors note "In particular, 10x Chromium and STRT-seq datasets appeared to have systematically higher percentages of ribosomal protein-coding genes than other protocols." Could this simply be due to the higher transcript capture rate of these protocols? These protocols/techniques may not be statistically sampling a cell's transcripts at the same rate as the techniques with "lower" capture efficiency.
Can entropy score be used in the context of activation (under external stimulus) or deactivation (when the external stimulus is removed)?
What do the black dots represent in Fig 2c?
Significance
The manuscript, "Transcriptomic entropy benchmarks stem cell-derived cardiomyocyte maturation against endogenous tissue at single cell level" by Kannan et al. introduces an interesting concept, transcriptional entropy, to track the rate of maturation in PSC-derived cardiomyocytes. The need for cardiomyocytes in translational and clinical research, along with the difficulty of obtaining live, mature cardiomyocytes from humans, makes it imperative that in vitro systems are sought. Being able to characterize the rate of differentiation and maturation in these in vitro systems is also valuable and, in that respect, the manuscript does a fairly exhaustive job of comparing and benchmarking different cardiomyocyte differentiation protocols that have been profiled by sc/snRNA-seq to date. Most importantly, comparing entropy scores between in vitro and in vivo counterparts is a simple and elegant way to anchor in vitro differentiation to pre- and post-natal development. Another interesting aspect of the transcriptional entropy measure in a single cell is that it is independent of neighboring cells, and is therefore a conceptually different and novel way to characterize single cell data that, to date, have been analyzed by techniques that group cells by each cell's similarity to others. The study is well conceived and systematically explored. The manuscript is also well written. I recommend that the manuscript be accepted for publication.
-
Referee #1
Evidence, reproducibility and clarity
Kannan et al. have developed an approach based on the quantification of gene distributions to assess pluripotent stem cell (PSC)-derived cell and tissue maturation. Methodologically, they combined single cell RNA-seq (scRNA-seq) with bioinformatic and statistical approaches to calculate transcriptomic entropy scores to benchmark cellular maturation. Their findings address unresolved issues regarding the developmental state of isolated cells and current problems associated with cell population heterogeneity. As model systems, the authors focused on cardiomyocytes (CMs) from mouse heart and on CMs generated through in vitro differentiation of human PSCs. The authors examine a spectrum of CMs from mouse heart as a function of developmental time and provide evidence showing that scRNA-seq captures maturation-related changes. Using a modification of the Shannon entropy of scRNA-seq and CMs isolated from embryonic, fetal, neonatal and early adult mouse hearts, they show that transcriptomic entropy scores decrease with developmental time. The authors then extend their results to human cells and perform a meta-analysis of publicly available scRNA-seq datasets. When cross-study comparisons were performed, meaningful comparisons could only be generated after gene and cell filtration. The output of the resulting workflow and computed entropy scores show good concordance among cells generated using different in vitro differentiation and isolation techniques, and between stage-matched mouse and human tissues. The authors go on to show that in vitro derived CMs or reprogrammed CMs (from fibroblasts) undergo an apparent developmental block to maturation in vitro. The relevance of their approach to other cell systems was demonstrated using datasets from pancreatic beta cells and hepatocytes.
In summary, the calculated entropy scores recapitulate known CM maturation gene expression profiles, making this approach invaluable for future comparisons between engineered and in vivo derived tissues.
Comments:
The key conclusions of the manuscript by Kannan et al. are supported by an examination of multiple datasets and the use of extensive and complementary bioinformatic and statistical analyses. The authors utilized a digestion and cell sorting approach that permits the isolation of viable CMs from mouse heart. The choice of scRNA-seq approaches eliminated cell type heterogeneity (either physically or bioinformatically) from otherwise complex cell populations. The authors then employed a variety of analytical approaches to identify limitations to cross-data comparisons and to define the maturation state of the cells. By minimizing protocol-related biases, resolving mismapping of mitochondrial reads to pseudogenes, taking into account variations in study sensitivity, and excluding datasets of relatively poor quality, they were able to develop an informative workflow to generate meaningful entropy scores to benchmark maturation in cross-study and cross-species comparisons. These comparisons were validated using reprogrammed fibroblasts, hepatocytes and pancreatic beta cells. Overall, the experiments were well designed, the experimental and bioinformatic limitations addressed, and the conclusions supported by robust datasets, entropy scores, bioinformatics and statistics. This leads me to conclude that their validated approach will be of significant value to other researchers who need to benchmark cell maturation using a quantitative, transcriptome-based approach.
A few experimental additions or discussion points would have strengthened the overall impact of this study.
First, the process of cell dissociation coupled with cell sorting may be associated with a time lag in sample preparation that might be expected to affect RNA stability. If comparisons were performed between scRNA-seq and bulk RNA-seq, would the entropy scores have been equally informative or would differences have been observed from RNA instability that may have affected the entropy scores? While this test would be difficult with in vivo acquired cells, such a comparison could have been made using purified (but not sorted) hPSC-CMs. An answer to this question might be valuable to investigators who wish to use your approach to examine existing bulk RNA-seq datasets. Basically, is the workflow only applicable for scRNA-seq data where problems of cell heterogeneity can be eliminated, even though you provide evidence on how to exclude non-CMs from your datasets using transcriptome profiles?
Second, would mouse strain differences or sex differences cause a shift in the entropy scores or pseudotime analyses, even if only marginally? Not all mouse models develop at the same rate and sex is known to affect both murine fetal and infant growth.
Third, when performing the entropy scores and pseudotime analyses, were there specific transcripts or groups of transcripts that were more informative of specific stages of maturation? You mention that ~81.5% were identified as differentially expressed by all methods and some transcript profiles are shown in Figure 4e, but were any informative genes or gene sets (i.e., markers) more useful for assessing maturation that would not require scRNA-seq? This information (which could be added in the supplement) might make your approach more accessible to the broader research community (i.e., the identification of new and informative markers of CM development or differentiation). Alternatively, it may be that scRNA-seq is required. If so, then this should be discussed.
Finally, could you comment further on the application of entropy scores to study maturation and how your approach may be of value to the research community? A number of situations beyond comparisons of engineered and in vivo tissues, and somatic cell reprogramming protocols might include an evaluation of PSC-CMs for pharmaceutical and toxicity testing, and the prediction of pathways that may be essential for maturation of cells either through a gene regulatory network or through individual signaling pathways. While these experiments and discussion points are not necessary to support your conclusions, an evaluation of these points and limitations in the Discussion may broaden the paper's impact and significance.
As minor critiques, there are a few typos (e.g., celltypes [cell types]), redundancies (e.g., ...transcript and protein level expression [...transcript and protein levels.]), and some improvements to the figures that could be made. For the latter, the font sizes are often too small (Figs 1, 3, 4, 5), as are some of the timepoints listed on the x axis (Fig 3a,d, 4b). Otherwise, the figures are visually informative, and the supplemental data are necessary to the assessment of the procedure.
Significance
The approach described by Kannan et al. represents a significant advance over existing strategies to benchmark maturation states of PSC derivatives. Gene expression studies [1] and transcriptome-based studies [2-4] have been useful to estimate the developmental state of mouse and human PSC-CMs; however, most published studies have relied either on an assessment of a few markers or on data from a limited number of in vivo derived samples. These earlier studies were further limited by the confounding problem of heterogeneous cell populations. Omics-based quantitative approaches have been proposed for improved maturation benchmarking and have proved valuable to study the differentiation of stem cells to progenitors and to committed lineages [5-9]. In this paper, Kannan et al. have improved upon these approaches and report the use of entropy scores to benchmark in vitro PSC-CM maturation against a gold standard of in vivo counterparts. The result is a reference resource that captures transcriptomic profiles from mouse CMs across a broad range of developmental states that will be particularly valuable to the cardiac field. By extending the assessments to include meta-analyses and cross-species comparisons (mouse versus human), they have established a workflow that results in a meaningful benchmark of a cell's maturation state. Kannan et al., thus, have developed a quantitative and reproducible approach (entropy score) that simultaneously resolves issues of cell heterogeneity and estimates the in vivo maturation state of in vitro derived cells. This quantitative approach is likely to advance studies designed to assess drug and toxicity testing of more "adult-like" CMs, and adoption of this approach by the broader stem cell community will likely prove invaluable for the assessment of engineered tissues made from complex cell populations and for applications to regenerative medicine.
Reviewer's field of expertise: Cardiovascular Physiology, Stem Cell Biology, Omics
References:
1. AC Fijnvandraat, et al., Cardiomyocytes derived from embryonic stem cells resemble cardiomyocytes of the embryonic heart tube. Cardiovascular Research 58, 399-409 (2003).
2. E Poon, et al., Transcriptome-guided functional analyses reveal novel biological properties and regulatory hierarchy of human embryonic stem cell-derived ventricular cardiomyocytes crucial for maturation. PLoS ONE 8, e77784 (2013).
3. CW van den Berg, et al., Transcriptome of human foetal heart compared with cardiomyocytes from pluripotent stem cells. Development (Cambridge, England) 142, 3231-3238 (2015).
4. H Uosaki, et al., Transcriptional Landscape of Cardiomyocyte Maturation. Cell Reports 13, 1705-1716 (2015).
5. D Grun, et al., De Novo Prediction of Stem Cell Identity using Single-Cell Transcriptome Data. Cell Stem Cell 19, 266-277 (2016).
6. W Chen, AE Teschendorff, Estimating Differentiation Potency of Single Cells Using Single-Cell Entropy (SCENT). Comput. Methods for Single-Cell Data Analysis 1935, 125-139 (2019).
7. M Guo, EL Bao, M Wagner, JA Whitsett, Y Xu, SLICE: determining cell differentiation and lineage based on single cell entropy. Nucleic Acids Res. 45, 1-14 (2017).
8. AE Teschendorff, T Enver, Single-cell entropy for accurate estimation of differentiation potency from a cell's transcriptome. Nat. Commun. 8, 1-15 (2017).
9. GS Gulati, et al., Single-cell transcriptional diversity is a hallmark of developmental potential. Science 367, 405-411 (2020).
