Large-scale synthetic data enable digital twins of human excitable cells
Curation statements for this article:
Curated by eLife
eLife assessment:
This important study presents a novel and technically robust framework that combines deep learning and optimized patch‑clamp protocols to infer biophysical parameters and generate electrophysiology‑based digital twins, with the inclusion of convincing experimental data being a clear strength; there is methodological innovation and potential impact for understanding cellular heterogeneity, drug response, and arrhythmia risk prediction. Concerns remain about clarity and validation, particularly regarding the biological meaning of the modeled heterogeneity, the selection and sufficiency of large synthetic training populations, and the robustness and uniqueness of inferred parameter sets. Most notably, key translational claims (e.g., replacing large‑scale wet experiments and predicting rare arrhythmic events) lack direct experimental validation and head‑to‑head comparisons with conventional protocols. Overall, while the approach is promising and timely, stronger biological grounding, clearer framing, and additional experimental validation are needed to support the manuscript's broad claims.
This article has been reviewed by the following groups:
Listed in
- Evaluated articles (eLife)
Abstract
Individual variability shapes how diseases manifest, how patients respond to therapy, and how rare phenotypes arise. Conventional experimental approaches obscure this variation by averaging, which limits mechanistic insight and predictive accuracy. We present a computational framework that builds digital twins of human induced pluripotent stem cell-derived cardiomyocytes from a single optimized voltage-clamp experiment. The framework depends on massive datasets of simulated cells that span broad ionic and electrophysiological ranges. These synthetic data make it possible to control parameters precisely, explore biological variability comprehensively, and train models beyond the limits of experimental data. A neural network trained on the synthetic data then inferred cell-specific biophysical parameters from recordings of live cells, reproducing their distinct electrophysiological features. Our study unites computational modeling, data simulation, and deep learning to enable scalable, precise, individualized cardiac electrophysiology modeling and can be readily extended to any electrically active cell type.
Article activity feed
Reviewer #1 (Public review):
Summary:
This study presents an interesting approach for finding electrophysiological models that match experimental patch-clamp data. The authors develop a new method for deriving optimized current clamp protocols by training a neural network on synthetic data. This optimized current clamp is then used on both computational training data and on experimental data to predict current gating and conductance parameters that correctly reconstruct the electrical phenotype.
Strengths:
(1) The fitting of gating variables through an optimized patch clamp protocol is interesting.
(2) The inclusion of experimental data is important, and the approach is shown to be effective in fitting them.
Weaknesses:
(1) Some clarity is necessary on the generation and selection of variable iPSC models. With such a large variation in so many parameters, I would expect some resulting parameters to generate non-realistic phenotypes, quiescent cells, etc. Are all 200,000 or 1,100,000 generated cells viable? Or are they selected somehow for realistic cell properties?
(2) The error shown in Figure 4 between different population sizes is not completely explained in the text - there seems to be a minimal difference between a population of 1,000 and 10,000, followed by a very good fit at 200,000. Is there a particular threshold that needs to be crossed where the error drops off? Related, how was the 200,000 number chosen?
(3) Related to the point above, the 1,100,000 population for fitting experimental data also needs a more complete explanation: how was this number chosen, and how does the error compare with the other population sizes shown in Figure 4?
(4) Why are the optimized current clamp protocols different between panels A and B in Figure 5? Are they somehow informed by experimental data?
(5) Figure 6D: Is the EAD risk in panel D specific to cell 1, 2, or the pooled variants of both?
(6) How sensitive is the fitting to minor parameter variation? Further, if one were to pick, let's say, the next-best fitting value, would that fall close to the best one? Is the solution found unique, or are there multiple sets with good fits?
Reviewer #2 (Public review):
Summary:
The authors present a computational framework for generating "cell-specific" digital twins of human iPSC-CMs from a single optimized voltage clamp recording. Using deep learning trained on > 1 million artificial cells, the authors demonstrate that the model can infer 52 biophysical parameters governing 6 major ionic currents, and the resulting digital twins can reproduce experimentally recorded action potentials.
Strengths:
The framework has clear potential for understanding cellular heterogeneity in iPSC-CMs, predicting individual drug responses, and reducing the experimental burden of multiple patch clamp protocols.
Weaknesses:
There are several concerns about the validation of the model and its clarity. First, the biological variability being modeled in this manuscript is not defined well. It is unclear whether the framework addresses cell-to-cell differences within a single differentiation batch, variability across iPSC lines, or donor-to-donor differences. This ambiguity makes it difficult to interpret what the "digital twin populations" actually represent biologically. Second, the main claim, "the digital twins enable drug testing and arrhythmia prediction that would be impractical experimentally", is not experimentally validated. For example, the E-4031 simulations predict EAD rates, but no direct experimental head-to-head comparison is provided to confirm that these predictions are accurate. Third, technical reproducibility and biological representativeness are not assessed. Single voltage clamp recordings are inherently noisy. Without knowing how much variability comes from the recording process (technical variation) vs true biological differences, it is difficult to judge whether observed "cell-specific" parameter differences are meaningful. In addition, the optimized protocol is claimed to be superior to conventional approaches, but again, no experimental comparison is shown.
The authors should address these concerns, with particular emphasis on clarifying the biological context and providing direct experimental validation. Below are detailed specific points:
(1) Ambiguous definition of iPSC-CM heterogeneity.
The authors model "typical iPSC-CM heterogeneity" by varying 52 parameters +/- 40% around a baseline model (Figure 1), generating > 1 million synthetic cells. However, the manuscript does not clearly state what biological variability this model is intended to capture. Is this modeling within-line, cell-to-cell variability (e.g., cells from the same dish or differentiation batch that differ due to stochastic gene expression or maturation state)? Or is this modeling between-line or between-donor variability (e.g., genetic background differences, reprogramming efficiency)? This distinction is critical for interpretation. If the goal is to understand why different cells in the same dish behave differently, then training data should reflect that. If the goal is to compare patient lines or disease models, the framework needs validation across multiple donors or lines.
For example, the experimental validation in Figure 5 uses a single iPSC line (iPS-6-9-9T.B), but how many differentiation batches or dishes were tested, or whether cells came from the same preparation are unclear. Another example is that the wide AP diversity in the training population (Figure 1A) is impressive, but there is no demonstration that real experimental cells actually fall within this assumption range of +/- 40%.
From a biological perspective, iPSC-CMs are known to be highly heterogeneous within lines (maturation state, metabolic differences, epigenetic variation, spatial differences within the same dish, etc) and between lines (different donor/genetic background). Thus, please explicitly state whether the +/- 40% variation is intended to model within-line or between-line heterogeneity, and justify this choice with wet experiment data (or reference to experimental literature on iPSC-CM variability). Please clarify how many dishes, differentiation batches, and time points post-differentiation were used for experimental recordings (Figures 5-6). If the framework is intended to generalize across lines from different donors, please test the model on multiple independent iPSC lines (from different donors).
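The range question raised in this point can be checked mechanically once per-cell parameters are inferred. The sketch below assumes a uniform multiplicative +/-40% perturbation (the manuscript states +/-40% but the distribution is an assumption here), uses a scaled-down population, and invents hypothetical measured values purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)
n_cells, n_params = 10_000, 52   # scaled down from the paper's 1.1M cells

# Independent multiplicative perturbation of each parameter by +/-40%
# around a normalized baseline of 1 (uniform distribution assumed).
population = rng.uniform(0.6, 1.4, size=(n_cells, n_params))

# Hypothetical normalized parameter estimates for one recorded cell.
measured = np.full(n_params, 1.25)

# Coverage check: does every measured parameter fall inside the span
# of the synthetic population?
in_range = (measured >= population.min(axis=0)) & (measured <= population.max(axis=0))
print(bool(in_range.all()))
```

A stricter version of this check would compare joint distributions rather than per-parameter ranges, since a real cell can sit inside every marginal range yet outside the densely populated region of the 52-dimensional parameter space.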
(2) Biological representativeness of single-cell measurements.
The framework generates digital twins from single voltage clamp recordings. The patch clamp recordings in iPSC-CMs are subject to substantial technical variability. The manuscript does not address a fundamental question: "How representative are the measurements from a single cell on the dish (or line)?" In other words, if I measure one cell from a dish of a million cells, does that cell's digital twin tell me something about the dish as a whole, or just about that one cell? The manuscript presents Cell 1 and Cell 2 (Figures 5-6) as distinct individuals, but it's unclear whether these differences reflect true biological heterogeneity or simply sampling variability. I think the authors should perform replicate recordings on multiple cells (e.g., > 10 cells) from the same dish (same differentiation batch) and quantify how much the inferred parameters vary, and then compare between lines.
(3) No experimental validation of the main claim that in silico populations can replace wet experiments.
The most exciting claim in the manuscript is that digital twins enable drug testing and arrhythmia prediction "at scale" without requiring hundreds of patch clamp experiments. Specifically, the authors show that in silico populations derived from two experimental cells (Figure 6C) predict dose-dependent EAD incidence for the IKr blocker E-4031 (Figure 6D), with ~3% of cells showing EADs at 50 nM.
However, this prediction is not validated experimentally. If I actually patch 20-30 real iPSC-CMs and apply 50 nM E-4031, will ~3% of them show EADs, as the model predicts? Without this validation, I think the drug testing framework is purely hypothetical. The model may be internally consistent (e.g., Cell 1's twin behaves differently from Cell 2's twin), but there is no evidence that these in silico populations reflect real biological variability in drug response. Please provide experimental validation that justifies the prediction by digital twins.
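The sample sizes in this point matter statistically: a rare event is easy to miss in a small patch-clamp cohort. A quick binomial calculation (the 3% rate is the manuscript's prediction; the cohort sizes are illustrative) gives the probability of observing at least one EAD:

```python
def prob_at_least_one(n, p):
    """Probability of observing at least one EAD among n independently
    recorded cells, each showing an EAD with probability p."""
    return 1.0 - (1.0 - p) ** n

p_ead = 0.03  # predicted EAD incidence at 50 nM E-4031 (from the manuscript)
for n in (20, 30, 100):
    print(n, round(prob_at_least_one(n, p_ead), 3))
```

At 20-30 cells the detection probability is only about 0.46-0.60, so even a perfectly accurate 3% prediction could yield zero observed EADs; on the order of 100 cells would be needed to reach roughly 95% detection probability, which reinforces the need for an adequately powered validation experiment.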
(4) Experimental validation and head-to-head comparison of optimized protocol.
The authors claim that their deep learning-optimized voltage clamp protocol (Figure 3, Figure 4A) is superior to conventional approaches, but they have not validated this experimentally by doing a head-to-head comparison. The manuscript does not compare the optimized protocol to any published voltage clamp designs. If the optimized protocol is genuinely easier to implement and more informative than existing approaches, this would be a major practical advance. But without side-by-side comparison, it is impossible to judge whether the optimization made a real difference.
Reviewer #3 (Public review):
Summary:
This work uses a convolutional neural network to optimize a voltage clamp protocol to identify features and parameters from human pluripotent stem cell-derived cardiomyocytes.
Yang et al. introduce an innovative experimental framework that integrates computational modeling and deep learning to generate a digital twin of human pluripotent stem cell-derived cardiomyocytes (hPSC-CMs).
Strengths:
The major strength is the methodology used to bridge in silico prediction of cell behavior and mechanistic insights from the experimental dataset.
The approach used in this study represents a significant step toward precision medicine by enabling in silico prediction of cellular behavior and mechanistic insight from experimental datasets. The study addresses an important and timely challenge in stem cell-based and personalized medicine, and the authors compellingly leverage state-of-the-art methods alongside strong expertise in computational modeling and cardiac electrophysiology.
Weaknesses:
While the overall approach is highly compelling and the potential impact is substantial, there are two areas where clarification and refinement, particularly in the phrasing and framing used throughout the manuscript, would further strengthen the work.
(1) While the overall goal of the study is compelling, the manuscript would benefit from clearer articulation of how the proposed framework is intended to be used in practice. In particular, it is not entirely clear whether the authors envision this approach as:
a) a method to extract population-level trends that, when paired with biological data, enhance statistical power and interpretability, or
b) a strategy capable of constructing a population-based model from limited single-cell recordings. If the latter is intended, additional guidance on the number of action potentials required per cell and the assumptions underlying this extrapolation would greatly clarify the scope and applicability of the method.
(2) The manuscript would also benefit from a clearer explanation of how electrophysiological heterogeneity observed in hPSC-CMs is linked to inter-patient variability. Although the authors state that this framework can be generalized to compare patient-specific hiPSC-CM lines, it remains unclear how this generalization is achieved, given the substantial sources of variability intrinsic to hiPSC-CMs (e.g., batch effects, reprogramming strategy, differentiation protocol, and maturation state). As acknowledged by the authors, addressing this level of variability likely requires large datasets; further clarification of how the proposed approach mitigates or accommodates these challenges would strengthen the translational claims.
Below are my suggestions that could help strengthen the claims in the manuscript:
(1) Adding a dedicated section describing the electrophysiological phenotype of the hPSC-CMs used in this study would help justify the choice of the underlying ionic model and the selection of the six ion currents analyzed. These currents are not only developmentally regulated but may also vary substantially across different hPSC-CM lines, which has implications for generalizability.
(2) If feasible, inclusion of patch-clamp data from an additional hPSC-CM line would significantly strengthen the claim that this framework can harmonize and generalize across datasets and cell sources.
(3) The authors note that the experimental cells exhibited high variability in action potential morphology. This is an important observation that directly supports the motivation for the study and should be explicitly presented, even if only in the supplementary materials.
(4) In the hERG-blocker experiments, further clarification is needed regarding the biological relevance of the reported 3% incidence of early afterdepolarizations (EADs). Additionally, an interrupted sentence in this section makes it unclear whether the goal is to demonstrate that the digital twin can capture rare arrhythmic risk events or whether the digital twin is necessary to determine whether this level of risk is clinically meaningful.
(5) The manuscript states that some action potentials were excluded from the experimental dataset. A brief explanation of the exclusion criteria, along with guidance on how to distinguish high-quality from low-quality recordings, would improve transparency and reproducibility.