Personalized Synthetic Electrocardiograms with Outcomes
Abstract
Background
Synthetic data can help meet privacy requirements, enrich datasets in which certain subgroups or minorities are underrepresented, alleviate data shortages, and reduce annotation costs, thereby facilitating the development of data-hungry machine-learning applications. An important limitation of current synthetic data is the missing link to patient characteristics and outcomes. Such metadata are essential for synthetic data to be useful in solving real-world problems. We therefore aimed to generate novel synthetic data with linked clinical characteristics and outcomes.
Methods
We designed a novel method for generating 1:1 personalized synthetic electrocardiograms (ECGs) with associated patient characteristics and outcomes. The model is a U-net neural network that creates patient-specific ECGs, which allows patient characteristics, comorbidities, and outcomes to be attached to each synthetic ECG. We developed the model on the General Suburban Population Study and the Lolland-Falster Study cohorts and validated it on the Danish Inter99 cohort. We compared original and synthetic ECGs using Bland-Altman plots, by comparing associations with sex, age, and body mass index using linear models, and by comparing associations with all-cause mortality using Cox models. We required that at most 5% of synthetic ECGs have their paired original ECG as their nearest neighbor in Euclidean space.
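The nearest-neighbor criterion can be illustrated with a minimal sketch. It assumes each ECG is represented as a fixed-length feature vector and that the nearest neighbor of a synthetic ECG is searched among the original ECGs; the abstract does not specify either detail, and the function name is hypothetical.

```python
import numpy as np


def paired_nearest_neighbor_fraction(original, synthetic):
    """Fraction of synthetic ECGs whose nearest original ECG is their own pair.

    Assumption: both arrays have shape (n_ecgs, n_features), and row i of
    `synthetic` was generated from row i of `original`.
    """
    original = np.asarray(original, dtype=float)
    synthetic = np.asarray(synthetic, dtype=float)

    # Squared Euclidean distance between every synthetic and every original ECG,
    # using ||s - o||^2 = ||s||^2 - 2 s.o + ||o||^2 to avoid a 3-D temporary.
    sq_dist = (
        (synthetic ** 2).sum(axis=1)[:, None]
        - 2.0 * synthetic @ original.T
        + (original ** 2).sum(axis=1)[None, :]
    )

    # Index of the closest original ECG for each synthetic ECG.
    nearest = sq_dist.argmin(axis=1)

    # A synthetic ECG "matches" when that index is its own paired original.
    return float((nearest == np.arange(len(synthetic))).mean())


# Toy example with three paired two-feature "ECGs" (illustrative values only).
toy_original = [[0.0, 0.0], [10.0, 10.0], [20.0, 20.0]]
toy_synthetic = [[0.1, 0.0], [19.0, 19.0], [11.0, 11.0]]
toy_fraction = paired_nearest_neighbor_fraction(toy_original, toy_synthetic)
criterion_met = toy_fraction <= 0.05  # the 5% threshold stated in the Methods
```

In the toy example only the first synthetic vector is closest to its own original, so the fraction exceeds the 5% threshold and the criterion would not be met.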
Results
We generated 6,612 novel, personalized ECGs. Although Bland-Altman plots showed a high level of agreement between synthetic and original ECGs, only 3.7% of synthetic ECGs had their paired original ECG as their nearest neighbor. Synthetic ECGs preserved the relations of heart rate, R-wave amplitude, and T-wave amplitude with age, sex, and body mass index. The corrected QT interval was about 9.5 ms shorter in men than in women in both the original and synthetic cohorts. The associations between PR interval and clinical characteristics were attenuated in the synthetic cohort. Heart rate and corrected QT interval were each associated with increased mortality, with similar hazard ratios in the original and synthetic populations.
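Bland-Altman agreement between paired measurements is conventionally summarized by the mean difference (bias) and the 95% limits of agreement. The sketch below shows that standard calculation for a single ECG measurement such as heart rate; the function name and the toy values are illustrative and are not data from the study.

```python
import numpy as np


def bland_altman_summary(original, synthetic):
    """Return (bias, lower_limit, upper_limit) for paired measurements.

    bias is the mean of (synthetic - original); the limits of agreement are
    bias +/- 1.96 times the sample standard deviation of the differences.
    """
    original = np.asarray(original, dtype=float)
    synthetic = np.asarray(synthetic, dtype=float)

    diff = synthetic - original
    bias = diff.mean()
    sd = diff.std(ddof=1)  # sample standard deviation of the differences

    return float(bias), float(bias - 1.96 * sd), float(bias + 1.96 * sd)


# Toy heart rates in beats per minute (illustrative values only).
toy_bias, toy_lower, toy_upper = bland_altman_summary(
    [60.0, 70.0, 80.0, 90.0],
    [62.0, 69.0, 81.0, 92.0],
)
```

A Bland-Altman plot then shows each pair's difference against the pair's mean, with horizontal lines at the bias and the two limits.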
Conclusions
We demonstrated the ability of a novel neural network method to generate personalized synthetic ECGs with preserved associations with many patient characteristics and with all-cause mortality. This method may facilitate data sharing for the development of better risk prediction models.