GHOSTS: Generation of synthetic hospital time series for clinical machine learning research

Abstract

Machine learning (ML) holds great promise to support, improve, and automate clinical decision-making in hospitals. Data protection regulations, however, prevent abundantly available routine data from being shared across sites for model training. Generative models can overcome this limitation by learning to synthesize hospital data from a target population while ensuring data privacy. Clinical time series acquired during intensive care are, however, difficult to model with established techniques, especially because of their uneven sampling intervals. Here we introduce GHOSTS (Generator of Hospital Time Series), a novel generator of synthetic patient trajectories that can produce heterogeneous hospital data, including realistic time series with uneven sampling intervals. We further design a suite of novel benchmarks, GHOSTS-Bench. We train GHOSTS on a large cohort of patients from the MIMIC-IV critical care dataset and measure the quality of the generated data by three criteria: how faithfully the distributions of individual features in the real data are approximated, how well spatio-temporal dynamics in the multivariate time series are preserved, and how well ML models trained on the generated data can solve a clinical prediction task on the real data. We observe that GHOSTS outperforms a state-of-the-art approach, DoppelGANger, with respect to these criteria.
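
To make the first evaluation criterion concrete, the following is a minimal sketch of how per-feature distributional fidelity between real and generated data might be scored. The abstract does not state which metric GHOSTS-Bench uses. The two-sample Kolmogorov-Smirnov statistic here is an assumed stand-in, and the function names `ks_statistic` and `per_feature_fidelity`, as well as the toy feature values, are hypothetical.

```python
from bisect import bisect_right


def ks_statistic(real, synthetic):
    """Two-sample Kolmogorov-Smirnov statistic.

    Returns the largest absolute gap between the empirical CDFs of the
    two samples: 0.0 means identical empirical distributions, 1.0 means
    the samples do not overlap at all.
    NOTE: an assumed stand-in metric, not necessarily the one used in
    GHOSTS-Bench.
    """
    real_sorted = sorted(real)
    synth_sorted = sorted(synthetic)
    if not real_sorted or not synth_sorted:
        raise ValueError("both samples must be non-empty")

    max_gap = 0.0
    # The supremum of the CDF gap is attained at one of the sample points.
    for x in real_sorted + synth_sorted:
        cdf_real = bisect_right(real_sorted, x) / len(real_sorted)
        cdf_synth = bisect_right(synth_sorted, x) / len(synth_sorted)
        max_gap = max(max_gap, abs(cdf_real - cdf_synth))
    return max_gap


def per_feature_fidelity(real_table, synthetic_table):
    """Score every feature present in both tables.

    Each table maps a feature name to a list of observed values
    (a hypothetical layout chosen for this illustration).
    """
    return {
        name: ks_statistic(real_table[name], synthetic_table[name])
        for name in real_table
        if name in synthetic_table
    }


if __name__ == "__main__":
    # Toy values for illustration only; not taken from MIMIC-IV.
    real = {"heart_rate": [72, 80, 95, 110], "lactate": [1.0, 1.4, 2.2]}
    synthetic = {"heart_rate": [70, 82, 96, 108], "lactate": [5.0, 6.1, 7.3]}
    for feature, score in per_feature_fidelity(real, synthetic).items():
        print(f"{feature}: KS = {score:.2f}")
```

A lower score indicates that the generated marginal distribution tracks the real one more closely. A score of this kind only addresses individual features, so it says nothing about the temporal dynamics or downstream task performance that the other two criteria cover.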
