GHOSTS: Generation of synthetic hospital time series for clinical machine learning research

Rustam Zhumagambetov
Niklas Giesa
Sebastian D. Boie
Stefan Haufe


Abstract

Machine learning (ML) holds great promise to support, improve, and automate clinical decision-making in hospitals. Data protection regulations, however, impede the sharing of abundantly available routine data across sites for model training. Generative models can overcome this limitation by learning to synthesize hospital data from a target population while ensuring data privacy. Clinical time series acquired during intensive care are, however, difficult to model using established techniques, especially because of their uneven sampling intervals. Here we introduce GHOSTS (Generator of Hospital Time Series), a novel generator of synthetic patient trajectories that produces heterogeneous hospital data, including realistic time series with uneven sampling intervals. We further design a suite of novel benchmarks, GHOSTS-Bench. We train GHOSTS on a large cohort of patients from the MIMIC-IV critical care dataset and measure the quality of the generated data in three respects: how faithfully the distributions of individual features in the real data are approximated, how well spatio-temporal dynamics in the multivariate time series are preserved, and how well ML models trained on the generated data solve a clinical prediction task on the real data. We observe that GHOSTS outperforms a state-of-the-art approach, DoppelGANger, on all three criteria.
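
To make the first evaluation criterion concrete, the sketch below compares the distribution of one quantity between real and generated data. The abstract does not say which metrics GHOSTS-Bench uses, so the two-sample Kolmogorov-Smirnov statistic here is an assumed stand-in, and the function names and toy timestamps are hypothetical. The intervals between consecutive measurements serve as the example feature because uneven sampling is the difficulty the abstract highlights.

```python
from bisect import bisect_right


def ks_statistic(real, synthetic):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs.

    Assumed illustration metric; not taken from the GHOSTS paper.
    """
    real_sorted = sorted(real)
    synth_sorted = sorted(synthetic)
    if not real_sorted or not synth_sorted:
        raise ValueError("both samples must be non-empty")
    max_gap = 0.0
    # Both empirical CDFs are step functions, so checking every observed
    # value is enough to find the largest difference.
    for x in real_sorted + synth_sorted:
        cdf_real = bisect_right(real_sorted, x) / len(real_sorted)
        cdf_synth = bisect_right(synth_sorted, x) / len(synth_sorted)
        max_gap = max(max_gap, abs(cdf_real - cdf_synth))
    return max_gap


def sampling_gaps(timestamps):
    """Intervals between consecutive measurements of one unevenly sampled series."""
    return [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]


# Hypothetical toy data: measurement times in hours since admission.
real_times = [0.0, 0.5, 2.0, 2.25, 6.0]        # uneven, as in routine care
synthetic_times = [0.0, 1.0, 2.0, 3.0, 4.0]    # a generator that samples evenly

real_gaps = sampling_gaps(real_times)
synthetic_gaps = sampling_gaps(synthetic_times)

# 0.0 means identical empirical distributions, 1.0 means no overlap at all.
gap_mismatch = ks_statistic(real_gaps, synthetic_gaps)
print(f"KS statistic on sampling intervals: {gap_mismatch}")
```

A generator that emits evenly spaced measurements scores poorly on this check even if its measured values look plausible, which is one way a benchmark could penalize unrealistic sampling patterns.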

Version published to 10.1101/2024.10.29.24316332v1 on medRxiv
Nov 4, 2024

AI and Machine Learning in Healthcare: Advancing Diagnostics, Personalized Treatment, and Predictive Modeling

This article has 4 authors:
1. Pulock Deb Roy
2. Umashanker Gupta Chowdhory
3. Angshu Dey
4. Dolour Husain Sagor
This article has no evaluations
Latest version Apr 1, 2025

Generating Medical Diagnostic Scenarios with LLM-Based Reinforcement Learning Feedback: Dataset Release and Methodology

This article has 1 author:
1. Aniruth Ananthanarayanan
This article has no evaluations
Latest version Mar 21, 2025

TOO-BERT: A Trajectory Order Objective BERT for self-supervised representation learning of temporal healthcare data

This article has 5 authors:
1. Ali Amirahmadi
2. Farzaneh Etminani
3. Jonas Bjork
4. Olle Melander
5. Mattias Ohlsson
This article has no evaluations
Latest version Mar 31, 2025
