Comparison of Imputation Strategies for Incomplete Electronic Health Data

Shuo Zhang
Zhilong Zhang
Yuxi Zhou
Shenda Hong
Huixin Liu

Abstract

Missing data is a persistent challenge in electronic health records (EHRs), often compromising data integrity and limiting the effectiveness of predictive models in healthcare. This study systematically evaluates five widely used imputation strategies—GAIN, MICE, Median, MissForest, and MIWAE—across three real-world clinical datasets under varying missingness mechanisms (MCAR, MAR, and MNAR) and missingness rates (10%–90%). We assessed imputation quality using multiple statistical measures and examined the relationship between imputation accuracy and downstream classification performance. Our results show that MICE and MissForest consistently outperform other methods across most scenarios, while deep learning-based approaches such as GAIN exhibit high instability under MAR and MNAR, particularly at higher missingness levels. Furthermore, imputation quality does not always align with classification performance, underscoring the need to consider task-specific goals when selecting imputation strategies. We also provide a practical framework summarizing method recommendations based on missingness type and rate, aiming to support robust data preprocessing decisions in clinical AI applications.
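The evaluation setup described in the abstract (hide known values, impute them, then score the imputations against the truth) can be illustrated with a minimal sketch. This is not the authors' code: it uses synthetic Gaussian data rather than the clinical datasets, covers only the MCAR mechanism and the Median baseline, and assumes RMSE on the hidden entries as the quality measure, which the abstract does not specify. The function names `mcar_mask`, `median_impute`, and `masked_rmse` are hypothetical.

```python
import numpy as np

def mcar_mask(shape, rate, rng):
    """Boolean mask, True where a value is hidden.
    Each cell is dropped independently of the data (MCAR)."""
    return rng.random(shape) < rate

def median_impute(x_missing):
    """Fill NaNs in each column with that column's observed median."""
    medians = np.nanmedian(x_missing, axis=0)
    filled = x_missing.copy()
    rows, cols = np.where(np.isnan(filled))
    filled[rows, cols] = medians[cols]
    return filled

def masked_rmse(x_true, x_imputed, mask):
    """RMSE computed only over the entries that were hidden."""
    diff = (x_true - x_imputed)[mask]
    return float(np.sqrt(np.mean(diff ** 2)))

# Synthetic stand-in for a complete dataset (assumption: 500 rows, 3 features).
rng = np.random.default_rng(0)
x_true = rng.normal(loc=[0.0, 5.0, -2.0], scale=[1.0, 2.0, 0.5], size=(500, 3))

results = {}
for rate in (0.1, 0.5, 0.9):
    mask = mcar_mask(x_true.shape, rate, rng)
    x_missing = np.where(mask, np.nan, x_true)
    x_imputed = median_impute(x_missing)
    results[rate] = masked_rmse(x_true, x_imputed, mask)
    print(f"missing rate {rate:.0%}: RMSE = {results[rate]:.3f}")
```

Simulating MAR or MNAR would require making the drop probability depend on observed or unobserved values respectively, and comparing methods such as MICE or MissForest would mean swapping `median_impute` for another imputer while keeping the same mask and score.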

Version published to 10.1101/2025.08.01.25332573 on medRxiv
Aug 5, 2025

Related articles

Comparing Missing Data Imputation Methods for Patient-Reported Outcomes in Esophageal Cancer Research

This article has 6 authors:
1. Yong Jin Kweon
2. Yousif Salman
3. Shayan Dhillon
4. Mehrnoush Dehghani
5. Emad A. Mohammed
6. R. Trafford Crump
This article has no evaluations. Latest version: Sep 12, 2025

Probing missing data in population-based longitudinal studies: A tutorial and application using R

This article has 3 authors:
1. Hedyeh Ahmadi
2. Gagandeep Singh
3. Megan Herting
This article has no evaluations. Latest version: Sep 27, 2025

Missing Values Are Valuable: Shifting Focus from Amount to Form of Missing Data

This article has 3 authors:
1. Ehsan Zangene
2. Veit Schwammle
3. Mohieddin Jafari
This article has no evaluations. Latest version: Aug 27, 2025
