Mission imputable: Effects of missing data processing on infectious disease detection and prognosis

Abstract

Background

Missing data in medical datasets poses significant challenges for developing effective AI/ML pipelines. Inaccurate imputation can lead to biased results, reduced model performance, and compromised clinical insights. Understanding how different imputation methods affect AI/ML model performance is crucial for ensuring accurate clinical findings.

Objective

This study systematically investigates the effects of different imputation methods on AI/ML model performance and the clinical implications of these methods.

Methods

We compare four missing-data strategies and measure their effect on the performance of common classification algorithms applied to medical data. Performance is evaluated by sensitivity and specificity on two tasks: predicting COVID-19 diagnosis and predicting patient deterioration. We also perform feature analysis to understand the clinical implications of the choice of imputation method.
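For readers less familiar with the two evaluation metrics, the following minimal sketch shows how sensitivity and specificity are computed from binary labels. The function name and the toy labels are illustrative assumptions, not taken from the study's code or data.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels coded 0/1.

    Hypothetical helper for illustration; 1 marks the positive class
    (e.g. a confirmed diagnosis), 0 the negative class.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives

    # Sensitivity: fraction of actual positives that were detected.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    # Specificity: fraction of actual negatives that were correctly ruled out.
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Toy labels, invented for this example only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
```

Reporting both metrics matters when classes are imbalanced: a model can reach high specificity while still missing many true cases, which overall accuracy alone would hide.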

Results

Our findings show that the choice of imputation method significantly affects both the performance of AI/ML techniques and the clinical conclusions drawn from the data. The optimal handling of missing values depends on (i) the composition of the features with missing values, (ii) the rate of missing values, and (iii) the pattern of the missing features. Using COVID-19 diagnosis and patient deterioration as representative clinical tasks, our results indicate that multiple imputation by chained equations (MICE) yields the best overall performance, with a 26% improvement in accuracy compared to baseline methods. Specifically, for predicting COVID-19 diagnosis, we achieved a sensitivity of 81% and a specificity of 98%, while for patient deterioration, the sensitivity was 65% and the specificity was 99%.
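To illustrate the chained-equations idea behind MICE, the sketch below fills each incomplete feature by regressing it on the other features and cycling until the estimates settle. It is a deliberately simplified assumption-laden version: it produces a single deterministic imputation with ordinary least squares, whereas full MICE draws imputed values with random noise and generates several completed datasets. The function name, the iteration count, and the toy matrix are hypothetical and do not reflect the study's implementation.

```python
import numpy as np


def chained_imputation(X, n_iter=10):
    """Simplified chained-equations imputation (single, deterministic).

    Assumes missing entries are NaN and every column has at least one
    observed value. Not the study's method; an illustrative sketch only.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    filled = X.copy()

    # Step 1: start from mean imputation so every cell has a value.
    col_means = np.nanmean(X, axis=0)
    filled[missing] = np.take(col_means, np.where(missing)[1])

    # Step 2: repeatedly re-estimate each incomplete column from the others.
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            miss_j = missing[:, j]
            if not miss_j.any():
                continue  # nothing to impute in this column
            others = np.delete(filled, j, axis=1)
            design = np.column_stack([np.ones(len(others)), others])
            # Fit a linear model on rows where column j was observed.
            coef, *_ = np.linalg.lstsq(
                design[~miss_j], filled[~miss_j, j], rcond=None
            )
            # Overwrite only the originally missing cells.
            filled[miss_j, j] = design[miss_j] @ coef
    return filled


# Toy data invented for this example: the second feature is twice the first.
X_toy = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, np.nan]])
X_imputed = chained_imputation(X_toy)
```

Because each feature is predicted from the others, this family of methods can exploit correlations between clinical variables that mean or constant imputation ignores, which is one plausible reason it may perform well when the missing features are related to observed ones.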

Conclusion

This study demonstrates the critical impact of missing data imputation on AI/ML model performance and the clinical insights derived from these models. Our findings underscore the importance of selecting appropriate imputation techniques tailored to the specific characteristics of medical data to ensure accurate and reliable AI/ML predictions.
