Comparing Missing Data Imputation Methods for Patient-Reported Outcomes in Esophageal Cancer Research
Abstract
Missing data are common in patient-reported outcomes (PRO) research, particularly in oncology settings. We evaluated seven methods for handling missing data in esophageal cancer quality-of-life measurements: Multiple Imputation by Chained Equations (MICE), Variational Autoencoder (VAE), Denoising Autoencoder (DAE), Bayesian Principal Component Analysis (BPCA), a deep autoencoder with patient-specific embeddings and temporal pattern modeling, SoftImpute (a matrix completion method using iterative soft-thresholded singular value decomposition), and K-Nearest Neighbors (KNN). Using data from McGill University’s Esophageal and Gastric Data- and Bio-Bank, we compared these methods across 44 Functional Assessment of Cancer Therapy-Esophageal (FACT-E) quality-of-life variables on five criteria: execution time, distribution preservation, correlation maintenance, imputation accuracy, and clinical classification performance. From this validation framework we derive evidence-based recommendations for selecting imputation methods in esophageal cancer PRO research, which may improve the validity and reliability of research findings in this domain.
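To make the SoftImpute description concrete, the following is a minimal NumPy sketch of the general technique: missing cells are filled with the current low-rank estimate, the completed matrix is decomposed by SVD, every singular value is shrunk by a penalty `lam` and clipped at zero, and the loop repeats until the estimate stops changing. The abstract does not report the settings used in the study, so the function name `soft_impute`, the fixed `lam`, the relative-change stopping rule, and the absence of column centering are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def soft_impute(X, lam, max_iter=200, tol=1e-5):
    """Fill NaN cells of X by iterative soft-thresholded SVD (illustrative sketch).

    Hypothetical helper: lam, max_iter and tol are assumed defaults,
    not values taken from the preprint.
    """
    X = np.asarray(X, dtype=float)
    observed = ~np.isnan(X)          # True where a response was recorded
    Z = np.zeros_like(X)             # current low-rank estimate, starts at zero
    for _ in range(max_iter):
        # Keep observed responses, fill missing cells with the current estimate.
        Y = np.where(observed, X, Z)
        U, s, Vt = np.linalg.svd(Y, full_matrices=False)
        # Soft-threshold: shrink each singular value by lam, clip at zero.
        s_shrunk = np.maximum(s - lam, 0.0)
        Z_new = (U * s_shrunk) @ Vt  # scales column j of U by s_shrunk[j]
        # Relative squared change between successive estimates (assumed stopping rule).
        denom = max(np.sum(Z ** 2), 1e-12)
        change = np.sum((Z_new - Z) ** 2) / denom
        Z = Z_new
        if change < tol:
            break
    # Observed entries are returned unchanged; only missing cells take model values.
    return np.where(observed, X, Z)


if __name__ == "__main__":
    # Toy matrix: rows as patients, columns as questionnaire items (hypothetical layout).
    X = np.array([[1.0, 2.0, 3.0],
                  [2.0, 4.0, 6.0],
                  [3.0, 6.0, np.nan],
                  [4.0, 8.0, 12.0]])
    print(soft_impute(X, lam=0.5))
```

Larger `lam` values force a lower-rank reconstruction: once `lam` exceeds the largest singular value, every component is shrunk to zero and missing cells are filled with 0, so in practice the penalty would be tuned (for example by masking some observed entries and measuring reconstruction error), and item scores would usually be centered first.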