Robustness of Bayesian Random Forest in High-Dimensional Analysis with Missing Data
Discuss this preprint
Start a discussion What are Sciety discussions?Listed in
This article is not in any list yet, why not save it to one of your lists.Abstract
The challenge of missing data in scientific research prompts researchers to decide between imputing incomplete data or discarding observations, where discarding can lead to information loss. Various methods exist, from simple deletion to sophisticated approaches like Multiple Imputation (MI). However, these methods often fall short with high-dimensional datasets. Multiple Imputation by Chained Equations (MICE) and Random Forest (RF) proximity imputation offer promising alternatives. Therefore, in this paper, we propose integrating MICE with Bayesian random forest (BRF) to enhance imputation accuracy and predictive power, particularly in high-dimensional analyses. Our approach combines MICE’s efficiency with BRF’s robustness, offering a comprehensive solution to missing data challenges. By way of example, we provide empirical evaluations to validate its effectiveness using synthetic data of various missing data scenarios. The results from the simulations showed that the combination of BRF and MICE offered a promising strategy for high-dimensional analysis in the presence of missing data.