Robustness of Bayesian Random Forest in High-Dimensional Analysis with Missing Data

Abstract

Missing data force researchers to choose between imputing incomplete observations and discarding them, and discarding can lead to information loss. Available methods range from simple deletion to sophisticated approaches such as Multiple Imputation (MI), but these often fall short on high-dimensional datasets. Multiple Imputation by Chained Equations (MICE) and Random Forest (RF) proximity imputation offer promising alternatives. In this paper, we propose integrating MICE with Bayesian random forest (BRF) to enhance imputation accuracy and predictive power, particularly in high-dimensional analyses. The approach combines MICE’s efficiency with BRF’s robustness to address missing data challenges. We evaluate it empirically on synthetic data covering a range of missing-data scenarios. The simulation results show that combining BRF with MICE is a promising strategy for high-dimensional analysis in the presence of missing data.
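
To make the chained-equations idea concrete, the following is a minimal sketch of the imputation loop that MICE is built on: fill each missing cell with a starting value, then cycle through the columns and re-predict the missing cells of each column from all the others. It is not the method evaluated in the paper. The names `chained_impute` and `knn_predict` are hypothetical, a simple nearest-neighbour average stands in where the paper would plug in a Bayesian random forest, and the sketch runs a single deterministic chain rather than drawing multiple imputed datasets. It also assumes every column has at least one observed value.

```python
import math
import random


def knn_predict(x_train, y_train, x_test, k=3):
    """Stand-in regressor (assumption): mean target of the k nearest training rows."""
    preds = []
    for row in x_test:
        dists = sorted(
            (math.dist(row, other), y) for other, y in zip(x_train, y_train)
        )
        nearest = dists[:k]
        preds.append(sum(y for _, y in nearest) / len(nearest))
    return preds


def chained_impute(rows, predict=knn_predict, n_iter=5):
    """Single-chain chained-equations imputation; None marks a missing cell."""
    n_rows, n_cols = len(rows), len(rows[0])
    missing = [[v is None for v in row] for row in rows]
    filled = [list(row) for row in rows]

    # Step 1: initialise every missing cell with its column mean.
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        mean_j = sum(observed) / len(observed)
        for i in range(n_rows):
            if missing[i][j]:
                filled[i][j] = mean_j

    def others(i, j):
        # Predictor values for row i: every column except the target column j.
        return [filled[i][c] for c in range(n_cols) if c != j]

    # Step 2: cycle through the columns, re-predicting each column's missing
    # cells from the current values of all the other columns.
    for _ in range(n_iter):
        for j in range(n_cols):
            miss_idx = [i for i in range(n_rows) if missing[i][j]]
            obs_idx = [i for i in range(n_rows) if not missing[i][j]]
            if not miss_idx:
                continue
            preds = predict(
                [others(i, j) for i in obs_idx],
                [filled[i][j] for i in obs_idx],
                [others(i, j) for i in miss_idx],
            )
            for i, p in zip(miss_idx, preds):
                filled[i][j] = p
    return filled


# Illustrative synthetic data (not the paper's): three correlated columns,
# with roughly 20% of cells masked completely at random.
random.seed(0)
complete = []
for _ in range(40):
    x = random.gauss(0, 1)
    complete.append([x, 2 * x + random.gauss(0, 0.1), -x + random.gauss(0, 0.1)])
data = [[None if random.random() < 0.2 else v for v in row] for row in complete]
imputed = chained_impute(data)
```

The regressor is passed in as the `predict` argument so that a different per-column model, such as a random forest, could be substituted without changing the loop. Observed cells are never overwritten; only the cells that were originally missing are updated on each pass.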
