Causal Forests versus Inverse Probability of Treatment Weighting to adjust for Cluster-Level Confounding: A Parametric and Plasmode Simulation Study based on US Hospital Electronic Health Record Data

Abstract

Background

Rapid innovation and new regulations increase the need for post-marketing surveillance of implantable devices. However, complex multi-level confounding by patient-level and surgeon- or hospital-level covariates hampers observational studies of risks and benefits. We conducted two simulation studies to compare the performance of Causal Forests (CF) and Inverse Probability of Treatment Weighting (IPTW) in reducing confounding bias when the surgeon strongly influences treatment allocation.

Methods

Two Monte Carlo simulation studies were carried out: 1) parametric simulations with patients nested in clusters (patient-to-cluster ratios of 10:1, 50:1, 100:1, 200:1 and 500:1) and a sample size of n=10,000, including patient- and cluster-level confounders; 2) plasmode simulations generated from a cohort of 9,981 patients admitted for pancreatectomy between 2015 and 2019 in the US PINC AT™ hospital research database. Different CF algorithms and IPTW were used to estimate binary treatment effects.
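As an illustration of the parametric design, the following minimal sketch generates one clustered dataset (100 clusters of 100 patients, n=10,000) and compares a naive difference in means with an IPTW estimate. It is not the authors' data-generating process. The continuous outcome, the coefficient values, the true effect of 1.0, the use of the known (true) propensity score instead of an estimated one, and all function names are assumptions made purely for illustration. The cluster-level odds ratio of 2.5 mirrors the "strong" scenario reported in the Results.

```python
import math
import random


def simulate_clustered(n_clusters=100, cluster_size=100, cluster_or=2.5,
                       true_ate=1.0, seed=0):
    """Simulate patients nested in clusters (hypothetical coefficients)."""
    rng = random.Random(seed)
    rows = []
    for cluster_id in range(n_clusters):
        u = rng.gauss(0.0, 1.0)  # cluster-level confounder (e.g. surgeon preference)
        for _ in range(cluster_size):
            x = rng.gauss(0.0, 1.0)  # patient-level confounder
            # Treatment allocation depends on both confounders.
            logit = math.log(cluster_or) * u + 0.5 * x
            ps = 1.0 / (1.0 + math.exp(-logit))
            t = 1 if rng.random() < ps else 0
            # Assumed continuous outcome with a constant treatment effect.
            y = true_ate * t + 1.0 * u + 0.5 * x + rng.gauss(0.0, 1.0)
            rows.append((cluster_id, t, y, ps))
    return rows


def naive_difference(rows):
    """Unadjusted difference in mean outcome between treated and untreated."""
    treated = [y for _, t, y, _ in rows if t == 1]
    control = [y for _, t, y, _ in rows if t == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


def iptw_ate(rows):
    """Normalised (Hajek) IPTW estimate of the average treatment effect.

    Assumption: the true propensity score is used; in a real analysis it
    would be estimated, e.g. with a logistic or multilevel model.
    """
    num1 = den1 = num0 = den0 = 0.0
    for _, t, y, ps in rows:
        if t == 1:
            w = 1.0 / ps
            num1 += w * y
            den1 += w
        else:
            w = 1.0 / (1.0 - ps)
            num0 += w * y
            den0 += w
    return num1 / den1 - num0 / den0


rows = simulate_clustered()
naive = naive_difference(rows)
weighted = iptw_ate(rows)
print(f"naive: {naive:.3f}  IPTW: {weighted:.3f}  truth: 1.000")
```

In this toy setting the cluster-level confounder is observed and the propensity model is correct, which is the favourable case for IPTW. The scenarios studied in the article are harder because the weights must be estimated from the data.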

Results

CF provided more accurate estimates when the cluster-level confounding effect was strong (OR=2.5): relative bias 11.2% (11.77, 11.76) for CF compared with 19.9% (19.26, 20.54) for IPTW.
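For readers unfamiliar with the metric, relative bias in a simulation study is typically computed from the estimates across Monte Carlo replicates. The sketch below shows one common convention: relative bias as a percentage of the true effect, the empirical standard error, and a Monte Carlo standard error for the relative bias. The function name, the toy estimates and the exact formulas are assumptions for illustration and may differ from the definitions used in the article.

```python
import math
import statistics


def performance(estimates, true_effect):
    """Summarise estimator performance across simulation replicates."""
    n_sim = len(estimates)
    mean_est = statistics.fmean(estimates)
    # Empirical standard error: standard deviation of the estimates.
    emp_se = statistics.stdev(estimates)
    # Relative bias, expressed as a percentage of the true effect.
    rel_bias = 100.0 * (mean_est - true_effect) / true_effect
    # Monte Carlo standard error of the relative bias (percentage scale).
    rel_bias_mcse = 100.0 * emp_se / (abs(true_effect) * math.sqrt(n_sim))
    return {
        "relative_bias_pct": rel_bias,
        "empirical_se": emp_se,
        "relative_bias_mcse_pct": rel_bias_mcse,
    }


# Hypothetical estimates from five replicates with a true effect of 1.0.
toy_estimates = [1.10, 1.20, 1.15, 1.25, 1.05]
metrics = performance(toy_estimates, true_effect=1.0)
print(metrics)
```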

Conclusions

CF shows promise as a method for estimating treatment effects in scenarios where cluster-level confounding strongly impacts treatment allocation. More research is needed to guide its use.

Key messages

  • Causal Forests using the double regression tree algorithm showed the least bias in scenarios with strong cluster-level confounding and small cluster size.

  • Including the cluster-ID indicator in Causal Forests resulted in higher bias and empirical standard error in scenarios with fewer but larger size clusters.

  • Causal Forests outperformed IPTW in scenarios with strong cluster-level confounding.

Plain language summary

This study compared two methods for estimating treatment effects in clustered observational health data where the healthcare provider, such as a hospital or surgeon, influences patient outcomes and treatment allocation. Using simulated data, we evaluated a machine learning method called Causal Forest and compared it to a commonly used approach called Inverse Probability of Treatment Weighting (IPTW). We found that Causal Forest, particularly when using the double regression tree technique, performed well in many scenarios and gave less bias when provider-related factors strongly influenced the treatment allocation. While IPTW generally produces accurate results, it depends on strong assumptions that may not hold in real-world studies. Our findings suggest that Causal Forest is a promising approach for estimating treatment effects in clustered observational health data settings, particularly in surgical and medical device research where treatment decisions vary by provider.
