Enhancing Propensity Score Analysis with data Missing Not at Random: Introducing Dual-Forest Proximity Imputation

Yongseok Lee
Walter Leite

Read the full article

Discuss this preprint

Start a discussion What are Sciety discussions?

Listed in

This article is not in any list yet, why not save it to one of your lists.

Abstract

Researchers using propensity score analysis (PSA) to estimate treatment effects using secondary data may have to handle data that is missing not at random (MNAR). Existing methods for PSA with MNAR data use logistic regression to model the missing data mechanisms, thus requiring manual specification of functional forms, and are difficult to implement with a large number of covariates. To overcome these limitations, this study proposes alternatives to existing methods by replacing logistic regression with a random forest. Also, it introduces the Dual-Forest Proximity imputation method, which leverages two types of proximity matrices of random forest techniques and incorporates missing pattern information in each matrix. Results from a Monte Carlo simulation show Dual-Forest Proximity imputation’s enhanced bias reduction with various types of MNAR mechanisms as compared to existing and alternative methods. A case study is also provided using data from the National Longitudinal Survey of Youth 1979 (NLSY79).

Version published to 10.31219/osf.io/ex8ad_v1 on OSF Preprints
Jul 18, 2025

Evaluating Imputation Methods for Handling Missing Data in Complex Survey Designs: Evidence from the India DHS 2017–18

This article has 6 authors:
1. Mahfuzer Rohman
2. Md Sabbir Hossain
3. Md Fakrul Islam
4. Prosenjit Basak Arka
5. Md Rafi Hasan
6. Md Jamal Uddin
This article has no evaluationsLatest version Jan 23, 2026
Heterogeneous Treatment Effect Estimation with Instrumental Variable Methods

This article has 3 authors:
1. Amir Aamodt Kazemi
2. Joseph Sexton
3. Inge Christoffer Olsen
This article has no evaluationsLatest version Jan 13, 2026
Missing Data in OHCA Registries: How Multiple Imputation Methods Affect Research Conclusions—Paper II

This article has 4 authors:
1. Stella Jinran Zhan
2. Seyed Ehsan Saffari
3. Marcus Eng Hock Ong
4. Fahad Javaid Siddiqui
This article has no evaluationsLatest version Jan 16, 2026

Discuss this preprint

Listed in

Abstract

Article activity feed

Related articles

Evaluating Imputation Methods for Handling Missing Data in Complex Survey Designs: Evidence from the India DHS 2017–18

Heterogeneous Treatment Effect Estimation with Instrumental Variable Methods

Missing Data in OHCA Registries: How Multiple Imputation Methods Affect Research Conclusions—Paper II