Enhancing Propensity Score Analysis with Data Missing Not at Random: Introducing Dual-Forest Proximity Imputation
Abstract
Researchers who use propensity score analysis (PSA) to estimate treatment effects from secondary data may have to handle data that are missing not at random (MNAR). Existing methods for PSA with MNAR data use logistic regression to model the missing data mechanisms; they therefore require manual specification of functional forms and are difficult to implement with a large number of covariates. To overcome these limitations, this study proposes alternatives to existing methods that replace logistic regression with a random forest. It also introduces the Dual-Forest Proximity imputation method, which leverages two types of random forest proximity matrices and incorporates missing pattern information in each matrix. Results from a Monte Carlo simulation show that Dual-Forest Proximity imputation reduces bias more than existing and alternative methods under various types of MNAR mechanisms. A case study using data from the National Longitudinal Survey of Youth 1979 (NLSY79) is also provided.
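The abstract does not spell out the Dual-Forest Proximity algorithm, so the following is only a generic sketch of proximity-weighted imputation in the style commonly associated with random forests, not the authors' method. It assumes the usual definition of proximity (the fraction of trees in which two rows fall in the same leaf). The leaf assignments, the function names `proximity_matrix` and `proximity_impute`, and the toy values are all hypothetical; in practice the leaf indices would come from a fitted forest, and missing pattern indicators could be supplied to that forest as additional features.

```python
def proximity_matrix(leaves):
    """Return the proximity matrix for leaf assignments.

    leaves[i][t] is the leaf index that tree t assigns to row i.
    Proximity of rows i and j is the share of trees where they share a leaf.
    """
    n = len(leaves)
    n_trees = len(leaves[0])
    prox = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            shared = sum(1 for t in range(n_trees) if leaves[i][t] == leaves[j][t])
            prox[i][j] = shared / n_trees
    return prox


def proximity_impute(values, prox):
    """Replace None entries with the proximity-weighted mean of observed entries."""
    observed = [j for j, v in enumerate(values) if v is not None]
    imputed = list(values)
    for i, v in enumerate(values):
        if v is None:
            weight = sum(prox[i][j] for j in observed)
            if weight > 0:
                imputed[i] = sum(prox[i][j] * values[j] for j in observed) / weight
            else:
                # No observed row shares a leaf with row i: fall back to the plain mean.
                imputed[i] = sum(values[j] for j in observed) / len(observed)
    return imputed


# Hypothetical leaf assignments for 4 rows across 3 trees (assumed, not from the study).
leaves = [
    [0, 1, 2],
    [0, 1, 3],
    [1, 1, 2],
    [1, 0, 3],
]
values = [10.0, 20.0, None, 40.0]  # one covariate with a missing entry in row 2

prox = proximity_matrix(leaves)
imputed = proximity_impute(values, prox)
```

Here the missing entry is filled from all three observed rows, with the row that shares the most leaves contributing the largest weight. A method that uses two proximity matrices, as the abstract describes, would need a rule for combining them; that rule is not given in the abstract and is not attempted here.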