Advancing Semi-Continuous Treatment Effect Estimation: Machine Learning and Parametric Approaches in Propensity Score Analysis
Abstract
Propensity score analysis (PSA) is a widely used method to address selection bias in observational studies, but its application to semi-continuous treatments remains limited. This study explores two generalized propensity score (GPS) definitions and compares parametric methods with Gradient Boosting Machines (GBM) for estimating average treatment effects (ATE) in semi-continuous treatments. The simulation findings show that the Zero-Inflated Negative Binomial (ZINB) model paired with the conditional mean GPS achieves the best covariate balance and reliable ATE estimates. Although GBM excelled under specific conditions for the Hurdle model, it was less effective than the ZINB model. The Hurdle Negative Binomial (HNB) model consistently failed to yield unbiased ATE estimates. A practical example using Math Nation data, where the treatment is the number of recommended videos watched and the outcome is quiz scores, demonstrates the application of PSA.

Translational Abstract
Propensity score analysis (PSA) is a powerful tool for reducing selection bias in observational studies, but its application to semi-continuous treatments remains underexplored. This challenge is particularly relevant in online educational settings, where semi-continuous data, such as student engagement metrics, are increasingly common. This study addresses this gap by evaluating methods to estimate treatment effects for semi-continuous data, comparing statistical models and machine learning approaches.

Through comprehensive simulations, we found that the Zero-Inflated model combined with a conditional mean definition provides the best balance and reliable treatment effect estimates. These findings offer practical guidance for researchers and practitioners dealing with semi-continuous data in education and beyond, helping to improve the accuracy of causal inferences in complex observational studies. A practical example using Math Nation data further illustrates the application of these methods.
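
To make the two count models named in the abstract concrete, the following is a minimal sketch of the mean a zero-inflated negative binomial (ZINB) and a hurdle negative binomial (HNB) model imply for a semi-continuous treatment. The abstract does not spell out its GPS definitions, so reading the "conditional mean GPS" as the model-implied expected treatment given covariates is an assumption here. The function names and the example parameter values are hypothetical and are not taken from the study.

```python
import math


def nb_pmf(k, mu, size):
    """Negative binomial pmf with mean mu and dispersion size (variance mu + mu^2/size)."""
    log_p = (
        math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
        + size * math.log(size / (size + mu))
        + k * math.log(mu / (size + mu))
    )
    return math.exp(log_p)


def zinb_pmf(k, pi_zero, mu, size):
    """ZINB pmf: a structural-zero mass pi_zero mixed with a negative binomial count."""
    base = (1.0 - pi_zero) * nb_pmf(k, mu, size)
    return pi_zero + base if k == 0 else base


def zinb_mean(pi_zero, mu):
    """Expected treatment under ZINB: E[T | X] = (1 - pi_zero) * mu."""
    return (1.0 - pi_zero) * mu


def hurdle_nb_pmf(k, p_zero, mu, size):
    """HNB pmf: zero with probability p_zero, otherwise a zero-truncated NB count."""
    if k == 0:
        return p_zero
    return (1.0 - p_zero) * nb_pmf(k, mu, size) / (1.0 - nb_pmf(0, mu, size))


def hurdle_nb_mean(p_zero, mu, size):
    """Expected treatment under HNB: (1 - p_zero) * mu / (1 - P_NB(0))."""
    return (1.0 - p_zero) * mu / (1.0 - nb_pmf(0, mu, size))


# Hypothetical fitted values for one unit (assumed, for illustration only).
pi_zero, mu, size = 0.4, 3.0, 2.0
print(zinb_mean(pi_zero, mu))             # assumed conditional mean GPS under ZINB
print(hurdle_nb_mean(pi_zero, mu, size))  # the analogous quantity under HNB
```

In practice `pi_zero` and `mu` would come from fitted regression components evaluated at each unit's covariates (commonly a logit link for the zero part and a log link for the count part). The sketch only shows how the two models turn those components into an expected treatment value, and that the same inputs give different means under the two models.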