Machine Learning for Propensity Score Estimation: A Systematic Review and Reporting Guidelines

Walter Leite
Huibin Zhang
Zachary Collier
Kamal Chawla
l.kong@ufl.edu
Yongseok Lee
Jia Quan
Olushola Soyoye

Abstract

Machine learning has become a common approach for estimating propensity scores in quasi-experimental research that uses matching, weighting, or stratification on the propensity score. This systematic review examined machine learning applications for propensity score estimation over a 40-year period across fields such as health, education, social sciences, and business. The results show that the gradient boosting machine (GBM) is the most frequently used method, followed by random forest. Classification and regression trees (CART), neural networks, and the super learner were also used in more than five percent of studies. The most frequently used packages for estimating propensity scores were twang, gbm, and randomForest in the R statistical software. The review identified many hyperparameter configurations used for machine learning methods. However, it also shows that hyperparameters are frequently under-reported, as are critical steps of the propensity score analysis, such as the covariate balance evaluation. A set of guidelines for reporting the use of machine learning for propensity score estimation is provided.

Version published to 10.31219/osf.io/gmrk7 on OSF Preprints
Oct 9, 2024

Related articles

Machine Learning Models in Classifying, Predicting and Managing COVID-19 Severity

This article has 10 authors:
1. Larysa Sydorchuk
2. Maksym Sokolenko
3. Miroslav Škoda
4. Denys Nevinskyi
5. Yaroslav Vyklyuk
6. Ruslan Sydorchuk
7. Alina Sokolenko
8. Ludmila Sokolenko
9. Andrii Sydorchuk
10. Oleksandr Sokolenko
This article has no evaluations. Latest version: Jan 27, 2026
Benchmarking Ensemble Machine Learning Algorithms for the Early Prediction of Stroke in Imbalanced Clinical Cohorts: A Comparative Analysis and Decision Curve Assessment

This article has 2 authors:
1. Ibrahim Ibrahim Shuaibu
2. Yousaf Hussain
This article has no evaluations. Latest version: Jan 22, 2026
