Beyond the Hype: A Simulation Study Evaluating the Predictive Performance of Machine Learning Models in Psychology
Abstract
Although Machine Learning (ML) methods are gaining popularity in psychological research, the debate about their usefulness ranges from hype to disillusionment. The discrepancy between the hopes placed in ML methods and the empirical reality is often attributed to the quality of psychological datasets, which tend to be small and subject to imprecise measurement. In this simulation study, we examined the data requirements necessary for ML methods to perform well. We compared the performance of Elastic Net Regressions with and without prespecified interactions, Random Forests, and Gradient Boosting Machines for different data-generating processes (including either interaction, stepwise, or piecewise linear effects) and under various conditions: (a) sample size, (b) number of irrelevant predictors, (c) predictor reliability, (d) effect size, and (e) nature of the data-generating model (i.e., linear vs. nonlinear effects). We investigated whether the models achieved the highest level of predictive performance attainable under the given simulated conditions. Our results yielded two main takeaways. First, the maximum possible predictive performance was only achieved under optimal simulation conditions (N = 1,000, perfectly reliable predictors, predominantly linear effects, and an exceptionally large effect size of R² = .80), which are arguably rarely met in psychological research. Second, each ML model outperformed the others under certain conditions, but none was consistently superior or entirely robust to suboptimal data characteristics. We stress that data quality fundamentally limits predictive performance and discuss the interpretation of comparisons between flexible ML models and simpler (regularized linear) baselines in psychological research.
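The claim that predictor reliability caps predictive performance can be illustrated with a minimal sketch. It is not the authors' simulation design: it assumes a single standardized predictor, a purely linear effect, classical (independent, additive) measurement error, and a plain least-squares fit instead of the ML models compared in the study. Under those assumptions the attainable R² is the true R² multiplied by the predictor's reliability. The function names `attainable_r2` and `simulate_r2` are hypothetical and chosen for illustration only.

```python
import random
import statistics


def attainable_r2(true_r2, reliability):
    """Ceiling on R² for one predictor measured with classical error (assumed model)."""
    return true_r2 * reliability


def simulate_r2(n, true_r2, reliability, seed=0):
    """Out-of-sample R² of a simple linear regression on one noisy predictor.

    Requires 0 < true_r2 <= 1 and 0 < reliability <= 1.
    """
    rng = random.Random(seed)
    # The true predictor has variance 1 and slope 1, so these standard
    # deviations produce the requested effect size and reliability.
    noise_sd = ((1 - true_r2) / true_r2) ** 0.5
    error_sd = ((1 - reliability) / reliability) ** 0.5

    def draw(m):
        xs, ys = [], []
        for _ in range(m):
            x_true = rng.gauss(0, 1)
            ys.append(x_true + rng.gauss(0, noise_sd))   # outcome
            xs.append(x_true + rng.gauss(0, error_sd))   # observed predictor
        return xs, ys

    x_train, y_train = draw(n)
    x_test, y_test = draw(n)

    # Ordinary least squares with one predictor, fitted on the training set.
    mean_x = statistics.fmean(x_train)
    mean_y = statistics.fmean(y_train)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(x_train, y_train))
    sxx = sum((x - mean_x) ** 2 for x in x_train)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # R² evaluated on the independent test set.
    mean_test = statistics.fmean(y_test)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(x_test, y_test))
    ss_tot = sum((y - mean_test) ** 2 for y in y_test)
    return 1 - ss_res / ss_tot


if __name__ == "__main__":
    for rel in (1.0, 0.8, 0.6):
        ceiling = attainable_r2(0.80, rel)
        observed = simulate_r2(5000, 0.80, rel, seed=1)
        print(f"reliability={rel:.1f}  ceiling={ceiling:.2f}  test R2={observed:.3f}")
```

Even with a correctly specified model and a large sample, the test-set R² in this sketch settles near the reliability-scaled ceiling rather than the nominal effect size, which is the sense in which data quality limits predictive performance.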