Bridging Traditional Statistics and Machine Learning Approaches in Psychology: Navigating Small Samples, Measurement Error, Non-independent Observations and Missing Data
Abstract
In recent years, machine learning has spread into many areas of psychological research, and supervised machine learning methods are increasingly used to predict human behavior or psychological characteristics when there is a large number of possible predictors. However, researchers often face practical challenges when applying machine learning methods to psychological data. In this article, we identify and discuss four key challenges that often arise when applying machine learning to data collected for psychological research: (i) limited sample size, (ii) measurement error, (iii) non-independent data, and (iv) missing data. These challenges are discussed extensively in the “traditional” statistical literature but are often not addressed explicitly, or at least not to the same extent, in the applied machine learning community. For each challenge, we describe how it is handled first from a traditional statistics perspective and then from a machine learning perspective, and we compare the approaches to discuss the strengths and weaknesses of these solutions. We argue that the boundary between traditional statistics and machine learning is fluid, and we emphasize the need for cross-disciplinary collaboration to better tackle these core challenges and improve replicability.