Prediction of Survey Item Nonresponse Through Supervised Machine Learning
Abstract
This study investigates response patterns to political questions in the European Social Survey and uses Latent Class Analysis to identify latent classes of respondents based on item nonresponse. Three distinct classes emerged: a politically engaged group with low missing data, a moderately engaged group with moderate missing data, and a politically disengaged group with high missing data. Sociodemographic variables (age, gender, education level, income, employment status, marital status, and religiosity) served as predictors in machine learning models that predict latent class membership: Logistic Regression, Lasso Regression, Decision Tree, Random Forest, XGBoost, and K-Nearest Neighbors. Random Forest and XGBoost outperformed the other models on accuracy, precision, recall, and F1 score. Multiple imputation was used to account for errors in predicted class membership, and patterns were consistent across the imputed datasets. However, the study’s limitations, including reliance on self-reported data and a limited set of predictors, suggest avenues for future research to explore additional variables and alternative imputation methods.
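To make the four evaluation metrics concrete for a three-class problem, the following is a minimal sketch in plain Python. It is not the study's code. The class labels, the toy predictions, and the choice of macro-averaging (an unweighted mean over classes) are all assumptions made for illustration; the abstract does not say how per-class scores were aggregated.

```python
from collections import Counter

# Hypothetical labels standing in for the three latent classes.
CLASSES = ("engaged", "moderate", "disengaged")


def macro_scores(y_true, y_pred, classes=CLASSES):
    """Return accuracy and macro-averaged precision, recall, and F1."""
    # Count each (true label, predicted label) pair; missing pairs count as 0.
    pairs = Counter(zip(y_true, y_pred))
    accuracy = sum(pairs[(c, c)] for c in classes) / len(y_true)

    precisions, recalls, f1s = [], [], []
    for c in classes:
        tp = pairs[(c, c)]
        predicted = sum(pairs[(t, c)] for t in classes)  # all predicted as c
        actual = sum(pairs[(c, p)] for p in classes)     # all truly c
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(classes)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n,  # macro average (assumption)
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Toy data, invented for illustration only.
y_true = ["engaged", "engaged", "moderate", "moderate", "disengaged", "disengaged"]
y_pred = ["engaged", "moderate", "moderate", "moderate", "disengaged", "engaged"]
scores = macro_scores(y_true, y_pred)
print(scores)
```

Macro-averaging weights the three classes equally, which matters when one class (for example, the high-nonresponse group) is much smaller than the others. A support-weighted average would be a reasonable alternative.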