Effect of Features Burdened with Classical Error, Misclassifications, Berkson Error, and Group-Summary Assignments on Machine Learning Performance and Interpretability
Abstract
Measurement error is pervasive in modern data pipelines, yet its influence on predictive modeling and on variable-importance assessments in machine learning is often underestimated. Our study examines how input measurement error affects both prediction accuracy and interpretability across four common error types: classical error, misclassification of predictive features, Berkson error (arising from algorithmic or model-based assignments), and group-summary (aggregation) assignment errors. Using controlled Monte Carlo simulation experiments covering regression and classification, we evaluate five popular machine learning models (generalized linear models, support vector machines, random forests, XGBoost, and multilayer perceptrons) across a range of error levels. To distinguish the effect of the error itself from optimization issues, hyperparameters are tuned separately for each dataset, and performance is measured using standard predictive metrics and permutation feature importance. Results show that measurement error generally reduces predictive performance and can substantially alter the ranking and estimated contribution of highly predictive features, with degradation patterns that depend on both the error type and the model. These findings clarify when performance declines might be mistaken for other causes, such as distributional shifts, and highlight the importance of diagnosing and correcting measurement error before drawing conclusions about model robustness, generalization, or variable relevance.
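To make the four error mechanisms concrete, the sketch below injects each type into a synthetic feature with NumPy. This is an illustrative reconstruction, not the authors' simulation code: all function names, noise levels, and the group structure are our own assumptions. It also demonstrates the well-known attenuation effect of classical error, where noise in a feature weakens its observed correlation with the outcome.

```python
import numpy as np

rng = np.random.default_rng(0)

def add_classical_error(x, sigma):
    """Classical error: observed = true + independent additive noise."""
    return x + rng.normal(0.0, sigma, size=x.shape)

def misclassify(z, flip_prob):
    """Misclassification: flip binary feature values with probability flip_prob."""
    flips = rng.random(z.shape) < flip_prob
    return np.where(flips, 1 - z, z)

def add_berkson_error(x_assigned, sigma):
    """Berkson error: the TRUE value scatters around the assigned value;
    the model only ever sees x_assigned."""
    return x_assigned + rng.normal(0.0, sigma, size=x_assigned.shape)

def group_summary(x, groups):
    """Group-summary assignment: replace each value with its group mean."""
    out = np.empty_like(x, dtype=float)
    for g in np.unique(groups):
        mask = groups == g
        out[mask] = x[mask].mean()
    return out

# Synthetic data: outcome depends linearly on the true feature.
n = 2000
x_true = rng.normal(size=n)
y = 2.0 * x_true + rng.normal(scale=0.5, size=n)

# Classical error attenuates the observed feature-outcome correlation.
x_classical = add_classical_error(x_true, sigma=1.0)
r_true = np.corrcoef(y, x_true)[0, 1]
r_classical = np.corrcoef(y, x_classical)[0, 1]

# Misclassification of a binary feature.
z = rng.integers(0, 2, size=n)
z_obs = misclassify(z, flip_prob=0.2)

# Group-summary assignment (10 hypothetical groups), a setting in which
# subsequent Berkson error is natural: true values scatter around the
# assigned group mean.
groups = rng.integers(0, 10, size=n)
x_grouped = group_summary(x_true, groups)
x_true_berkson = add_berkson_error(x_grouped, sigma=0.5)
```

Under this setup, `r_classical` falls visibly below `r_true`, previewing the paper's point that error in a highly predictive feature can shrink its apparent contribution and reshuffle importance rankings.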