From Prediction to Action: A Calibrated and Interpretable Machine Learning Framework for Personalized Student Retention

Abstract

Student attrition in higher education represents a critical challenge, impacting institutional sustainability and student success. While machine learning models have demonstrated increasing accuracy in predicting at-risk students, a significant gap persists between generating predictions and implementing effective, personalized interventions. This study introduces a comprehensive, educator-centric framework designed to bridge this prediction-to-action gap. The framework integrates a high-performance stacking ensemble model (combining Random Forest, XGBoost, and Logistic Regression) with isotonic calibration, so that predicted risk probabilities are not only accurate but also reliable enough to act on. Trained and validated on a dataset of 29,569 student records, the model achieves strong predictive performance (F1-score = 0.712, AUC-ROC = 0.922). More importantly, the calibrated risk probabilities are mapped to a three-tiered intervention system, translating quantitative risk into qualitative, pedagogically informed action plans. Local and global model explanations, generated via SHAP (SHapley Additive exPlanations), guide the personalization of support within each tier. By providing a transparent, reliable, and actionable pipeline, this framework enables institutions to move from reactive measures to proactive, data-driven student support, optimizing resource allocation and fostering equitable educational outcomes. The complete code and dataset are made publicly available to ensure reproducibility and encourage further research.
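To make the calibration-then-tiering step concrete, the following is a minimal, dependency-free sketch and not the authors' implementation. It fits an isotonic (non-decreasing) mapping from raw model scores to observed outcome rates using the pool-adjacent-violators algorithm, then maps a calibrated probability to one of three tiers. The function names, the toy data, the tier labels ("low", "moderate", "high"), and the cut-offs 0.30 and 0.60 are all assumptions chosen for illustration; the abstract does not state the thresholds used in the study. The sketch also predicts with a step function, whereas library implementations typically interpolate between fitted points.

```python
from bisect import bisect_left


def fit_isotonic(scores, labels):
    """Fit a non-decreasing step function from raw scores to outcome rates
    using the pool-adjacent-violators algorithm (PAV)."""
    # Aggregate tied scores first so each distinct score forms one weighted point.
    tied = {}
    for score, label in zip(scores, labels):
        total, count = tied.get(score, (0.0, 0))
        tied[score] = (total + label, count + 1)

    # Each block is [sum_of_labels, n_points, largest_score_in_block].
    blocks = []
    for score in sorted(tied):
        total, count = tied[score]
        blocks.append([total, count, score])
        # Pool backwards while the previous block's mean exceeds the current one.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            total, count, upper = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
            blocks[-1][2] = upper

    uppers = [block[2] for block in blocks]
    means = [block[0] / block[1] for block in blocks]
    return uppers, means


def calibrate(score, model):
    """Return the calibrated probability for a raw score (step-function lookup)."""
    uppers, means = model
    index = min(bisect_left(uppers, score), len(means) - 1)
    return means[index]


def assign_tier(probability, moderate_at=0.30, high_at=0.60):
    """Map a calibrated probability to a tier. Thresholds are hypothetical."""
    if probability >= high_at:
        return "high"
    if probability >= moderate_at:
        return "moderate"
    return "low"


# Toy held-out data (invented for illustration): raw scores and dropout labels.
raw_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
dropped_out = [0, 0, 1, 0, 0, 1, 1, 0, 1]

model = fit_isotonic(raw_scores, dropped_out)
for raw in (0.05, 0.45, 0.75):
    p = calibrate(raw, model)
    print(f"raw={raw:.2f} calibrated={p:.3f} tier={assign_tier(p)}")
```

In a pipeline like the one described, the raw scores would come from the stacking ensemble evaluated on held-out records, and the tier cut-offs would be chosen with educators rather than fixed in code.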
