From Prediction to Action: A Calibrated and Interpretable Machine Learning Framework for Personalized Student Retention
Abstract
Student attrition in higher education represents a critical challenge, impacting institutional sustainability and student success. While machine learning models have demonstrated increasing accuracy in predicting at-risk students, a significant gap persists between generating predictions and implementing effective, personalized interventions. This study introduces a comprehensive, educator-centric framework designed to bridge this prediction-to-action gap. The framework integrates a high-performance stacking ensemble model (combining Random Forest, XGBoost, and Logistic Regression) with isotonic calibration to ensure that predictive outputs are not only accurate but also statistically reliable for decision-making. Trained and validated on a dataset of 29,569 student records, the model achieves strong predictive performance (F1-score = 0.712, AUC-ROC = 0.922). More importantly, the calibrated risk probabilities are mapped to a three-tiered intervention system, translating quantitative risk into qualitative, pedagogically informed action plans. Local and global model explanations, generated via SHAP (SHapley Additive exPlanations), guide the personalization of support within each tier. By providing a transparent, reliable, and actionable pipeline, this framework empowers institutions to transition from reactive measures to proactive, data-driven student support, optimizing resource allocation and fostering equitable educational outcomes. The complete code and dataset are made publicly available to ensure reproducibility and encourage further research.
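The pipeline described in the abstract (stacking ensemble, isotonic calibration, three risk tiers) can be illustrated with a minimal scikit-learn sketch. It is not the authors' released code, and several choices are assumptions made only for illustration: the data are synthetic, scikit-learn's `GradientBoostingClassifier` stands in for XGBoost to avoid an extra dependency, Logistic Regression is assumed to be the meta-learner (the abstract does not say which model plays that role), and the tier cut-offs in `TIER_EDGES` and the tier names are hypothetical. The SHAP explanation step is left out.

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in data (assumption); not the study's student records.
X, y = make_classification(
    n_samples=600, n_features=10, weights=[0.8, 0.2], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Stacking ensemble: tree-based base learners, logistic regression on top.
# GradientBoostingClassifier is a stand-in for XGBoost (assumption).
stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
        ("gb", GradientBoostingClassifier(n_estimators=50, random_state=0)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=3,
)

# Isotonic calibration fitted with cross-validation around the whole stack.
calibrated = CalibratedClassifierCV(stack, method="isotonic", cv=3)
calibrated.fit(X_train, y_train)

# Calibrated probability of the positive (at-risk) class.
risk = calibrated.predict_proba(X_test)[:, 1]

# Hypothetical tier boundaries; the abstract does not state the real cut-offs.
TIER_EDGES = (0.30, 0.60)
TIER_NAMES = np.array(["low", "moderate", "high"])


def assign_tier(probabilities, edges=TIER_EDGES):
    """Map risk probabilities to tier names; a value equal to an edge goes to the higher tier."""
    return TIER_NAMES[np.digitize(probabilities, edges)]


tiers = assign_tier(risk)
```

Calibrating before tiering matters because fixed cut-offs are only meaningful if a predicted risk of, say, 0.6 corresponds to roughly 60% observed attrition. With uncalibrated scores the same thresholds would sort students inconsistently across models.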