SOD-FE: A Supervised Outlier Detection and Feature Engineering Approach for Student Dropout Prediction

Abstract

Student dropout in higher education creates academic and socioeconomic challenges for both institutions and students. Effective early-prediction models are essential for identifying at-risk students and implementing timely interventions. This paper proposes SOD-FE, a supervised machine learning approach that combines label-aware outlier detection with feature engineering to improve dropout prediction. The approach integrates interquartile range (IQR)-based outlier detection with mutual information and Pearson correlation to identify outliers and mitigate their impact before the final model is constructed. A feature selection strategy is then applied to refine the dataset. The approach is evaluated on two real-world datasets (from Portugal and Slovakia) using five classification algorithms, including Random Forest (RF) and Extreme Gradient Boosting (XGB). Performance improved substantially, with the RF classifier achieving F1 scores of 98.09% and 98.33% on the two datasets under 5-fold cross-validation. The approach also incorporates an explainable AI technique, SHapley Additive exPlanations (SHAP), to make the model more transparent and to support data-driven educational policy. These findings indicate the strong potential of SOD-FE for improving student retention and early-intervention systems in educational institutions.
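The abstract does not say exactly how the IQR rule, mutual information and Pearson correlation are combined, so the sketch below shows only one possible reading, not the authors' SOD-FE implementation. Tukey's IQR fences (k = 1.5) are computed separately within each class, which is what makes the check label-aware, and outlying values are clipped to those fences only in features whose binned mutual information or absolute Pearson correlation with the label exceeds a threshold. The function names, the thresholds `mi_min` and `r_min`, the equal-width binning for mutual information, the 0/1 label encoding (e.g. 1 = dropout) and the choice to clip values rather than drop rows are all hypothetical, chosen for illustration.

```python
import math
import statistics
from collections import Counter

# Hypothetical helpers illustrating one reading of "label-aware IQR outlier
# detection with mutual information and Pearson correlation"; not the authors' code.


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 when either input is constant."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def mutual_information(xs, ys, bins=4):
    """Plug-in MI (in nats) between a feature cut into equal-width bins and a discrete label."""
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / bins or 1.0  # a constant feature falls into a single bin
    xb = [min(int((x - lo) / width), bins - 1) for x in xs]
    n = len(xs)
    joint, px, py = Counter(zip(xb, ys)), Counter(xb), Counter(ys)
    # sum over cells of p(a,b) * log(p(a,b) / (p(a) * p(b)))
    return sum(c / n * math.log(c * n / (px[a] * py[b])) for (a, b), c in joint.items())


def iqr_fences(values, k=1.5):
    """Tukey fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def label_aware_clip(X, y, mi_min=0.05, r_min=0.1, k=1.5):
    """Clip outliers to per-class IQR fences, but only in label-relevant features.

    X is a list of rows of numbers, y a list of 0/1 labels (assumed encoding).
    Returns a clipped copy of X and the list of (row, column) cells that changed.
    """
    Xc = [list(row) for row in X]
    changed = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        relevant = mutual_information(col, y) >= mi_min or abs(pearson(col, y)) >= r_min
        if not relevant:
            continue  # assumed rule: features unrelated to the label are left untouched
        for cls in sorted(set(y)):
            idx = [i for i, label in enumerate(y) if label == cls]
            lo, hi = iqr_fences([col[i] for i in idx], k)  # fences from this class only
            for i in idx:
                v = min(max(col[i], lo), hi)  # clip (winsorize) instead of dropping the row
                if v != col[i]:
                    Xc[i][j] = v
                    changed.append((i, j))
    return Xc, changed


# Toy data: feature 0 separates the classes; feature 1 has the same distribution in both.
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
X = [[1, 5], [2, 5], [3, 5], [4, 5], [20, 50],
     [10, 5], [11, 5], [12, 5], [13, 5], [14, 50]]
X_clean, changed = label_aware_clip(X, y)
print(changed)                        # [(4, 0)]
print(iqr_fences([r[0] for r in X]))  # (-11.0, 27.0): global fences do not flag 20
```

On this toy data, the value 20 in class 0 is clipped to 7.0 because it lies far above that class's upper fence, even though fences computed over both classes together would not flag it; feature 1, which carries no information about the label, is left unchanged. The feature selection, classifier training and SHAP analysis described in the abstract are not shown here.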