Developing Optimized Machine Learning Models For Timely Prediction and Prevention of College Dropout

Abstract

College graduates earn substantially more and are more likely to be employed than non-graduates. It is therefore critically important to understand the predictors of college dropout so that students and administrators can act to improve graduation outcomes. Previous studies have evaluated only a limited range of machine learning models for dropout prediction. Using a dataset of 4,424 students that includes graduation outcomes, demographic, socioeconomic, and course data, and macroeconomic data, this paper aims to identify the best machine learning model for predicting college dropout, framed as a classification problem. We (a) perform extensive exploratory data analysis, (b) perform feature optimization, (c) identify the best-performing model among the seven evaluated, (d) study different testing-to-training ratios, (e) perform a comprehensive model evaluation, and (f) compare a multi-class classification approach with a binary one. The models were fine-tuned using grid search over their hyperparameters and validated with k-fold cross-validation. With optimized hyperparameters, the random forest model performed best at predicting college dropout, achieving 0.85 accuracy, 0.72 sensitivity, 0.92 specificity, 0.82 precision, and 0.89 AUC-ROC. The optimized random forest model also identified the key predictors of dropout, in order of importance: the number of curricular units in the second semester, the number of curricular units in the first semester, and whether tuition and fees are up to date. These findings underscore the value of machine learning for timely prediction of dropout risk, enabling targeted resource allocation to mitigate that risk and support successful graduation outcomes.
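
The abstract reports hyperparameter tuning by grid search with k-fold cross-validation. The sketch below illustrates that procedure in plain Python under stated assumptions, not as the authors' code: to stay self-contained it swaps in a small k-nearest-neighbours classifier for the paper's random forest and scores each fold by accuracy. The names `k_fold_indices`, `knn_predict`, `grid_search_cv`, and the `n_neighbors` grid are hypothetical; in practice a library utility such as scikit-learn's `GridSearchCV` would normally be used.

```python
import itertools
import random
from typing import Sequence


def k_fold_indices(n: int, k: int, seed: int = 0) -> list[tuple[list[int], list[int]]]:
    """Shuffle 0..n-1 once, split into k folds; each fold is the validation set once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        val = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        splits.append((train, val))
    return splits


def knn_predict(X_train: Sequence[Sequence[float]], y_train: Sequence[int],
                x: Sequence[float], n_neighbors: int) -> int:
    """Stand-in model (assumption): majority vote among the nearest training rows.

    This replaces the paper's random forest purely to keep the sketch dependency-free.
    """
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(X_train, y_train)
    )
    votes = [label for _, label in dists[:n_neighbors]]
    return max(set(votes), key=votes.count)


def grid_search_cv(X, y, param_grid: dict[str, list], k: int = 5):
    """Try every hyperparameter combination; keep the best mean k-fold accuracy."""
    names = list(param_grid)
    best_params, best_score = None, -1.0
    for values in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        fold_scores = []
        # Same seed on every call, so all candidates are compared on identical folds.
        for train, val in k_fold_indices(len(X), k):
            X_tr = [X[i] for i in train]
            y_tr = [y[i] for i in train]
            correct = sum(knn_predict(X_tr, y_tr, X[i], **params) == y[i] for i in val)
            fold_scores.append(correct / len(val))
        mean_score = sum(fold_scores) / k
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score
```

Shuffling once with a fixed seed keeps the folds identical across every hyperparameter combination, so differences in mean score reflect the hyperparameters rather than the split.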

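Four of the reported scores (accuracy, sensitivity, specificity, precision) come from the binary confusion matrix, while AUC-ROC is computed from the ranking of predicted dropout scores. A minimal sketch, assuming dropout is encoded as the positive label 1 and that the model outputs one dropout score per student; the function names are hypothetical, and AUC is computed here via its equivalent rank (Mann-Whitney) formulation rather than by integrating the ROC curve.

```python
from typing import Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn), treating label 1 (dropout) as the positive class."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == 1 and t == 1:
            tp += 1
        elif p == 1 and t == 0:
            fp += 1
        elif p == 0 and t == 0:
            tn += 1
        else:  # predicted 0, actually 1: a missed dropout
            fn += 1
    return tp, fp, tn, fn


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict[str, float]:
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),  # share of actual dropouts that were flagged
        "specificity": tn / (tn + fp),  # share of non-dropouts correctly cleared
        "precision": tp / (tp + fp),    # share of flagged students who did drop out
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random dropout outscores a random non-dropout (ties count 1/2)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For example, with made-up true labels `[1, 1, 1, 0, 0, 0, 0, 0]`, scores `[0.9, 0.6, 0.3, 0.4, 0.2, 0.1, 0.7, 0.05]`, and a 0.5 threshold, this gives accuracy 0.75, sensitivity and precision of 2/3, specificity 0.8, and AUC 0.8. A model can therefore combine high specificity with noticeably lower sensitivity, which is the pattern the abstract reports (0.92 vs 0.72).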