Predicting Student Outcomes and Withdrawal Timing: A Robust, Interpretable Machine Learning Approach across Heterogeneous Educational Contexts


Abstract

Digital learning environments generate detailed traces of student participation, yet predictive learning analytics often remain focused on static endpoint classification and rarely examine whether findings generalize across heterogeneous educational contexts. Using the Open University Learning Analytics Dataset (OULAD), this study develops and evaluates an interpretable dual-task framework to predict both four-class outcomes (Withdrawn, Fail, Pass, Distinction) and time-to-withdrawal among students who disengage. The primary novelty of the study lies in integrating multiclass outcome prediction, continuous modeling of withdrawal timing, and subgroup generalization analyses across course modules and UK regions within a single unified evaluation framework. Through this integrated design, the study moves beyond the global or binary prediction settings that dominate much of the prior literature. Engagement was operationalized through interpretable indicators of participation consistency, interaction breadth, and temporal redistribution of activity, alongside demographic and course-context variables. Models were trained with strict train/validation/test separation, repeated resampling, and explicit leakage-prevention procedures. For final outcome prediction, Random Forest achieved strong and balanced performance (macro-AUC OvR = 0.925), with reliable detection of withdrawal. For withdrawal timing, XGBoost explained substantial variance (test R² = 0.747; RMSE = 42.99 days). Performance also remained consistently high across diverse modules and regions, indicating that the identified engagement patterns are robust across instructional contexts. Permutation importance and partial dependence analyses indicated that sustained participation and broad interaction were most strongly associated with favorable outcomes, whereas withdrawal timing was primarily related to registration timing, late-phase activity, and active-day regularity. 
These findings extend predictive learning analytics toward temporally resolved and context-aware modeling, supporting targeted, interpretable, and time-sensitive intervention strategies in digital higher education.
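
For readers unfamiliar with the metrics quoted above, the following is a minimal sketch of how a macro-averaged one-vs-rest AUC, an RMSE, and an R² can be computed. It is not the authors' code: the function names (`binary_auc`, `macro_auc_ovr`, `rmse`, `r2`) are hypothetical, the implementation uses only the Python standard library for clarity rather than speed, and it assumes predicted class probabilities are given as one row per student with columns in the same order as `classes`.

```python
from math import sqrt

def binary_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random
    negative (ties count as one half). O(n_pos * n_neg), fine for a sketch."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def macro_auc_ovr(y_true, proba, classes):
    """One-vs-rest AUC per class, then an unweighted (macro) mean."""
    aucs = []
    for k, c in enumerate(classes):
        labels = [1 if y == c else 0 for y in y_true]  # class c vs. the rest
        scores = [row[k] for row in proba]             # predicted P(class c)
        aucs.append(binary_auc(labels, scores))
    return sum(aucs) / len(aucs)

def rmse(y_true, y_pred):
    """Root mean squared error, in the units of the target (here, days)."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

Macro averaging gives each of the four outcome classes equal weight regardless of how many students fall in it, which is one reason it is often preferred when classes such as Distinction are comparatively rare.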
