A Position-Aware Multi-Head Self-Attention Model for Student Performance Prediction
Abstract
Student performance prediction is a central problem in educational data mining and learning analytics, aiming to build generalizable and interpretable models from students' historical learning-process data to support personalized instruction and early academic warning. However, educational data are often high-dimensional and strongly temporal, with complex feature interactions, making it challenging for conventional regression approaches to jointly capture temporal regularities and nonlinear dependencies. To address this issue, we propose PAM-MLP, a student performance prediction model that integrates Position-Aware Attention (PAA) and Multi-head Self-Attention (MSA). The PAA module incorporates learnable positional encodings to capture stage-wise and periodic patterns in learning trajectories, and adopts gated scaled dot-product attention to dynamically adjust the importance of different time steps. The MSA module models feature dependencies from multiple perspectives, enhanced by adaptive head weighting and a non-uniform attention distribution strategy to better characterize heterogeneous learning behaviors. On top of the attention-based representations, a multi-layer perceptron captures higher-order nonlinear interactions and improves regression fitting. Experimental results show that PAM-MLP consistently outperforms competitive regression baselines, reducing MAE and RMSE by 9% and 11%, respectively, and improving R² by 10%, demonstrating its effectiveness and robustness for student performance prediction in educational settings.
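The core of the PAA module described above can be illustrated with a minimal NumPy sketch. This is a simplified, single-head illustration under our own assumptions, not the authors' implementation: the positional encodings `P` and projection matrices are fixed random arrays here (the paper's are learnable), and the gating mechanism is assumed to be a sigmoid gate applied per time step after scaled dot-product attention.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def gated_positional_attention(X, P, Wq, Wk, Wv, Wg):
    """Single-head position-aware gated attention (illustrative sketch).

    X:  (T, d) sequence of per-time-step learning features
    P:  (T, d) positional encodings (learnable in the paper; fixed here)
    Wq, Wk, Wv, Wg: (d, d) projection matrices (hypothetical names)
    Returns the gated attended output (T, d) and attention weights (T, T).
    """
    H = X + P                                   # inject positional information
    Q, K, V = H @ Wq, H @ Wk, H @ Wv            # query/key/value projections
    scores = Q @ K.T / np.sqrt(Q.shape[-1])     # scaled dot-product scores
    A = softmax(scores, axis=-1)                # attention over time steps
    gate = 1.0 / (1.0 + np.exp(-(H @ Wg)))      # sigmoid gate in (0, 1)
    return gate * (A @ V), A                    # gate modulates step importance

# Toy usage: 5 time steps, 8-dimensional features
rng = np.random.default_rng(0)
T, d = 5, 8
X = rng.standard_normal((T, d))
P = rng.standard_normal((T, d)) * 0.1
Wq, Wk, Wv, Wg = (rng.standard_normal((d, d)) * 0.3 for _ in range(4))
out, attn = gated_positional_attention(X, P, Wq, Wk, Wv, Wg)
```

In a full model, the output of this module would be fed through multiple such heads with adaptive weighting (the MSA stage) and then into an MLP regressor; the sketch only shows how positional information and gating combine with standard scaled dot-product attention.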