Attention-Driven Ensemble Learning: Enhancing Diabetes Prediction in Data-Scarce Environments
Abstract
This paper presents a semi-supervised adaptive ensemble model designed to improve predictive performance when labeled data are limited. The model integrates RandomForest, XGBoost, and an Attention-based Multi-Layer Perceptron (AttentionMLP). It trains on only 50% of the available labeled data and exploits the unlabeled data through an iterative pseudo-labeling process and an adaptive weighting scheme. The AttentionMLP incorporates a sample-wise attention mechanism that prioritizes informative samples, enhancing robustness. The model is evaluated on three diabetes classification datasets: BRFSS2015, Pima Indian, and Diabetes Diagnosis. On the Pima Indian and Diabetes Diagnosis datasets, the proposed model achieves superior Area Under the Curve (AUC), F1 Score, and Accuracy, with AUC improvements of up to 12.4% over baseline models such as LSTM, GRU, and BiLSTM. On the BRFSS2015 dataset, the model performs competitively, indicating that it remains effective across diverse data distributions. The findings suggest that combining traditional and deep learning methods with attention and pseudo-labeling on limited labeled data offers a powerful approach for classification tasks in data-scarce environments.
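The abstract does not give the exact weighting rule or pseudo-label selection criterion, so the following is only a minimal sketch of one plausible reading. It assumes that each model's ensemble weight is proportional to its validation score, and that an unlabeled sample receives a pseudo-label only when the combined probability clears a confidence threshold. The function names (`adaptive_weights`, `ensemble_proba`, `select_pseudo_labels`), the threshold, and all numbers are hypothetical and not taken from the paper.

```python
from typing import List, Sequence, Tuple


def adaptive_weights(val_scores: Sequence[float]) -> List[float]:
    """Turn per-model validation scores into weights that sum to 1.

    Assumption: weights are proportional to each model's validation score;
    the paper may use a different rule.
    """
    total = sum(val_scores)
    if total <= 0:
        # Fall back to uniform weights when no model has a positive score.
        return [1.0 / len(val_scores)] * len(val_scores)
    return [s / total for s in val_scores]


def ensemble_proba(
    model_probas: Sequence[Sequence[float]], weights: Sequence[float]
) -> List[float]:
    """Weighted average of each model's positive-class probability."""
    n_samples = len(model_probas[0])
    return [
        sum(w * probas[i] for w, probas in zip(weights, model_probas))
        for i in range(n_samples)
    ]


def select_pseudo_labels(
    probas: Sequence[float], threshold: float = 0.9
) -> List[Tuple[int, int]]:
    """Return (index, label) pairs for confidently predicted unlabeled samples."""
    selected = []
    for i, p in enumerate(probas):
        confidence = max(p, 1.0 - p)  # confidence in the predicted class
        if confidence >= threshold:
            selected.append((i, 1 if p >= 0.5 else 0))
    return selected


# Hypothetical validation AUCs for three models (illustrative values only).
val_auc = [0.80, 0.85, 0.75]

# Hypothetical positive-class probabilities on three unlabeled samples,
# one row per model.
unlabeled_probas = [
    [0.95, 0.40, 0.10],
    [0.97, 0.55, 0.05],
    [0.90, 0.45, 0.20],
]

weights = adaptive_weights(val_auc)
combined = ensemble_proba(unlabeled_probas, weights)
pseudo = select_pseudo_labels(combined, threshold=0.85)
print(pseudo)
```

In an iterative scheme, the selected samples would be added to the labeled pool, the models retrained, and the weights recomputed on the validation split before the next round. The ambiguous middle sample stays unlabeled, which is the purpose of the threshold.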