HybGANN: A Hybrid GAN-GA-ANN Framework for Predicting Diabetes from Imbalanced Medical Data
Abstract
The digitization of medical data has enabled large-scale analysis. However, clinical datasets, such as those used for diabetes prediction, are often class-imbalanced, with disease cases significantly underrepresented. This imbalance poses a major challenge for traditional machine learning models, which tend to favor the majority class. In addition, many high-performance models operate as black boxes, and their lack of interpretability limits their adoption in clinical practice. In this paper, we present HybGANN, a novel hybrid framework that integrates a Conditional Tabular Generative Adversarial Network (CTGAN) for synthetic minority-class data generation, a hybrid genetic algorithm (GA) that co-evolves the hyperparameters and internal weights of artificial neural networks (ANNs) in a Lamarckian fashion, and SHapley Additive exPlanations (SHAP) for post-hoc model interpretability. To the best of our knowledge, this is the first application of a Lamarckian GA to the joint optimization of node weights and hyperparameters for tabular medical data classification. HybGANN provides a semi-automated workflow that improves predictive performance while offering transparency and adaptability. In experiments on a large-scale diabetes dataset, HybGANN outperforms a benchmark ANN trained on the same CTGAN-balanced dataset on all key classification metrics, achieving a ROC-AUC of 0.9184 and a PR-AUC of 0.9268. These results demonstrate its effectiveness and potential as a reliable AI solution for clinical decision support in imbalanced medical domains.
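The abstract does not specify how the Lamarckian GA is implemented, so the following is only a minimal sketch of the general idea under loudly stated assumptions: a single sigmoid neuron stands in for the ANN, the learning rate is the only evolved hyperparameter, and the data are a synthetic toy set rather than the diabetes dataset. All function names (`make_data`, `local_learning`, `evolve`, etc.) are hypothetical. The defining Lamarckian step is that weights refined by local gradient descent are written back into the genome and inherited by offspring, rather than only being used to score fitness (the Baldwinian alternative).

```python
import math
import random

# Hypothetical toy stand-in for a tabular dataset: label is 1 when x0 + x1 > 1.
def make_data(n, rng):
    data = []
    for _ in range(n):
        x = [rng.random(), rng.random()]
        data.append((x, 1 if x[0] + x[1] > 1.0 else 0))
    return data

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict(w, x):
    # Single sigmoid neuron standing in for an ANN; w[0] is the bias.
    return sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)))

def log_loss(w, data):
    eps = 1e-12
    total = 0.0
    for x, y in data:
        p = predict(w, x)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(data)

def local_learning(genome, data, steps=5):
    # Refine the weights with a few full-batch gradient descent steps,
    # using the learning rate stored in the genome (the evolved hyperparameter).
    lr, w = genome["lr"], list(genome["w"])
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x, y in data:
            err = predict(w, x) - y
            grad[0] += err
            for i, xi in enumerate(x):
                grad[i + 1] += err * xi
        w = [wi - lr * g / len(data) for wi, g in zip(w, grad)]
    return {"lr": lr, "w": w}

def crossover(a, b, rng):
    # Uniform crossover over the hyperparameter gene and every weight gene.
    lr = a["lr"] if rng.random() < 0.5 else b["lr"]
    w = [wa if rng.random() < 0.5 else wb for wa, wb in zip(a["w"], b["w"])]
    return {"lr": lr, "w": w}

def mutate(g, rng, sigma=0.1):
    # Log-normal perturbation of the learning rate, clamped to [0.01, 2.0];
    # Gaussian perturbation of each weight.
    lr = min(2.0, max(0.01, g["lr"] * math.exp(rng.gauss(0, 0.2))))
    w = [wi + rng.gauss(0, sigma) for wi in g["w"]]
    return {"lr": lr, "w": w}

def evolve(data, pop_size=20, generations=15, seed=0):
    rng = random.Random(seed)
    pop = [{"lr": rng.uniform(0.05, 2.0), "w": [rng.gauss(0, 1) for _ in range(3)]}
           for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        # Lamarckian write-back: locally learned weights replace the inherited
        # ones, so offspring inherit what their parents learned.
        pop = [local_learning(g, data) for g in pop]
        pop.sort(key=lambda g: log_loss(g["w"], data))
        history.append(log_loss(pop[0]["w"], data))
        elite = pop[: pop_size // 4]
        children = [mutate(crossover(rng.choice(elite), rng.choice(elite), rng), rng)
                    for _ in range(pop_size - len(elite))]
        pop = elite + children
    return pop[0], history

if __name__ == "__main__":
    data = make_data(200, random.Random(1))
    best, history = evolve(data)
    print("best lr:", round(best["lr"], 3), "loss:", round(history[-1], 4))
```

Because elites are carried over and then refined again, the best loss in this sketch cannot get worse from one generation to the next. Switching to a Baldwinian variant would only require scoring `local_learning(g, data)` while keeping the unrefined genome `g` in the population.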