An Integrative Approach for Identification of Pre-Diabetes Clusters and Glycaemic Prediction using Ensemble Machine Learning
Abstract
Introduction
Prediabetes, a precursor to type 2 diabetes, is metabolically and phenotypically heterogeneous. Although its diagnostic criteria are established, sub-classification remains limited. This study integrates biochemical, physiological, lifestyle, and Ayurvedic Prakriti parameters to identify distinct prediabetes subgroups for personalized interventions.

Methods
A retrospective analysis was conducted using anonymized data from 104 adults (aged 18–60) drawn from eight clinical studies (2016–2022). Variables included anthropometrics, biochemical markers, Indian Diabetes Risk Scores (IDRS), and Prakriti. t-SNE and K-means clustering were used to reveal latent patterns. Random Forest and Gradient Boosting models were used for blood glucose prediction and feature importance analysis. ANOVA was used to assess statistical significance.

Results
Four distinct clusters were identified, each with a unique Prakriti, body composition, and metabolic profile. Clusters 0 and 1, predominantly Kapha-dominant, exhibited higher adiposity and elevated fasting glucose, total cholesterol, and LDL levels. Cluster 2, Vata-dominant, showed higher skeletal muscle mass and lower fasting glucose. Cluster 3 was Pitta-dominant. Vata Prakriti emerged as the most influential predictor, followed by age, family history, and waist circumference.

Conclusions
This study highlights the potential of integrating Prakriti with modern diagnostics and machine learning to sub-classify prediabetes into distinct phenotypes. The findings underscore the need for personalized management strategies and position ensemble models as valuable tools for early risk stratification. Future research should validate these clusters in larger cohorts and assess their clinical utility for targeted interventions.
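The analysis steps named in the Methods (t-SNE, K-means with four clusters, ANOVA, and Random Forest and Gradient Boosting with feature importances) could be chained roughly as in the sketch below. This is an illustration under stated assumptions, not the authors' pipeline. The data are randomly generated stand-ins, and the column names, the choice to cluster on the t-SNE embedding, the regression framing of glucose prediction, and all hyperparameters are assumptions, since the abstract does not specify them.

```python
import numpy as np
from scipy.stats import f_oneway
from sklearn.cluster import KMeans
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in data (assumption): 104 rows to mirror the cohort size.
# Column names are hypothetical; the values are random, not study data.
feature_names = ["age", "waist_cm", "family_history", "vata_score"]
X = np.column_stack([
    rng.integers(18, 61, size=104),   # age in years
    rng.normal(90, 10, size=104),     # waist circumference in cm
    rng.integers(0, 2, size=104),     # family history flag (0 or 1)
    rng.uniform(0, 1, size=104),      # hypothetical Vata score
]).astype(float)
fasting_glucose = rng.normal(105, 8, size=104)  # mg/dL, random

# Standardize so no single feature dominates the distance computations.
X_scaled = StandardScaler().fit_transform(X)

# Project to 2-D with t-SNE, then run K-means with k=4 on the embedding.
# Clustering on the embedding rather than the scaled features is an assumption.
embedding = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X_scaled)
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(embedding)

# One-way ANOVA: does mean fasting glucose differ across the four clusters?
groups = [fasting_glucose[labels == k] for k in range(4)]
f_stat, p_value = f_oneway(*groups)

# Fit both ensemble regressors and collect their feature importances.
models = {
    "random_forest": RandomForestRegressor(n_estimators=200, random_state=0),
    "gradient_boosting": GradientBoostingRegressor(random_state=0),
}
importances = {}
for name, model in models.items():
    model.fit(X, fasting_glucose)
    importances[name] = dict(zip(feature_names, model.feature_importances_))

for name, scores in importances.items():
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    print(name, [feature for feature, _ in ranked])
print("ANOVA p-value:", p_value)
```

Because the inputs are random, the printed rankings and p-value carry no clinical meaning. With real data, the importance ranking is where a result such as Vata Prakriti outranking age, family history, and waist circumference would appear.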