Multidimensional Health Phenotyping and Metabolic Syndrome Prediction in Chinese Community-Dwelling Elderly: An Integrated Data-Driven Approach
Abstract
Background: Population aging in China poses significant healthcare challenges, with chronic diseases accounting for over 80% of disability-adjusted life years among older adults. Traditional single-disease assessments fail to capture the complex interactions between multiple health domains.

Methods: This retrospective cross-sectional study used health examination data from 10,639 community-dwelling elderly residents (≥ 60 years) of Jiangling Sub-district, Suzhou City, China, collected between April 1 and August 31, 2025. Health indicators included anthropometric measurements, blood pressure, fasting blood glucose, lipid profiles, and self-reported chronic diseases. K-means clustering was used to identify distinct health patterns. As a key application of the health phenotyping framework, a random forest model was developed to predict metabolic syndrome. Model performance was assessed by the area under the receiver operating characteristic curve (AUC), sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), calibration (Hosmer-Lemeshow test), and clinical utility (decision curve analysis).

Results: Five distinct health patterns were identified: (1) Severe Metabolic Disorder Type (5.6%), (2) Healthy Lean Hypercholesterolemia Type (21.7%), (3) Elderly Hypertension Type (21.3%), (4) Middle-aged Obese Multi-risk Type (25.4%), and (5) Relatively Healthy Type (26.0%). Validation via principal component analysis showed clear separation, with PC1 (38.2% of variance) driven by metabolic load and PC2 (18.5%) by age and cholesterol. The metabolic syndrome prediction model demonstrated high discriminative ability (AUC = 0.964, 95% CI: 0.958–0.970), substantially outperforming a baseline model using age and sex alone (AUC = 0.712), and showed good calibration (Hosmer-Lemeshow P = 0.38). Triglycerides (Gini importance = 0.318), fasting blood glucose (0.199), and BMI (0.195) were the most important predictors. Decision curve analysis showed positive net benefit across threshold probabilities of 0.2–0.5.

Conclusion: This study established a multidimensional health assessment framework for Chinese community-dwelling elderly and identified five clinically meaningful health patterns. The high-performance metabolic syndrome prediction model has practical applications for preventive care, while the identified patterns provide a roadmap for stratified interventions in geriatric care. This integrated data-driven approach represents an important advancement in geriatric medicine, with the potential to transform community-based preventive care and resource allocation for older adults.
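
To make the threshold-based metrics named in the Methods concrete, the following is a minimal Python sketch of how sensitivity, specificity, PPV, NPV, and the net benefit used in decision curve analysis can be computed at a single threshold probability. It uses the standard net benefit definition, TP/n − (FP/n) × pt/(1 − pt), which is an assumption about what the study computed. The labels and predicted probabilities are a made-up toy example, not data from the study, and all function and variable names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int
    fp: int
    tn: int
    fn: int


def confusion_at(y_true, y_prob, threshold):
    """Count outcomes when every probability >= threshold is called positive."""
    tp = fp = tn = fn = 0
    for truth, prob in zip(y_true, y_prob):
        predicted = prob >= threshold
        if predicted and truth:
            tp += 1
        elif predicted and not truth:
            fp += 1
        elif truth:
            fn += 1
        else:
            tn += 1
    return Confusion(tp, fp, tn, fn)


def summary_metrics(c):
    """Threshold-based metrics; zero denominators are not handled in this sketch."""
    return {
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "ppv": c.tp / (c.tp + c.fp),
        "npv": c.tn / (c.tn + c.fn),
    }


def net_benefit(c, threshold):
    """Net benefit of the model at threshold probability pt (0 < pt < 1)."""
    n = c.tp + c.fp + c.tn + c.fn
    return c.tp / n - (c.fp / n) * threshold / (1 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: treat everyone as positive."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Toy illustration only: 10 hypothetical residents, 4 with metabolic syndrome.
y_true = [1, 1, 1, 0, 0, 0, 0, 1, 0, 0]
y_prob = [0.90, 0.80, 0.35, 0.40, 0.10, 0.20, 0.05, 0.60, 0.32, 0.15]

pt = 0.3  # a threshold inside the 0.2-0.5 range reported in the abstract
c = confusion_at(y_true, y_prob, pt)
metrics = summary_metrics(c)
nb_model = net_benefit(c, pt)
nb_all = net_benefit_treat_all(y_true, pt)

print(c)
print(metrics)
print(f"net benefit (model) = {nb_model:.3f}, treat-all = {nb_all:.3f}")
```

A decision curve repeats this calculation over a grid of threshold probabilities and plots the model's net benefit against the treat-all and treat-none (net benefit 0) strategies. A model is considered clinically useful over the thresholds where its curve lies above both.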