A MULTI-DIMENSIONAL MACHINE LEARNING APPROACH FOR CARDIOVASCULAR DISEASE PREDICTION IN THE UK BIOBANK STUDY
Abstract
Cardiovascular diseases (CVD) are complex disorders involving impaired function of the heart or blood vessels. Several risk factors contribute to CVD, including diet, body composition, and physical activity. We developed an interpretable machine learning model integrating nongenetic factors from 502,134 UK Biobank participants, of whom 53,378 experienced a CVD event over eighteen years of follow-up. Age, cystatin C, and self-perceived health status emerged as the most relevant predictors of CVD. By combining these variables into a multidimensional model, we achieved excellent predictive performance, with an area under the receiver operating characteristic (ROC) curve of 0.99 and both sensitivity and specificity above 0.97 in the validation cohort. Although cardiovascular events arise from multiple factors, our results show that data integration and machine learning can accurately predict individual risk using simple measurements. This approach could also support the prediction of other cardiovascular outcomes and aid personalized risk assessment and preventive care.
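For readers less familiar with the three metrics reported above, the following is a minimal sketch of how an area under the ROC curve, sensitivity, and specificity can be computed from binary outcomes and predicted risk scores. It is not the authors' pipeline: the function names, the 0.5 decision threshold, and the eight-row toy dataset are all assumptions made purely for illustration, and the values it produces have no relation to the UK Biobank results.

```python
from typing import Sequence, Tuple


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a randomly chosen case scores higher
    than a randomly chosen non-case (ties count as one half)."""
    cases = [s for y, s in zip(y_true, scores) if y == 1]
    non_cases = [s for y, s in zip(y_true, scores) if y == 0]
    if not cases or not non_cases:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for c in cases:
        for n in non_cases:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(cases) * len(non_cases))


def sensitivity_specificity(
    y_true: Sequence[int], scores: Sequence[float], threshold: float
) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).
    A score at or above the threshold is treated as a predicted event."""
    tp = fn = tn = fp = 0
    for y, s in zip(y_true, scores):
        predicted_event = s >= threshold
        if y == 1 and predicted_event:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted_event:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one case and one non-case")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy data: 1 = CVD event, 0 = no event (NOT study data).
y_true_toy = [0, 0, 0, 0, 1, 1, 1, 1]
scores_toy = [0.1, 0.2, 0.3, 0.6, 0.4, 0.7, 0.8, 0.9]

auc_toy = roc_auc(y_true_toy, scores_toy)
# The 0.5 threshold is an assumption chosen only for this example.
sens_toy, spec_toy = sensitivity_specificity(y_true_toy, scores_toy, 0.5)
print(f"AUC={auc_toy:.4f} sensitivity={sens_toy:.2f} specificity={spec_toy:.2f}")
```

The pairwise AUC loop is quadratic in the number of participants and is written for readability only; at the scale of a cohort with hundreds of thousands of participants, a rank-based or library implementation would be the practical choice. Note also that sensitivity and specificity depend on the chosen threshold, whereas the AUC does not.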