A Deployable Explainable AI System for Cardiovascular Risk Stratification in Occupational Health: A Cross-Sectional Study

Abstract

Objective: This study aimed to develop an interpretable machine learning system for CVD risk stratification and identification of high-risk individuals using cross-sectional occupational health data, and to integrate it into routine clinical workflows within an occupational health electronic health record (EHR) platform serving the Iranian workforce.

Methods: We analyzed comprehensive health screening data from 9,126 Iranian employees. Three feature selection approaches (LASSO regularization, Recursive Feature Elimination, and correlation-based methods) were compared to identify parsimonious predictor sets. Several machine learning classifiers (Logistic Regression, Random Forest, XGBoost, SVM) were validated using stratified train-validation-test splits. Model interpretability was enhanced through SHAP analysis. The optimal model was deployed as a web-based risk calculator integrated into physician dashboards.

Results: LASSO feature selection identified six clinically relevant variables (Age, Sex, Diabetes, HDL-C, Total Cholesterol, Family history) while maintaining strong predictive performance (AUC = 0.791). Logistic Regression with undersampling achieved a sensitivity of 84.0% and specificity of 74.1%, with an overall accuracy of 79.0%. SHAP analysis revealed age, diabetes, and HDL cholesterol as primary risk drivers, enabling patient-specific explanations.

Conclusion: This study demonstrates the successful translation of machine learning research into operational clinical practice for CVD risk stratification. By prioritizing interpretability through sparse feature selection and transparent explanations, we developed a deployment-ready system that addresses key barriers to clinical machine learning adoption. Note that this system classifies cardiovascular risk status using cross-sectional data; prospective validation for incident CVD prediction is identified as essential future work. This represents the first ML-based CVD risk stratification system integrated within the Iranian occupational health infrastructure.
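
The abstract mentions class undersampling and reports sensitivity, specificity, and accuracy. The following is a minimal standard-library sketch of those two ideas, assuming binary labels (1 = high risk, 0 = not high risk). The function names `undersample` and `screening_metrics` and all of the toy data are hypothetical illustrations; they are not the study's code, cohort, or results.

```python
import random
from collections import Counter


def undersample(rows, labels, seed=0):
    """Randomly drop rows from the larger class until both classes are equal in size."""
    rng = random.Random(seed)
    by_class = {0: [], 1: []}
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    n_keep = min(len(by_class[0]), len(by_class[1]))
    balanced = []
    for label, members in by_class.items():
        for row in rng.sample(members, n_keep):
            balanced.append((row, label))
    rng.shuffle(balanced)
    return [row for row, _ in balanced], [label for _, label in balanced]


def screening_metrics(y_true, y_pred):
    """Compute sensitivity, specificity, and accuracy from binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return {
        "sensitivity": tp / (tp + fn),  # share of true high-risk cases flagged
        "specificity": tn / (tn + fp),  # share of true low-risk cases cleared
        "accuracy": (tp + tn) / len(pairs),
    }


# Toy imbalanced data (made up): 4 high-risk and 16 low-risk records.
rows = list(range(20))
labels = [1] * 4 + [0] * 16
balanced_rows, balanced_labels = undersample(rows, labels)
class_counts = Counter(balanced_labels)

# Toy predictions (made up) on a small balanced evaluation set.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
metrics = screening_metrics(y_true, y_pred)
```

In a screening setting, sensitivity is often weighted more heavily than specificity because a missed high-risk individual is usually costlier than an unnecessary follow-up; the sketch simply reports both so that trade-off stays visible.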
