Accurate machine learning-based CVD risk prediction in primary care may reduce the need for routine health care checks

Katarzyna Dziopa
Sophie Eastwood
Daniel Bos
Maryam Kavousi
Maarten J.G. Leening
Joline W J Beulens
Peter P Harms
Nishi Chaturvedi
Folkert W Asselbergs
Amand Floriaan Schmidt

Abstract

Background: Cardiovascular risk prediction models, such as PCE, QRISK3, and SCORE2, are recommended tools to guide treatment initiation or intensification in primary care. In clinical practice, one or more of the required predictors are often missing, which precludes routine application of such models.

Methods: We developed a set of partial models predicting the 10-year risk of cardiovascular disease (CVD) and major CVD (additionally considering atrial fibrillation, heart failure, and peripheral arterial disease) using combinations of 14 predictors, allowing application in settings where only a subset of variables is available. The set of partial models was evaluated across five studies jointly comprising 105,550 participants.
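As an illustration of the partial-model idea described in the Methods, the following minimal Python sketch enumerates predictor combinations and looks up the model that matches the predictors actually recorded for a patient. Everything here is an assumption made for illustration: the predictor names, the `enumerate_subsets` and `pick_partial_model` helpers, and the dictionary registry are hypothetical and are not taken from the authors' application programming interface. The abstract does not list the 14 predictors or describe how models are selected.

```python
from itertools import combinations

# Hypothetical predictor names (assumption): the abstract does not list
# the 14 predictors, so a small illustrative set is used here.
PREDICTORS = ["age", "sex", "smoking", "sbp", "total_chol", "hdl_chol"]


def enumerate_subsets(predictors, min_size=1):
    """Return every combination of at least `min_size` predictors."""
    subsets = []
    for size in range(min_size, len(predictors) + 1):
        for combo in combinations(predictors, size):
            subsets.append(frozenset(combo))
    return subsets


def available_predictors(record):
    """Names of the predictors that are actually recorded (not None)."""
    return frozenset(name for name, value in record.items() if value is not None)


def pick_partial_model(record, model_registry):
    """Find the model trained on exactly the predictors observed in `record`."""
    key = available_predictors(record)
    if key not in model_registry:
        raise KeyError(f"no partial model for predictors: {sorted(key)}")
    return key, model_registry[key]


# Placeholder registry (assumption): one label per predictor subset stands in
# for a fitted model object.
registry = {
    subset: "model_" + "+".join(sorted(subset))
    for subset in enumerate_subsets(PREDICTORS)
}

# A patient with only age and sex recorded.
patient = {"age": 61, "sex": "F", "smoking": None, "sbp": None,
           "total_chol": None, "hdl_chol": None}
key, model = pick_partial_model(patient, registry)
print(sorted(key), model)
```

With n optional predictors, this enumeration yields 2^n - 1 non-empty subsets, which is why the number of partial models grows quickly as predictors are added.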

Findings: We trained 4,096 unique models to predict 10-year major CVD risk, observing near identical performance when evaluated against CVD and major CVD. The c-statistic ranged between quartiles Q1: 0.71 and Q3: 0.73 across the five studies. This was comparable to the performance of the PCE (Q1: 0.70, Q3: 0.74, 10 predictors) and SCORE2 (Q1: 0.71, Q3: 0.75, 8 predictors). Due to the large number of required predictors (22/23 for men/women), QRISK3 was evaluated in a single cohort: c-statistic 0.72 (95% CI 0.72; 0.73). Model performance remained adequate when focussing on the set of partial models using 2-4 predictors: c-statistic Q1: 0.70 and Q3: 0.71. Partial models demonstrated reasonable calibration across most studies, with a limited risk underestimation observed in two cohorts. Partial models excluding blood pressure and lipids demonstrated similar performance to models incorporating these variables. The set of partial models has been made available through a Python-based application programming interface.
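For readers unfamiliar with the c-statistic reported in the Findings, the sketch below shows one simple way to compute it for a binary outcome: the proportion of case/non-case pairs in which the case received the higher predicted risk, with ties counted as one half. This is a simplified illustration under stated assumptions, not the authors' evaluation code. It ignores censoring and follow-up time, which an analysis of 10-year risk would normally need to handle, and the abstract does not state which estimator was used. The `c_statistic` function and the example numbers are hypothetical.

```python
def c_statistic(risks, events):
    """Concordance for a binary outcome (assumes no censoring).

    risks  : list of predicted risks, one per participant
    events : list of 0/1 indicators (1 = event occurred)
    """
    concordant = 0.0
    comparable = 0
    for i, (risk_i, event_i) in enumerate(zip(risks, events)):
        for risk_j, event_j in zip(risks[i + 1:], events[i + 1:]):
            if event_i == event_j:
                continue  # only case/non-case pairs are comparable
            comparable += 1
            if event_i:
                case_risk, control_risk = risk_i, risk_j
            else:
                case_risk, control_risk = risk_j, risk_i
            if case_risk > control_risk:
                concordant += 1.0
            elif case_risk == control_risk:
                concordant += 0.5  # ties count as half-concordant
    if comparable == 0:
        raise ValueError("need at least one case and one non-case")
    return concordant / comparable


# Illustrative numbers only; these are not data from the study.
example_risks = [0.1, 0.2, 0.3, 0.4]
example_events = [0, 1, 0, 1]
print(c_statistic(example_risks, example_events))
```

A value of 0.5 corresponds to discrimination no better than chance and 1.0 to perfect ranking of cases above non-cases.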

Interpretation: We show that in the presence of partially missing data, clinically relevant predictions of the 10-year risk of major CVD can be obtained by using a subset of features, facilitating improved and more timely treatment decisions.

Funding: Dutch Research Council, British Heart Foundation, UK Research and Innovation.

Keywords: missing data, machine learning, cardiovascular disease, prediction, risk score.

Version published to 10.1101/2025.06.09.25329273 on medRxiv
Jun 12, 2025

Machine learning models for predicting severe clinical events in hospitalized patients with coronary artery disease

This article has 16 authors:
1. Hao Liu
2. Meijun Liu
3. Xinmiao Guan
4. Feng Cao
5. Changhao Liang
6. Zhongwen Qi
7. Jiaqi Hui
8. Junnan Zhao
9. Jingli Xing
10. Jianguo Zhou
11. Dong Zhang
12. Lei Liu
13. Xiaoliang Hao
14. Minjing Luo
15. Fengqin Xu
16. Yutong Fei
This article has no evaluations
Latest version Jan 12, 2026

Machine Learning Insights for Cardiovascular Risk Prediction in Diabetic Patients: Emphasis on Renal and Cardiac Markers Using Random Forests

This article has 1 author:
1. Julian Borges
This article has no evaluations
Latest version Jan 21, 2026

Machine Learning-Based Risk Prediction Model for Fatigue in Chronic Heart Failure Patients

This article has 9 authors:
1. Min Zhou
2. Jingran Yang
3. Yimei Zhang
4. Yu Wang
5. Ruijie Yanglan
6. Qinlan Li
7. Yangjuan Bai
8. Wei Wei
9. Fang Ma
This article has no evaluations
Latest version Jan 27, 2026
