Accurate machine learning-based CVD risk prediction in primary care may reduce the need for routine health care checks
Abstract
Background: Cardiovascular risk prediction models, such as the Pooled Cohort Equations (PCE), QRISK3, and SCORE2, are recommended tools to guide treatment initiation and intensification in primary care. In clinical practice, however, one or more of the required predictors is often missing, which precludes routine application of these models.
Methods: We developed a set of partial models predicting the 10-year risk of cardiovascular disease (CVD) and major CVD (additionally considering atrial fibrillation, heart failure, and peripheral arterial disease) using combinations of 14 predictors, allowing application in settings where only a subset of variables is available. The set of partial models was evaluated across five studies jointly comprising 105,550 participants.
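The partial-model idea can be illustrated with a minimal Python sketch. It assumes that one model is stored per subset of optional predictors and that a patient is scored by the model trained on exactly the predictors recorded for them. The predictor names, the helpers (`all_subsets`, `observed_subset`, `select_partial_model`), and the string placeholders standing in for fitted models are all hypothetical; they are not the study's predictor list or its published API.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterator, Mapping, Optional, Sequence

# Hypothetical optional predictors, for illustration only (not the study's actual list).
OPTIONAL_PREDICTORS = ("systolic_bp", "total_cholesterol", "hdl_cholesterol", "smoking")


def all_subsets(predictors: Sequence[str]) -> Iterator[FrozenSet[str]]:
    """Yield every subset of `predictors`, from the empty set up to the full set."""
    for k in range(len(predictors) + 1):
        for combo in combinations(predictors, k):
            yield frozenset(combo)


def observed_subset(patient: Mapping[str, Optional[float]],
                    predictors: Sequence[str]) -> FrozenSet[str]:
    """Return the optional predictors that are recorded (present and not None) for a patient."""
    return frozenset(p for p in predictors if patient.get(p) is not None)


def select_partial_model(models: Dict[FrozenSet[str], str],
                         patient: Mapping[str, Optional[float]]) -> str:
    """Pick the (hypothetical) partial model trained on exactly the predictors this patient has."""
    key = observed_subset(patient, OPTIONAL_PREDICTORS)
    return models[key]


# One placeholder "model" per subset; a real system would store fitted models here.
models = {s: "model[" + ",".join(sorted(s)) + "]" for s in all_subsets(OPTIONAL_PREDICTORS)}

# 'age' is outside OPTIONAL_PREDICTORS, so this sketch simply ignores it when choosing a model.
patient = {"age": 62.0, "systolic_bp": 141.0, "total_cholesterol": None, "smoking": 1.0}
print(len(models))                             # 16 = 2**4 subsets
print(select_partial_model(models, patient))   # model[smoking,systolic_bp]
```

With k optional predictors there are 2^k subsets, so the number of models grows quickly as predictors are added; keying the models by `frozenset` turns the choice of model for a given patient into a single dictionary lookup.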
Findings: We trained 4,096 unique models to predict 10-year major CVD risk and observed near-identical performance when evaluated against CVD and major CVD. Across the five studies, the c-statistic had a first quartile (Q1) of 0.71 and a third quartile (Q3) of 0.73. This was comparable to the performance of the PCE (Q1: 0.70, Q3: 0.74; 10 predictors) and SCORE2 (Q1: 0.71, Q3: 0.75; 8 predictors). Due to the large number of required predictors (22 for men, 23 for women), QRISK3 was evaluated in a single cohort only: c-statistic 0.72 (95% CI 0.72 to 0.73). Model performance remained adequate when restricted to the partial models using 2 to 4 predictors: c-statistic Q1: 0.70, Q3: 0.71. Partial models showed reasonable calibration across most studies, with limited risk underestimation in two cohorts. Partial models excluding blood pressure and lipids performed similarly to models incorporating these variables. The set of partial models has been made available through a Python-based application programming interface.
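As a reminder of what the reported c-statistics measure, the sketch below computes concordance for a binary outcome: the proportion of (case, non-case) pairs in which the case received the higher predicted risk, with tied predictions counted as half. This is a simplifying assumption; the abstract does not state which estimator was used, and a 10-year time-to-event analysis would typically call for a censoring-aware version such as Harrell's C. The function name and the risk values are made up for illustration.

```python
from typing import Sequence


def c_statistic(risk: Sequence[float], event: Sequence[int]) -> float:
    """Binary-outcome concordance: fraction of (case, non-case) pairs where the case has
    the higher predicted risk; tied predictions count as half. Ignores censoring."""
    if len(risk) != len(event):
        raise ValueError("risk and event must have the same length")
    cases = [r for r, e in zip(risk, event) if e == 1]
    controls = [r for r, e in zip(risk, event) if e == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    concordant = 0.0
    for rc in cases:
        for rn in controls:
            if rc > rn:
                concordant += 1.0
            elif rc == rn:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))


# Toy predicted 10-year risks and observed events (made-up numbers, not study data).
risk = [0.05, 0.12, 0.30, 0.08, 0.22, 0.25]
event = [0, 0, 1, 0, 1, 0]
print(c_statistic(risk, event))  # 0.875: 7 of the 8 case/non-case pairs are ranked correctly
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking. The pairwise loop is O(n²) and is written for clarity rather than for cohorts of 100,000 participants.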
Interpretation: We show that, in the presence of partially missing data, clinically relevant predictions of the 10-year risk of major CVD can be obtained using a subset of features, facilitating improved and more timely treatment decisions.
Funding: Dutch Research Council, British Heart Foundation, UK Research and Innovation.
Keywords: Missing data, Machine learning, Cardiovascular disease, Prediction, Risk score.