Advancing cardiovascular disease risk prediction beyond conventional methods: a systematic review of multimodal machine learning models integrating traditional clinical factors and multi-omics data
Abstract
Background
Cardiovascular disease (CVD) is a leading global health burden. Traditional risk prediction models, though widely used, often overlook genetic predisposition and other complex biological factors that significantly affect CVD risk. The emergence of multi-omics technologies enables a more comprehensive view of an individual’s risk, but integrating such high-dimensional data is challenging and requires advanced computational approaches. Recent advances in machine learning offer powerful tools to synthesize and integrate these high-throughput datasets, providing a promising route to improved CVD risk stratification.
Objective
This systematic review assesses whether CVD risk prediction models that incorporate omics data alongside clinical and other variables improve prediction compared with models using clinical or omics data alone.
Methods
A systematic search was conducted across PubMed (MEDLINE), Embase, and Web of Science databases in June 2025 using keywords related to CVD, risk prediction, multi-omics data, and machine learning. Studies reporting on models comparing multi-omics data with traditional clinical factors for CVD risk prediction were included. Data on model performance, methodologies, and subgroup analyses were extracted and synthesized.
Results
Studies consistently showed that models integrating clinical factors with additional modalities, including genomic (n≈58), biomarker (n≈109), biological (n≈125), and other data types, significantly enhanced CVD risk prediction, with combined clinical+genomic models outperforming single-modality approaches. Other data types, such as lifestyle factors and proteomics, further refined performance. Subgroup analyses revealed reduced predictive accuracy across diverse ancestries and age-specific differences in performance. Importantly, individuals at high genetically defined risk often derived greater absolute benefit from targeted clinical interventions. The models spanned settings from predicting risk in asymptomatic individuals (primary prevention) to guiding prognosis in patients with established disease (secondary prevention).
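Comparisons between combined and single-modality models are commonly expressed as a difference in discrimination, for example the C-statistic (area under the ROC curve). The sketch below is a minimal illustration under stated assumptions, not the review's analysis. It simulates a hypothetical cohort in which event risk follows an assumed logistic model in a standardized clinical score and an independent standardized polygenic risk score (PRS), then compares the C-statistic of the clinical score alone with that of the combined linear predictor. For simplicity it uses the known simulated weights instead of fitting a model; a real comparison would fit each model on training data and evaluate it on held-out or external data. The function names (`c_statistic`, `simulate_cohort`) and all parameter values are hypothetical.

```python
import math
import random


def c_statistic(scores, labels):
    """C-statistic (ROC AUC) via the Mann-Whitney U statistic.

    Equals the probability that a randomly chosen case scores higher than a
    randomly chosen non-case; tied scores receive average ranks (half credit).
    """
    n = len(scores)
    n_pos = sum(labels)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one case and one non-case")
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied scores starting at sorted position i.
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)


def simulate_cohort(n=4000, beta_clin=1.0, beta_prs=1.0, intercept=-2.0, seed=0):
    """Hypothetical cohort: event risk follows an assumed logistic model in a
    standardized clinical score and an independent standardized PRS."""
    rng = random.Random(seed)
    clinical, prs, events = [], [], []
    for _ in range(n):
        c = rng.gauss(0.0, 1.0)
        g = rng.gauss(0.0, 1.0)
        p = 1.0 / (1.0 + math.exp(-(intercept + beta_clin * c + beta_prs * g)))
        clinical.append(c)
        prs.append(g)
        events.append(1 if rng.random() < p else 0)
    return clinical, prs, events


clinical, prs, events = simulate_cohort()
# Combined score: the assumed true linear predictor, matching the default
# beta_clin = beta_prs = 1.0 (no model fitting in this sketch).
combined = [c + g for c, g in zip(clinical, prs)]

auc_clinical = c_statistic(clinical, events)
auc_combined = c_statistic(combined, events)
print(f"clinical only:       C = {auc_clinical:.3f}")
print(f"clinical + genomic:  C = {auc_combined:.3f}")
print(f"delta C-statistic:   {auc_combined - auc_clinical:+.3f}")
```

Because the PRS effect is built into the simulation by assumption, the gap printed here reflects only that assumed effect and says nothing about the magnitudes reported in the reviewed studies.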
Conclusion
CVD risk prediction models integrating genomic, clinical, and other variables offer higher predictive accuracy and more refined risk stratification than single-modality models. These models hold considerable potential for personalized interventions across diverse populations. Future research should prioritize real-world implementation and broad validation to translate these findings into routine clinical practice.