Exposomics and Cardiovascular Diseases: A Scoping Review of Machine Learning Approaches
Abstract
The primary objective of this article is to identify key literature and provide an overview of the breadth of research on machine learning applications to exposomics data, with a focus on cardiovascular diseases. A secondary objective is to identify common limitations and meaningful directions to be addressed in future work.
Most of the identified literature focuses on disease understanding, prevention, and management, and the remainder on healthcare resource planning and management. Linear, non-linear, and ensemble machine learning methods have been applied; tree-based methods are generally the most popular, and Random Forests in particular are most often reported as the best performer. Non-traditional cardiovascular disease (CVD) risk factors spanning behavioural/lifestyle, socio-economic, and a wide range of environmental factors have been investigated in the identified literature. According to our findings, environmental risk factors are the most frequently reported category and are often considered alone rather than in conjunction with the others.
Applications of machine learning have substantially accelerated the field, enabling the analysis of high-dimensional data, improving accuracy, supporting the investigation of novel risk factors, and expanding our knowledge of the impact of the exposome on cardiovascular diseases. However, several challenges persist, including data heterogeneity and the lack of standardized protocols, which introduce bias and hinder reproducibility and comparability across studies, as well as concerns about the validity of inference when applying machine learning methods to identify associations between exposure and health outcomes. Addressing these issues requires collaborative and interdisciplinary effort to integrate standardized exposome data frameworks akin to those of other –omics fields, apply causal inference methods to validate findings, and further expand the use of Explainable Artificial Intelligence to build insights and enable comparability and understanding.