Towards Principled Modeling Of Coronary Artery Calcium Scores With Zero-Inflated Regression

Abstract

The Coronary Artery Calcium Score (CACS) is a widely used measure of Coronary Artery Disease (CAD). The primary use of CACS is to identify subclinical CAD and estimate the risk of future cardiovascular events such as acute myocardial infarction. The development of coronary atherosclerosis is well known to be accelerated in response to risk factors such as age, hypertension, hypercholesterolaemia and smoking. However, there is substantial variability in an individual’s susceptibility or resistance to CAD against these risk factors. Quantifying the deviation from “expected” CACS provides a novel opportunity to inform multi-omic and similar unbiased discovery of new markers and mechanisms of CAD trained against CT imaging. Standard linear regression struggles to model CACS due to the high prevalence of zeros (which reflect the absence of measurable coronary plaque). Prior works have variously handled this by discarding measurements with CACS of zero or by binning the CACS into broad categorical groups. Such approaches discard meaningful data, motivating the need for a more principled approach to handling the data distribution. In this work, we explored zero-inflated regression as a possible approach to modeling CACS using a cohort of patients from the BioHEART-CT study, and devised metrics to validate performance. We identified zero-inflated negative binomial regression, zero-inflated gamma regression and zero-inflated lognormal regression as promising approaches for handling the distributional properties of CACS, where the best method to use can vary depending on the dataset considered. A key contribution of our work is to demonstrate how these models can also estimate the percentile of the observed CACS relative to the distribution that would be expected after controlling for the inputs, thereby avoiding the need for data-hungry binning-based approaches.
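
To make the percentile idea concrete, the following is a minimal Python sketch of how a zero-inflated lognormal model could turn an observed CACS into a percentile, given a patient's covariates. It is an illustration under stated assumptions, not the authors' implementation: the function names, the covariates (age and smoking status) and all coefficient values are hypothetical, and the abstract does not say how the paper assigns percentiles to scores of zero. Here the model has two parts: a probability p_zero that the score is exactly zero, and a lognormal distribution with parameters mu and sigma for positive scores. The percentile is the cumulative probability of the mixture evaluated at the observed score.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def sigmoid(x):
    """Logistic function, mapping a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def predict_params(age, smoker):
    """Map covariates to (p_zero, mu, sigma).

    HYPOTHETICAL coefficients chosen for illustration only; a real model
    would estimate them from data (for example by maximum likelihood).
    """
    p_zero = sigmoid(4.0 - 0.07 * age - 0.5 * smoker)  # zero-inflation part
    mu = 1.0 + 0.06 * age + 0.4 * smoker               # mean of log(CACS) given CACS > 0
    sigma = 1.5                                        # spread of log(CACS), held constant here
    return p_zero, mu, sigma


def zi_lognormal_percentile(cacs, p_zero, mu, sigma):
    """P(Y <= cacs) under a zero-inflated lognormal distribution.

    All scores of zero share the value p_zero; this tie-handling is an
    assumption of the sketch.
    """
    if cacs < 0:
        raise ValueError("CACS cannot be negative")
    if cacs == 0:
        return p_zero
    z = (math.log(cacs) - mu) / sigma
    return p_zero + (1.0 - p_zero) * normal_cdf(z)


if __name__ == "__main__":
    # Same observed score, two hypothetical patients with different risk profiles.
    for age, smoker in [(45, 0), (70, 1)]:
        p_zero, mu, sigma = predict_params(age, smoker)
        pct = zi_lognormal_percentile(100.0, p_zero, mu, sigma)
        print(f"age={age} smoker={smoker} percentile={pct:.3f}")
```

In this sketch the same observed score ranks lower for the older smoker than for the younger non-smoker, because more calcium is expected for the former. A zero-inflated gamma model would follow the same pattern with the gamma CDF in place of the lognormal one. A zero-inflated negative binomial model differs in that its count component can itself produce zeros, so the probability of a zero score is larger than the inflation probability alone.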
