Socioeconomic and Behavioral Drivers of Geographic Disparities in U.S. Cardiovascular Mortality: A Machine Learning Analysis

Abstract

Background

Substantial geographic disparities in cardiovascular disease (CVD) mortality persist across the United States. It remains unclear how much of the apparent effect of “place” reflects underlying socioeconomic and behavioral risk factors. This study applies machine learning to quantify the determinants of these disparities.

Methods

A cross-sectional analysis linked county-level 2019–2020 age-adjusted CVD mortality rates from the CDC with health determinant metrics from the 2023 County Health Rankings dataset. The analytic sample included [N counties] with complete data. A Random Forest regressor modeled mortality outcomes, incorporating socioeconomic, healthcare access, and behavioral predictors. Model interpretation used SHAP to assess feature-level contributions.
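The modeling step described above can be illustrated with a minimal sketch. It is not the study's pipeline: the data are synthetic, the three predictor names are hypothetical stand-ins for County Health Rankings measures, and the hyperparameters and the 75/25 train/test split are assumptions chosen only to make the example run.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_counties = 600  # synthetic sample size, NOT the study's analytic sample

# Hypothetical standardized county-level predictors (illustrative names only).
income = rng.normal(size=n_counties)
smoking = rng.normal(size=n_counties)
uninsured = rng.normal(size=n_counties)
X = np.column_stack([income, smoking, uninsured])
feature_names = ["median_income", "smoking_prevalence", "uninsured_rate"]

# Synthetic age-adjusted mortality rate: driven by income and smoking plus
# noise; uninsured_rate deliberately has no effect in this toy setup.
y = 250.0 - 30.0 * income + 20.0 * smoking + rng.normal(scale=10.0, size=n_counties)

# Hold out a quarter of the counties to estimate out-of-sample fit.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

# Share of variance in held-out mortality explained by the model.
r2 = r2_score(y_test, model.predict(X_test))

# Impurity-based importances: a quick first look, distinct from SHAP values.
importances = dict(zip(feature_names, model.feature_importances_))

print(f"held-out R^2: {r2:.2f}")
for name, value in importances.items():
    print(f"{name}: {value:.3f}")
```

Evaluating on held-out counties is one common design choice; the abstract does not state how model fit was assessed, so the split here is only an assumption.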

Results

The model explained [R² value] of variance in CVD mortality. Socioeconomic factors, particularly median household income and poverty rates, were the most influential predictors, followed by county-level smoking prevalence. Geographic identifiers alone had limited explanatory value after accounting for socioeconomic and behavioral metrics.

Conclusions

Geographic disparities in CVD mortality are largely accounted for by underlying socioeconomic disadvantage and community health behaviors. Reducing these disparities will require public health interventions that address poverty, education, and behavioral risk factors, not clinical care alone.

What Is New?

Explanatory vs. Predictive Modeling

Previous research has largely focused on identifying geographic disparities in cardiovascular disease (CVD) mortality. This study goes further by not only predicting mortality but also explaining why disparities exist, quantifying the relative importance of socioeconomic, behavioral, and healthcare access determinants.

Advanced Interpretation

We apply SHAP (SHapley Additive exPlanations), a machine learning interpretability framework, to quantify the contribution of each county-level characteristic to the model's predicted mortality, including nonlinear patterns and interactions between predictors.
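SHAP attributions are built on Shapley values, and a small worked example may make the idea concrete. The sketch below computes exact Shapley values by enumerating feature orderings for a hypothetical three-feature toy model and a single reference county. The model formula, feature names, and numbers are all assumptions for illustration. The SHAP library approximates the same quantity efficiently for tree ensembles and typically averages over a background dataset, not a single baseline.

```python
from itertools import permutations
from math import isclose


def predict(x):
    """Hypothetical toy mortality model with a poverty-smoking interaction."""
    return (
        200.0
        - 0.5 * x["income"]
        + 2.0 * x["smoking"]
        + 0.1 * x["poverty"] * x["smoking"]
    )


def shapley_values(predict_fn, instance, baseline):
    """Exact Shapley values: average marginal contribution over all orderings."""
    names = list(instance)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)  # start from the reference county
        previous = predict_fn(current)
        for name in order:
            current[name] = instance[name]  # switch one feature to the county's value
            updated = predict_fn(current)
            totals[name] += updated - previous
            previous = updated
    return {name: total / len(orderings) for name, total in totals.items()}


# Hypothetical reference county and county of interest (made-up values).
baseline = {"income": 60.0, "smoking": 15.0, "poverty": 12.0}
county = {"income": 40.0, "smoking": 25.0, "poverty": 22.0}

phi = shapley_values(predict, county, baseline)
gap = predict(county) - predict(baseline)

for name, value in phi.items():
    print(f"{name}: {value:+.2f}")
print(f"sum of attributions: {sum(phi.values()):.2f}, prediction gap: {gap:.2f}")
```

The attributions sum to the difference between the county's prediction and the baseline prediction, which is the additivity property that lets SHAP values be read as a decomposition of a prediction. Exhaustive enumeration is only feasible for a handful of features, since the number of orderings grows factorially.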

Integrated Data Approach

By combining recent granular datasets on health outcomes, socioeconomic context, and behaviors, this study produces a multi-domain explanatory model of CVD mortality drivers at the national scale.

Clinical Implications

The findings suggest that clinical interventions alone are unlikely to eliminate disparities in CVD mortality, since the strongest predictors are upstream social determinants of health.

This evidence supports the need for clinicians and health systems to partner on policies that address economic stability, educational access, and environments conducive to healthier behaviors. Targeting resources toward communities with high poverty and low educational attainment may yield more effective and equitable reductions in the national CVD burden than approaches focused only on clinical care.
