A Supervised Learning Framework for Stroke Hospitalization Factors Selection Using the Lasso-MIDAS Model
Abstract
Stroke, an acute cerebrovascular disease with significant public health implications, is influenced by a complex interplay of meteorological conditions, air quality, and socioeconomic factors. However, the relevant data come from diverse sources, are sampled at mixed frequencies, and span a high-dimensional variable space, all of which limit the effectiveness of traditional regression models.
This study develops a Lasso-MIDAS model framework to identify the key multidimensional drivers of stroke admissions. Using this approach, 21 candidate variables encompassing meteorological, environmental, and economic indicators were screened. The empirical results identified 11 core influencing factors. In the meteorological and environmental dimensions, Wind Speed, Carbon Monoxide (CO), and Sulfur Dioxide (SO2) were identified as significant positive drivers, with Temperature Difference also positively correlating with admission risks. Conversely, Nitrogen Dioxide (NO2) exhibited a negative correlation, potentially reflecting behavioral adaptation and exposure reduction during peak pollution periods. In the socioeconomic dimension, the Consumer Price Index (CPI) for Food, Tobacco, and Alcohol emerged as a major risk factor, highlighting the impact of living cost pressures on public health.
The findings demonstrate the superiority of the Lasso-MIDAS model in handling large-scale healthcare data. It effectively addresses the frequency mismatch problem while enhancing the robustness of causal identification through variable shrinkage. These conclusions provide a scientific basis for health authorities to establish early warning systems and optimize public health policy interventions.
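The two ingredients named in the abstract, frequency alignment and shrinkage-based variable selection, can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the lag polynomial, the penalty level, or the estimation routine. The sketch assumes fixed exponential Almon weights for collapsing high-frequency observations into one value per low-frequency period, a plain coordinate-descent Lasso on centered data with no intercept, and the hypothetical helper names `almon_weights`, `midas_aggregate`, `soft_threshold`, and `lasso_cd`.

```python
import math


def almon_weights(n_lags, theta1, theta2):
    """Exponential Almon lag weights, normalized to sum to 1 (assumed scheme)."""
    raw = [math.exp(theta1 * k + theta2 * k * k) for k in range(1, n_lags + 1)]
    total = sum(raw)
    return [r / total for r in raw]


def midas_aggregate(series, m, weights):
    """Collapse each block of m high-frequency values into one weighted value.

    weights[0] is applied to the most recent observation in each block.
    """
    if len(weights) != m:
        raise ValueError("need exactly one weight per high-frequency lag")
    out = []
    for start in range(0, len(series) - m + 1, m):
        block = series[start:start + m]
        out.append(sum(w * x for w, x in zip(weights, reversed(block))))
    return out


def soft_threshold(rho, lam):
    """Soft-thresholding operator used in the Lasso coordinate update."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_cd(X, y, lam, n_iter=100):
    """Coordinate-descent Lasso for (1/2n)||y - Xb||^2 + lam * ||b||_1.

    X is a list of rows; columns are assumed centered (no intercept is fit).
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of column j with the residual that excludes feature j.
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][k] * beta[k] for k in range(p) if k != j))
                for i in range(n)
            ) / n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            beta[j] = soft_threshold(rho, lam) / z if z > 0 else 0.0
    return beta


# Frequency alignment: 8 high-frequency readings become 2 low-frequency values.
w = almon_weights(4, 0.0, 0.0)  # theta = 0 reduces to equal weights
aligned = midas_aggregate([1, 2, 3, 4, 5, 6, 7, 8], 4, w)

# Selection: the second column has a small effect and is shrunk to exactly zero.
X = [[1, 1], [-1, 1], [1, -1], [-1, -1]]
y = [3.1, -2.9, 2.9, -3.1]  # y = 3.0 * col0 + 0.1 * col1
beta = lasso_cd(X, y, lam=0.5)
```

In a fuller pipeline, each of the candidate variables would be aligned this way before the aligned columns are stacked into the design matrix passed to the Lasso step. Variables whose coefficients remain nonzero are the selected factors. In practice the Almon parameters and the penalty `lam` would be tuned, for example by cross-validation, rather than fixed as they are here.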