Forecasting State Employment Relativity with Health, Labor-Flow, and Small-Establishment Indicators: A Leakage-Safe Temporal Machine Learning Pipeline

Dingyuan Liu
Yiqing Wang

Abstract

Standard employment forecasting often absorbs common national drift, which can obscure why states persistently differ in employment intensity. This study develops and evaluates a forecasting framework for state employment relativity, defined as the log ratio of a state's employment-per-capita measure to the contemporaneous national counterpart. We construct a state-year panel for 2014 to 2024 by integrating administrative labor market aggregates, population-weighted health and socioeconomic indicators, and small-establishment structure measures that proxy local service capacity. To prevent temporal leakage, all tuning and validation procedures preserve chronological ordering, and the main model comparison uses a fixed forward holdout that trains on 2014 to 2021 and evaluates on 2022 to 2024. Among the prespecified learners, LightGBM delivers the strongest forward performance, with holdout R² = 0.831, RMSE = 0.0536, and MAE = 0.0430 (RMSE and MAE in log points). The main interpretability analysis therefore focuses on SHAP values from the selected LightGBM model, while a stacked ensemble serves only as a supplementary robustness diagnostic for the temporal stability of predictor families. Across these analyses, the most important signals center on education composition, health burden and access, local service ecosystem capacity, and hiring intensity. A state-level case study of Texas and North Carolina shows how similar relative employment positions can arise from different feature combinations and different model error profiles. The results support a forecasting interpretation in which state deviations from national employment intensity reflect nonlinear interactions among human capital composition, population health conditions, and local business service structure. They also show why a clear separation between model selection, model interpretation, and robustness analysis is essential for credible empirical claims.
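The target definition and the forward-holdout evaluation described in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names and the `year` field are hypothetical, the per-capita inputs are generic placeholders for whatever measure the paper uses, model fitting (LightGBM or otherwise) is omitted, and R² is assumed to be computed against the holdout mean in the conventional way.

```python
import math
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, float]


def employment_relativity(state_emp_per_capita: float,
                          national_emp_per_capita: float) -> float:
    """Hypothetical helper: log ratio of a state's employment-per-capita
    measure to the national measure for the same year."""
    if state_emp_per_capita <= 0 or national_emp_per_capita <= 0:
        raise ValueError("per-capita measures must be positive")
    return math.log(state_emp_per_capita / national_emp_per_capita)


def forward_holdout_split(rows: List[Row],
                          train_years: Tuple[int, int] = (2014, 2021),
                          test_years: Tuple[int, int] = (2022, 2024)
                          ) -> Tuple[List[Row], List[Row]]:
    """Chronological split with no shuffling: training rows come strictly
    before holdout rows, so no future information leaks into training."""
    train = [r for r in rows if train_years[0] <= r["year"] <= train_years[1]]
    test = [r for r in rows if test_years[0] <= r["year"] <= test_years[1]]
    # Guard: every training year must precede every holdout year.
    if train and test and max(r["year"] for r in train) >= min(r["year"] for r in test):
        raise ValueError("training window overlaps the holdout window")
    return train, test


def holdout_metrics(y_true: Sequence[float],
                    y_pred: Sequence[float]) -> Dict[str, float]:
    """R^2, RMSE and MAE on the holdout; RMSE and MAE share the target's
    units (log points for a log-ratio target)."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in residuals)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(e) for e in residuals) / n
    return {"r2": r2, "rmse": rmse, "mae": mae}


if __name__ == "__main__":
    # Toy panel with one row per year, 2014 through 2024 (illustrative only).
    panel = [{"year": y} for y in range(2014, 2025)]
    train, test = forward_holdout_split(panel)
    print(len(train), len(test))  # 8 training years, 3 holdout years
```

Because the target is a ratio, scaling the state and national per-capita measures by the same factor leaves it unchanged, which is how the log-ratio definition removes common national drift. Keeping the split fixed and chronological, rather than using shuffled k-fold cross-validation, is what keeps holdout-year information out of training.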

Version published on Research Square (DOI 10.21203/rs.3.rs-9150589/v1), Mar 18, 2026