Prediction of Long COVID and Mortality among Patients with Substance Use Disorder
Abstract
The convergence of the COVID-19 pandemic and the substance use disorder (SUD) crisis has created a syndemic that places people with SUD at extreme risk of both acute and chronic adverse outcomes. This study addresses the critical need for proactive risk stratification by developing and comparing machine learning models that predict two distinct endpoints in hospitalized COVID-19 patients with SUD: in-hospital mortality and long COVID. Using comprehensive electronic health record (EHR) data, we systematically address severe class imbalance with a combination of specialized algorithms (e.g., Balanced Random Forest) and data resampling techniques (e.g., SMOTE). Our fine-tuned Logistic Regression model for mortality achieves 93% recall, correctly flagging 93% of the patients who died in hospital. For the more challenging long COVID prediction task, our proposed weighted ensemble model achieves 80% recall, demonstrating strong performance in identifying patients susceptible to chronic illness. Feature importance analysis reveals distinct clinomic signatures: acute mortality is driven by markers of systemic distress (e.g., lactic acid, D-dimer), while chronic risk is linked to metabolic and inflammatory factors (e.g., BMI, renal function, pre-existing sleep disorders). Our work delivers a validated computational toolkit for dual-risk prediction, enabling targeted interventions to mitigate both immediate and long-term harm in this high-risk population.
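To illustrate the imbalance-handling and evaluation ideas named in the abstract, here is a minimal Python sketch. It is not the authors' pipeline. It assumes dense numeric feature vectors and binary labels (1 = adverse outcome). The function names `smote`, `weighted_soft_vote`, and `recall`, as well as the example weights and probabilities, are hypothetical. `smote` creates synthetic minority-class samples by interpolating between a minority sample and one of its k nearest minority neighbours. `weighted_soft_vote` averages per-model positive-class probabilities using fixed weights. `recall` computes TP / (TP + FN), the metric reported for both models.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def _euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(minority: List[Vector], n_new: int, k: int = 5, seed: int = 0) -> List[Vector]:
    """Hypothetical SMOTE-style oversampler (illustration only).

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic: List[Vector] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of `base` within the minority class, excluding itself
        others = [m for j, m in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda m: _euclidean(base, m))[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor drawn uniformly from [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nn)])
    return synthetic


def weighted_soft_vote(
    prob_lists: Sequence[Sequence[float]],
    weights: Sequence[float],
    threshold: float = 0.5,
) -> List[int]:
    """Hypothetical weighted ensemble: weighted mean of per-model probabilities, then threshold."""
    total = sum(weights)
    n = len(prob_lists[0])
    combined = [
        sum(w * probs[i] for w, probs in zip(weights, prob_lists)) / total
        for i in range(n)
    ]
    return [1 if p >= threshold else 0 for p in combined]


def recall(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Recall (sensitivity) for positive class 1: TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0


if __name__ == "__main__":
    # Toy minority class with three 2-D samples (made-up values)
    minority = [[1.0, 2.0], [1.5, 1.8], [1.2, 2.4]]
    new_points = smote(minority, n_new=4, k=2)
    print(len(new_points))  # 4

    # Two hypothetical models' probabilities for five patients, weighted 2:1
    y_true = [1, 1, 1, 0, 0]
    preds = weighted_soft_vote(
        [[0.9, 0.4, 0.7, 0.2, 0.6], [0.8, 0.6, 0.3, 0.1, 0.1]],
        weights=[2.0, 1.0],
    )
    print(preds, recall(y_true, preds))
```

In practice, resampling such as SMOTE is applied only to the training data (inside each cross-validation fold) so that synthetic points never leak into evaluation data. The imbalanced-learn library provides tested implementations of both techniques named above (`SMOTE` and `BalancedRandomForestClassifier`).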