Multi-scale Data Improves Performance of Machine Learning Model for Long COVID Prediction

Wei-Qi Wei
Christopher Guardo
Xinmeng Zhang
Srushti Gandireddy
Chao Yan
Vern Kerchberger
Alyson Dickson
Emily Pfaff
Hiral Master
Melissa Basford
Christopher Chute
Nguyen Tran
Salvatore Manusco
Toufeeq Syed
Zhongming Zhao
QiPing Feng
Melissa Haendel
Christopher Lunt
Paul Harris
Lang Li
Geoffrey Ginsburg
Joshua Denny
Dan Roden

Abstract

Long COVID affects a substantial proportion of the more than 778 million individuals infected with SARS-CoV-2, yet predictive models remain limited in scope. While existing efforts, such as the National COVID Cohort Collaborative (N3C), have leveraged electronic health record (EHR) data for risk prediction, accumulating evidence points to additional contributions from social, behavioral, and genetic factors. Using a diverse cohort of SARS-CoV-2-infected individuals (n > 17,200) from the NIH All of Us Research Program, we investigated whether integrating EHR data with survey-based and genomic information improves model performance. Our multi-scale approach achieved an AUROC of 0.748 (95% CI: 0.741, 0.755), outperforming the original EHR-only model, which had an AUROC of 0.736 (95% CI: 0.730, 0.741). Active-duty service status, self-reported fatigue, and chr19:4719431:G:A_A were among the most informative survey and genetic features. These findings highlight the importance of incorporating multi-scale data to improve risk stratification and inform personalized interventions for long COVID.
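As a rough illustration of how an AUROC and its 95% confidence interval, like those reported above, can be estimated, the sketch below computes AUROC from ranks and a percentile bootstrap interval. The abstract does not say how the intervals were derived, so the bootstrap is an assumption here, not the authors' method. The function names (`auroc`, `bootstrap_ci`, `toy_cohort`) and the synthetic data are hypothetical and unrelated to the All of Us cohort.

```python
import random

def auroc(labels, scores):
    """AUROC via the rank-sum identity; tied scores get their average rank."""
    pairs = sorted(zip(scores, labels))
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative")
    rank_sum_pos = 0.0
    i = 0
    while i < len(pairs):
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        rank_sum_pos += avg_rank * sum(lab for _, lab in pairs[i:j])
        i = j
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)

def bootstrap_ci(labels, scores, n_boot=200, alpha=0.05, seed=0):
    """Percentile bootstrap interval for AUROC (an assumed method, see lead-in)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[k] for k in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            stats.append(auroc(ys, [scores[k] for k in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

def toy_cohort(n=400, seed=1):
    """Synthetic labels and scores; purely illustrative, not study data."""
    rng = random.Random(seed)
    labels = [1 if rng.random() < 0.3 else 0 for _ in range(n)]
    scores = [rng.gauss(0.6 if y else 0.0, 1.0) for y in labels]
    return labels, scores

labels, scores = toy_cohort()
point = auroc(labels, scores)
low, high = bootstrap_ci(labels, scores)
print(f"AUROC {point:.3f} (95% CI: {low:.3f}, {high:.3f})")
```

To compare two models this way, the same resampled indices would typically be applied to both score vectors so that the difference in AUROC is assessed on paired samples.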

Version published to 10.21203/rs.3.rs-7234976/v1 on Research Square
Aug 31, 2025

Related articles

Development and Deployment of a Machine Learning–Based Predictive Model for COVID-19 Infection Using Patient Demographic and Symptom Data in Nigeria

This article has 10 authors:
1. Olanrewaju Eniade
2. Ezekiel Ukwenga
3. Uchenna Akuka
4. Opeyemi Adeniyi
5. Elonna Obak
6. Omolola Adeagbo
7. Peter Babatunde Olaitan
8. Rita Ayanbolade Olowe
9. Tolulope Opakunle
10. Olugbenga Adekunle Olowe
This article has no evaluations. Latest version Jan 25, 2026

Multinational, Calibrated, Non-Laboratory Prevalent Disease Prediction and Survival Modeling for Diabetes, CKD, and CVD

This article has 2 authors:
1. Arthur Moreira Costa
2. Iris Badezet-Delory
This article has no evaluations. Latest version Dec 10, 2025

Personalized Disease Risk Prediction from Multimodal Health Data Using Large Language Models

This article has 2 authors:
1. Hanieh Arjmand
2. Alexandre Tomberg
This article has no evaluations. Latest version Jan 25, 2026
