Disentangling Confounders from Pathology in Long-COVID Trajectory Prediction for Women: An Interpretable Large-Language-Model Approach

Jing Wang
Zorina Galis
Tong Zhang
Yiming Luo
Amar Sra
Xing Niu
Jie Shen
Qiaomin Xie
Jeremy C. Weiss

Abstract

Objective

Post-acute sequelae of SARS-CoV-2 infection (PASC, “Long COVID”) disproportionately affects women, in whom hallmark symptoms—insomnia, fatigue, palpitations, cognitive difficulty—overlap with comorbidities and hormonal transitions such as menopause. This diagnostic overlap is a confounding problem: models that forecast future symptom severity risk attributing baseline physiological noise to viral pathology. We ask whether an interpretable, causally disentangled language model can separate true pathological signal from such confounders while remaining competitive with strong predictors of future PASC severity.

Materials and methods

We studied a retrospective cohort of 1,155 adult women (median age 61) from the NIH RECOVER program, combining static clinical profiles, longitudinal symptom surveys, and four weeks of mean-aggregated consumer-wearable physiology (heart rate, sleep, activity). We render each patient as a natural-language clinical narrative and fine-tune a small open-weight language model (Qwen2.5-0.5B, LoRA) with an attention-based disentanglement layer that gates the latent state into a causal and a confounder component, trained with an environment-mixing InfoNCE objective. We predict the PASC index at 3-, 6-, and 9-month horizons and benchmark against last-value carry-forward, Lasso, Ridge, gradient-boosted trees (XGBoost), a deep MLP, a tabular ResNet, and a self-attention network, over 20 stratified resamples with paired significance tests. We further stratify results by trajectory phenotype (Protected / Responder / Refractory).
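To make the gating and contrastive ideas above concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not specify the gate parameterization or how environments are mixed, so the sigmoid gate, the cosine similarity, the temperature value, and the function names `split_latent` and `info_nce` are all assumptions chosen for illustration. The sketch assumes the positive example is the causal component of the same patient seen under a different confounder environment, and the negatives are causal components of other patients.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def split_latent(h, gate_logits):
    """Gate a latent vector h into causal and confounder parts (assumed form).

    Each dimension gets a gate g in (0, 1); the causal part is g * h and the
    confounder part is (1 - g) * h, so the two parts sum back to h.
    """
    gates = [sigmoid(a) for a in gate_logits]
    causal = [g * x for g, x in zip(gates, h)]
    confounder = [(1.0 - g) * x for g, x in zip(gates, h)]
    return causal, confounder


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """Standard InfoNCE: negative log-softmax of the positive's similarity."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denominator - logits[0]


# Hypothetical toy vectors, not data from the study.
causal, confounder = split_latent([2.0, -1.0], gate_logits=[3.0, -3.0])
aligned = info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0], [-1.0, 0.0]])
misaligned = info_nce([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0], [-1.0, 0.0]])
```

In this toy setup the loss is small when the anchor and its positive point the same way and large when a negative is the closer match, which is the pressure that would push the causal component to stay stable across confounder environments.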

Results

Long-COVID severity is strongly autocorrelated, so last-value carry-forward is a hard reference and is the most accurate method on the full cohort (MAE 3.02 / 1.99 / 1.52 at 3/6/9 months). Among learned models the LLM regressor had the lowest MAE at every horizon (e.g. 3.11 vs. XGBoost 3.57 at 3 months; paired p ≤ 0.01). In the Responder phenotype (patients whose trajectories actually move), the LLM was the most accurate method overall at 3 and 6 months (MAE 4.72, 4.06), though its advantage over carry-forward was not statistically significant (p = 0.70, 0.29). The disentanglement layer assigned maximal saliency to direct pathology tokens (breathlessness, malaise; 1.00) while suppressing confounders (menopause, diabetes; < 0.27) and linguistic filler (< 0.17).
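The carry-forward reference and the paired comparison can be sketched in a few lines of standard-library Python. This is an illustrative sketch only: the abstract says "paired significance tests" without naming the test, so the paired t statistic below is an assumption, and all numbers are hypothetical toy values rather than results from the study.

```python
import math
from statistics import mean, stdev


def carry_forward(last_scores):
    """Predict each patient's future score as their most recent observed score."""
    return list(last_scores)


def mae(y_true, y_pred):
    """Mean absolute error between observed and predicted scores."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def paired_t(errors_a, errors_b):
    """Paired t statistic for two methods evaluated on the same resamples."""
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))


# Hypothetical PASC index values for four patients.
last_observed = [10.0, 4.0, 7.0, 12.0]
future_observed = [9.0, 4.0, 10.0, 12.0]
baseline_mae = mae(future_observed, carry_forward(last_observed))

# Hypothetical per-resample MAEs for two learned models on identical splits.
model_a = [3.1, 3.0, 3.2]
model_b = [3.5, 3.6, 3.4]
t_stat = paired_t(model_a, model_b)
```

Because both methods are scored on the same resamples, the test operates on per-resample differences, which removes split-to-split variance that would otherwise mask a consistent gap.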

Conclusion

For static, slowly evolving patients a simple carry-forward forecast is hard to beat and should be the reference any PASC model is judged against. The value of a learned, disentangled model is (i) better accuracy where trajectories are dynamic and (ii) an interpretable, “clinically honest” attribution that down-weights confounders such as menopause—reducing the risk of misattributing baseline physiology to Long COVID.

Author summary

Long COVID is more common and often more severe in women, but many of its symptoms look like other common conditions or like the normal changes of menopause. When a computer model tries to predict how a patient’s symptoms will evolve, it can be fooled into blaming Long COVID for what is really background physiology. We built a model based on a small language model that reads a written summary of each patient (their history, comorbidities, and a month of wearable-device data) and is explicitly trained to separate “true disease signal” from “background noise.” We tested it against standard predictors at 3, 6, and 9 months. We report a finding that is easy to overlook: because Long COVID severity changes slowly, simply assuming a patient’s next score equals their last score is very accurate and hard to beat for stable patients. Our model’s advantage appears where it matters clinically, in patients whose symptoms are actually changing. Importantly, the model also shows which words drove its prediction, correctly emphasizing symptoms like breathlessness while down-weighting confounders like menopause. We argue this interpretability, not a small accuracy gain, is the real contribution.

Version published to 10.64898/2026.06.10.26355420 on medRxiv
Jun 12, 2026
