Post-ED Trajectory Prediction in Abdominal Pain with a Generative Medical Event Model
Abstract
Importance
Abdominal pain accounts for roughly 10 million US emergency department (ED) visits annually, most of which end in discharge. Post-discharge courses vary, yet existing risk models predict only whether an ED revisit occurs, not what that revisit will entail.
Objective
To evaluate whether Curiosity, a generative medical event foundation model, can predict post-ED-discharge trajectories for adults with abdominal pain, differentiating the timing and severity of expected outcomes.
Design
Retrospective cohort study; encounters January 1–December 31, 2022; 30-day follow-up; analysis conducted in 2026.
Setting
Epic Cosmos research network (multicenter, population-based, de-identified electronic health record).
Participants
Adults (≥ 18 years) discharged from the ED with abdominal pain, excluding training-set patients. Random sample of 3,000 drawn from 150,030 eligible patients (65.3% female; median age 47 years [IQR 36–60]).
Exposure
ED discharge after evaluation for abdominal pain.
Main Outcomes and Measures
Primary: area under the receiver operating characteristic curve (AUROC) for Curiosity vs separately estimated, per-task XGBoost models, evaluated for ED revisit ending in admission (admit-revisit), ED revisit ending in discharge (DC-revisit), and any ED revisit at 72 hours, 7 days, and 30 days. Secondary: trajectory-level accuracy across 36 trajectory classes and edit distance, each compared with XGBoost; calibration of simulated vs observed conditional path probabilities across 45 transitions.
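The abstract does not say how edit distance between trajectories is defined. As a minimal sketch, assuming a trajectory is an ordered sequence of event labels and the metric is the standard Levenshtein distance over those labels, it could be computed as below. The event names (`ED_REVISIT`, `ADMIT`, and so on) are hypothetical placeholders, not the study's actual vocabulary.

```python
from typing import Sequence


def edit_distance(predicted: Sequence[str], observed: Sequence[str]) -> int:
    """Levenshtein distance between two event sequences.

    Counts the minimum number of single-event insertions, deletions,
    or substitutions needed to turn `predicted` into `observed`.
    """
    # prev[j] holds the distance between the first i-1 predicted events
    # and the first j observed events.
    prev = list(range(len(observed) + 1))
    for i, p in enumerate(predicted, start=1):
        curr = [i]
        for j, o in enumerate(observed, start=1):
            substitution = prev[j - 1] + (0 if p == o else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


# Hypothetical trajectories: the prediction gets the revisit right
# but the disposition wrong, so one substitution is needed.
predicted = ["ED_REVISIT", "ADMIT"]
observed = ["ED_REVISIT", "DISCHARGE"]
distance = edit_distance(predicted, observed)
```

Under this assumed definition, a lower distance means the predicted sequence of events sits closer to what actually happened, which gives partial credit that exact-match accuracy does not.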
Results
Curiosity identified patients at high risk of a revisit requiring admission more accurately than XGBoost and distinguished them from patients likely to revisit without admission. Among 3,000 patients, Curiosity’s 30-day admit-revisit AUROC was 0.83 (95% CI 0.79–0.87) vs 0.70 (95% CI 0.65–0.75) for XGBoost (DeLong P<.001). Its admit-revisit area under the precision-recall curve (AUC-PR) was 0.37 (95% CI 0.29–0.46) vs 0.13 (95% CI 0.09–0.19) for XGBoost, against a 4.1% cohort base rate. Curiosity identified the most likely trajectory out of 36 possibilities for 45.9% of patients (XGBoost 41.0%; McNemar P<.001), with median edit distance 1.28 vs 1.40 (Wilcoxon P<.001). Median absolute calibration error across 45 transitions was 1.30 percentage points (95% CI 0.32–2.49).
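The abstract reports the calibration result but not its exact formula. One plausible reading, sketched here as an assumption, takes the absolute gap between the simulated probability and the observed frequency for each transition, expresses it in percentage points, and reports the median over transitions. The function name and the toy probabilities are illustrative only and are not study data.

```python
from statistics import median
from typing import Sequence


def median_abs_calibration_error(
    simulated: Sequence[float], observed: Sequence[float]
) -> float:
    """Median absolute gap, in percentage points, between paired probabilities.

    Each position is one transition: simulated[k] is the model's simulated
    conditional probability and observed[k] is the empirical frequency.
    """
    if len(simulated) != len(observed):
        raise ValueError("simulated and observed must have the same length")
    if not simulated:
        raise ValueError("at least one transition is required")
    gaps = [abs(s - o) * 100.0 for s, o in zip(simulated, observed)]
    return median(gaps)


# Toy values for three hypothetical transitions (not from the study).
simulated = [0.10, 0.25, 0.040]
observed = [0.12, 0.20, 0.045]
error_pp = median_abs_calibration_error(simulated, observed)
```

With these toy inputs the per-transition gaps are about 2.0, 5.0, and 0.5 percentage points, so the median is about 2.0. A median is less sensitive than a mean to a few poorly calibrated rare transitions, which is one reason such a summary might be preferred.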
Conclusions and Relevance
A generative medical event foundation model produced calibrated trajectory-level predictions and discriminated admit-revisits more effectively than task-specific XGBoost baselines, separating patients who revisited and were admitted from those who revisited and were discharged.