Predicting Hospital Admissions Using Pretrained EHR Embeddings: External Evaluation and Insights on Local Vocabulary Adaptation
Abstract
Purpose
Unplanned hospital admissions impose substantial strain on healthcare systems, yet predictive models for these events remain underexplored in practice. This study evaluates whether publicly available pretrained transformer-based embeddings, developed on an external health system, can improve prediction of hospital admissions—including unplanned cases—when applied to a different institution with sparser data and a distinct medical vocabulary.
Methods
We performed a retrospective cohort study using structured EHR data from 200,000 adult patients (2007–2023) at a Portuguese hospital, standardized to the OMOP Common Data Model. Four 30-day outcomes were predicted: emergency department visits, hospital admissions, unplanned admissions, and readmissions. Three modeling approaches were compared: (1) clinically curated handcrafted features, (2) frequency-based representations of all recorded OMOP concepts, and (3) pretrained CLMBR-T embeddings generated from longitudinal OMOP data of 2.57 million patients in a U.S. hospital system. Performance was assessed on held-out patients using AUROC, AUPRC, and calibration metrics, with additional analysis of the impact of vocabulary overlap between pretraining and local datasets.
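The discrimination and calibration metrics named above (AUROC, AUPRC, and a calibration score) can be sketched with standard scikit-learn functions; the synthetic labels and scores below are illustrative stand-ins, not data from the study:

```python
import numpy as np
from sklearn.metrics import average_precision_score, brier_score_loss, roc_auc_score

rng = np.random.default_rng(0)

# Illustrative held-out labels for a rare 30-day outcome (~2% prevalence).
y_true = rng.binomial(1, 0.02, size=5000)

# Hypothetical predicted probabilities for two of the compared approaches:
# weaker signal stands in for the count-based model, stronger for embeddings.
p_counts = np.clip(0.02 + 0.05 * y_true + rng.normal(0, 0.05, 5000), 0, 1)
p_embed = np.clip(0.02 + 0.15 * y_true + rng.normal(0, 0.05, 5000), 0, 1)

for name, p in [("counts", p_counts), ("embeddings", p_embed)]:
    print(
        f"{name}: AUROC={roc_auc_score(y_true, p):.3f} "
        f"AUPRC={average_precision_score(y_true, p):.3f} "
        f"Brier={brier_score_loss(y_true, p):.4f}"
    )
```

AUPRC is the more informative of the two discrimination metrics at low prevalence, which is why the study reports it alongside AUROC for rare outcomes such as unplanned admissions.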
Results
Pretrained embeddings achieved the highest discrimination for all outcomes, particularly for unplanned admissions (AUROC 0.877 vs. 0.770 for the count-based model). Gains were greatest for rarer outcomes and for patients with richer clinical histories. Despite only 58% overlap with the local vocabulary and substantially fewer events per patient than in pretraining, the embeddings transferred effectively, indicating that they capture generalizable temporal patterns. Calibration, however, was poorer than that of the simpler models, necessitating post-hoc recalibration before deployment.
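The post-hoc recalibration step referred to above could, for example, be implemented with isotonic regression fitted on a held-out calibration split; this is one common choice, not necessarily the method used in the study, and the data below is synthetic:

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression
from sklearn.metrics import brier_score_loss

rng = np.random.default_rng(1)

# Synthetic example: scores that discriminate well but are systematically
# inflated (miscalibrated), as can happen with transferred embeddings.
y = rng.binomial(1, 0.1, size=4000)
raw = np.clip(0.3 + 0.4 * y + rng.normal(0, 0.1, 4000), 0, 1)

# Fit the monotone recalibration map on one half, evaluate on the other.
calib, test = slice(0, 2000), slice(2000, 4000)
iso = IsotonicRegression(out_of_bounds="clip")
iso.fit(raw[calib], y[calib])
recal = iso.predict(raw[test])

print("Brier before:", round(brier_score_loss(y[test], raw[test]), 4))
print("Brier after: ", round(brier_score_loss(y[test], recal), 4))
```

Because isotonic regression is monotone, it leaves the ranking of patients (and hence AUROC) essentially unchanged while correcting the absolute risk estimates.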
Conclusion
Pretrained OMOP-based EHR embeddings can substantially improve prediction of hospital and unplanned admissions in data- and resource-limited settings, even with partial vocabulary overlap. These findings support their use for rapid, cost-effective deployment of clinically meaningful predictive models, provided local recalibration and workflow integration are addressed.
Highlights
- Superior cross-institution performance – Pretrained EHR embeddings from Stanford Medicine achieved AUROC 0.877 for unplanned admissions, 0.814 for hospital admissions, 0.782 for ED visits, and 0.923 for readmissions in a Portuguese hospital, outperforming count-based models (0.770, 0.767, 0.744, 0.922) and handcrafted-feature models across all tasks.
- Largest gains for rare, unpredictable events – For unplanned admissions (0.4% prevalence), embeddings more than tripled AUPRC relative to counts (0.037 vs. 0.011) and improved AUROC by 0.107, with performance continuing to scale with additional training data, unlike the baselines.
- Effective under substantial domain shift – Strong transferability was observed despite only 58% vocabulary overlap and markedly different patient populations, coding distributions, and event densities (707 vs. 71 events per patient).
- Benefit increases with richer patient histories – The performance advantage of embeddings widened in patients with higher code volumes; AUROC for hospital admissions rose from 0.694 in the lowest quartile (Q1) to 0.870 in the highest (Q4), exceeding counts by up to 0.105 in Q4.
- Actionable guidance for adoption – Hospitals with sparse data can achieve rapid, cost-effective deployment of predictive models using external embeddings, especially for rare outcomes, provided they are paired with local fine-tuning and post-hoc recalibration to ensure accurate risk estimates before clinical use.
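The 58% vocabulary overlap cited above refers to the fraction of locally recorded OMOP concepts that also appear in the pretraining vocabulary. A minimal sketch of how such an overlap could be computed, using made-up concept IDs rather than the study's actual vocabularies:

```python
# Hypothetical OMOP concept_id sets; real vocabularies contain tens of
# thousands of condition, drug, procedure, and measurement concepts.
pretraining_vocab = {201826, 320128, 4329847, 1503297, 313217, 437663}
local_concepts = {201826, 320128, 4329847, 9999901, 9999902}

shared = local_concepts & pretraining_vocab
overlap = len(shared) / len(local_concepts)  # fraction of local codes covered

print(f"{len(shared)}/{len(local_concepts)} local concepts covered "
      f"({overlap:.0%})")  # 3/5 local concepts covered (60%)
```

Concepts outside the pretraining vocabulary cannot contribute learned representations, which is why partial overlap is a key consideration when transferring embeddings across institutions.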