Generalist Foundation Models Are Not Clinical Enough for Hospital Operations

Abstract

Operational decisions governing patient flow, cost, and quality of care demand specialized predictive models, yet most clinical NLP efforts focus on medical knowledge benchmarks. We introduce Lang1, a family of language models (100M-7B parameters) pretrained on 80 billion clinical tokens from NYU Langone Health electronic health records (EHRs) blended with 627 billion internet tokens. We evaluate Lang1 on the REalistic Medical Evaluation (ReMedE), an evaluation suite derived from 668,331 EHR notes spanning five tasks: readmission prediction, mortality prediction, length-of-stay prediction, comorbidity coding, and insurance denial prediction. In zero-shot settings, both general-purpose and biomedical models underperform on four of the five tasks. After finetuning, Lang1-1B outperforms finetuned generalist models up to 70x larger and zero-shot models up to 671x larger. Joint multi-task finetuning yields cross-task transfer, and Lang1-1B transfers effectively to unseen tasks and to an external health system. These results demonstrate that effective healthcare AI requires in-domain pretraining, supervised finetuning, and evaluation beyond proxy benchmarks.