Retrieval-aligned Tabular Foundation Models Enable Robust Clinical Risk Prediction in Electronic Health Records Under Real-world Constraints

Minh-Khoi Pham
Thang-Long Nguyen Ho
Thao Thi Phuong Dao
Tai Tan Mai
Minh-Triet Tran
Marie Elizabeth Ward
Una Geary
Rob Brennan
Nick McDonald
Martin Crane
Marija Bezbradica


Abstract

Accurate clinical outcome prediction from structured electronic health records (EHRs) remains challenging due to high-dimensional heterogeneous features, severe outcome imbalance, and distribution shift, constraints that are far more pronounced than in general tabular benchmarks. Although recently developed tabular foundation models, particularly in-context and retrieval-augmented approaches, achieve strong results on generic datasets, their behavior under realistic clinical constraints remains unclear. In this work, we present the first systematic multi-cohort EHR benchmark comparing classical machine learning, deep tabular models, and tabular in-context learning (TICL) methods, including prior-fitted network (PFN)–based approaches and their retrieval-augmented variants (RA-TICL), across small clinical datasets and large real-world ICU cohorts. Through controlled stress tests varying data scale, feature dimensionality, outcome rarity, and cross-cohort generalization, we identify regime-dependent performance patterns. PFN-based TICL models are notably sample-efficient without retraining in low-data regimes, but naïve distance-based retrieval leads to degradation as feature heterogeneity and class imbalance intensify. To address this limitation, we propose Attention Weighting for Aligned Retrieval Embeddings (AWARE), a task-aligned retrieval framework that improves neighborhood quality through supervised embedding retrieval learning and lightweight adapter fine-tuning. Across stress tests, AWARE yields relative AUPRC improvements that increase with data complexity, from approximately 0.3–4.0% as training size scales, to 7.0% under high feature dimensionality, and up to 12.2% under extreme outcome imbalance, demonstrating that retrieval alignment becomes increasingly critical in complex EHR settings. Collectively, our findings establish retrieval quality and retrieval–inference coupling as central bottlenecks for deploying tabular in-context learning in clinical prediction.
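The abstract's point about naïve distance-based retrieval can be illustrated with a toy sketch. This is not the authors' code and not AWARE: the data values, the function name `retrieve_context`, and the hand-set `feature_weights` are all hypothetical, chosen only to show how an uninformative feature on a large scale can dominate a Euclidean neighborhood, and how reweighting features changes which rows are retrieved as context.

```python
import numpy as np

def retrieve_context(X_train, x_query, k, feature_weights=None):
    """Return indices of the k training rows nearest to x_query.

    Plain Euclidean distance by default; optional per-feature weights
    rescale each coordinate difference before the norm is taken.
    """
    diff = X_train - x_query
    if feature_weights is not None:
        diff = diff * feature_weights
    dists = np.linalg.norm(diff, axis=1)
    return np.argsort(dists, kind="stable")[:k]

# Hypothetical toy data: column 0 is assumed to carry the signal,
# column 1 is assumed uninformative but lives on a much larger scale.
X_train = np.array([
    [0.0, 100.0],  # row 0: matches the query on column 0
    [0.1, 500.0],  # row 1: matches the query on column 0
    [5.0, 901.0],  # row 2: matches the query only on column 1
    [5.2, 899.0],  # row 3: matches the query only on column 1
])
x_query = np.array([0.0, 900.0])

# Unweighted retrieval is driven almost entirely by column 1.
neighbors = retrieve_context(X_train, x_query, k=2)

# Hand-set weights stand in for a learned, task-aligned notion of
# similarity (an assumption for illustration, not the paper's method).
aligned = retrieve_context(
    X_train, x_query, k=2, feature_weights=np.array([1.0, 0.001])
)

print(neighbors.tolist())  # [2, 3]
print(aligned.tolist())    # [1, 0]
```

In this toy case the unweighted neighbors are the two rows that differ most on the informative column, while down-weighting the large-scale column retrieves the rows that match on it. A learned retrieval embedding would aim for a similar effect without hand-chosen weights.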

Version published to 10.21203/rs.3.rs-9085469/v1 on Research Square
Mar 19, 2026

Related articles

Systematic benchmarking of foundation models and classical baselines for microbiome-based disease prediction

This article has 3 authors:
1. Jin Mu
2. Zheng-Zheng Tang
3. Guanhua Chen
This article has no evaluations. Latest version: Feb 25, 2026
HEART: Hierarchical ensemble model using augmented representations and tabular learning for coronary artery disease prediction

This article has 3 authors:
1. Dimitrios Papakyriakopoulos
2. Pantelis Z. Lappas
3. Manolis N. Kritikos
This article has no evaluations. Latest version: Feb 24, 2026
Reframing AI for Rare Disease Recognition

This article has 19 authors:
1. Wei-Qi Wei
2. Chao Yan
3. Wu-Chen Su
4. Yi Xin
5. Monika Grabowska
6. Vern Kerchberger
7. Victor Borza
8. Jinlian Wang
9. Liwei Wang
10. Rui Li
11. Jacob Lynn
12. Alyson Dickson
13. Cathy Shyr
14. QiPing Feng
15. C Stein
16. Kai Wang
17. Peter Embí
18. Bradley Malin
19. Hongfang Liu
This article has no evaluations. Latest version: Apr 2, 2026
