Retrieval-aligned Tabular Foundation Models Enable Robust Clinical Risk Prediction in Electronic Health Records Under Real-world Constraints


Abstract

Accurate clinical outcome prediction from structured electronic health records (EHRs) remains challenging due to high-dimensional heterogeneous features, severe outcome imbalance, and distribution shift, constraints that are far more pronounced than in general tabular benchmarks. Although recently developed tabular foundation models, particularly in-context and retrieval-augmented approaches, achieve strong results on generic datasets, their behavior under realistic clinical constraints remains unclear. In this work, we present the first systematic multi-cohort EHR benchmark comparing classical machine learning, deep tabular models, and tabular in-context learning (TICL) methods, including prior-fitted network (PFN)–based approaches and their retrieval-augmented variants (RA-TICL), across small clinical datasets and large real-world ICU cohorts. Through controlled stress tests varying data scale, feature dimensionality, outcome rarity, and cross-cohort generalization, we identify regime-dependent performance patterns. PFN-based TICL models are notably sample-efficient in low-data regimes and require no retraining, but naïve distance-based retrieval degrades performance as feature heterogeneity and class imbalance intensify. To address this limitation, we propose Attention Weighting for Aligned Retrieval Embeddings (AWARE), a task-aligned retrieval framework that improves neighborhood quality through supervised learning of retrieval embeddings and lightweight adapter fine-tuning. Across stress tests, AWARE yields relative AUPRC improvements that increase with data complexity, from approximately 0.3–4.0% as training size scales, to 7.0% under high feature dimensionality, and up to 12.2% under extreme outcome imbalance, demonstrating that retrieval alignment becomes increasingly critical in complex EHR settings.
Collectively, our findings establish retrieval quality and retrieval–inference coupling as central bottlenecks for deploying tabular in-context learning in clinical prediction.
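
To make the retrieval bottleneck concrete, the following is a minimal sketch of naïve distance-based context retrieval on a toy table. It is not the AWARE method: the function name `retrieve_context`, the toy data, and the hand-set feature weights (a crude stand-in for a learned, task-aligned embedding) are all assumptions introduced here for illustration.

```python
import numpy as np

def retrieve_context(X_train, y_train, x_query, k=3):
    """Return the k training rows nearest to x_query (Euclidean), their labels, and indices."""
    dists = np.linalg.norm(X_train - x_query, axis=1)
    idx = np.argsort(dists, kind="stable")[:k]
    return X_train[idx], y_train[idx], idx

# Toy cohort (hypothetical): column 0 is informative for the outcome,
# column 1 is an irrelevant feature recorded on a much larger scale.
X_train = np.array([
    [0.9, 500.0],
    [1.0, 100.0],
    [1.1, 300.0],
    [0.0, 101.0],
    [0.1, 99.0],
])
y_train = np.array([1, 1, 1, 0, 0])
x_query = np.array([1.0, 102.0])  # informative feature matches the positive class

# Naive retrieval: raw Euclidean distance is dominated by the large-scale column.
_, labels_raw, idx_raw = retrieve_context(X_train, y_train, x_query, k=3)

# Stand-in for task-aligned retrieval: zero out the irrelevant feature.
# In a learned approach these weights would come from supervised training,
# not from manual inspection as assumed here.
weights = np.array([1.0, 0.0])
_, labels_aligned, idx_aligned = retrieve_context(
    X_train * weights, y_train, x_query * weights, k=3
)

print("raw neighbours:", idx_raw.tolist(), "labels:", labels_raw.tolist())
print("aligned neighbours:", idx_aligned.tolist(), "labels:", labels_aligned.tolist())
```

In this toy case the raw neighbourhood mixes both classes because the irrelevant column drives the distance, while the reweighted neighbourhood contains only positive-class rows. The in-context model would condition on whichever neighbourhood is retrieved, which is why neighbourhood quality matters downstream.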
