Sparse Foundation Models for Continuous-Time EHRs


Abstract

Electronic health records (EHRs) are characterized by extreme sparsity, irregular sampling, and event-driven observation processes, yet most existing foundation models treat them as dense sequences through discretization and tokenization. This mismatch forces models to infer temporal dynamics implicitly through depth and attention, conflating missingness with semantic content and limiting robustness under real-world sparsity. We propose a Sparse Foundation Model (SFM) that represents patient trajectories as continuous-time latent states governed by neural ordinary differential equations, with event-conditioned residual updates applied only at observed times. This formulation unifies continuous-time dynamics with ResNet-style depth, yielding a sparsity-adaptive architecture whose effective depth is determined by the data rather than a fixed temporal grid. We pretrain SFM using self-supervised objectives defined over irregular event streams, enabling the model to reason explicitly about missing observations and variable temporal gaps. Across MEDS-DEV and FoMoH benchmarks, SFM consistently outperforms state-of-the-art EHR foundation models, including CEHR-BERT, CEHR-GPT, CLMBR, and MOTOR, with particularly large gains under extreme sparsity, limited labeled supervision, and unseen prediction horizons. Beyond accuracy, SFM exhibits improved calibration, temporal stability, and robustness to observation dropout. These results suggest that aligning foundation model architecture with the sparse, continuous-time structure of clinical data is critical for scalable and reliable healthcare representation learning.
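The core mechanism described above, a continuous-time latent state evolved through unobserved gaps by a neural ODE, with ResNet-style residual updates applied only at observed event times, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the linear drift and update maps (`W_f`, `W_u`), the fixed Euler integrator, and the latent dimension are all hypothetical stand-ins for the learned networks and ODE solver a real SFM would use.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 4  # hypothetical latent dimension

# Stand-ins for learned neural networks (hypothetical parameters).
W_f = rng.standard_normal((D, D)) * 0.1  # drift map
W_u = rng.standard_normal((D, D)) * 0.1  # event-update map

def drift(h):
    """Continuous-time dynamics dh/dt = f(h); a toy nonlinearity here."""
    return np.tanh(W_f @ h)

def evolve(h, dt, n_steps=20):
    """Euler integration of the latent ODE across an unobserved gap of length dt."""
    step = dt / n_steps
    for _ in range(n_steps):
        h = h + step * drift(h)
    return h

def event_update(h, x):
    """ResNet-style residual update, applied only when an observation x arrives."""
    return h + np.tanh(W_u @ x)

def encode_trajectory(events):
    """events: list of (time, observation) pairs with increasing times.

    Effective depth grows with the number of observed events, not with a
    fixed temporal grid -- the sparsity-adaptive property the abstract names.
    """
    h, t = np.zeros(D), 0.0
    for t_i, x_i in events:
        h = evolve(h, t_i - t)    # flow through the gap since the last event
        h = event_update(h, x_i)  # residual update at the observed time
        t = t_i
    return h

# Three irregularly spaced observations: a sparse patient trajectory.
events = [(0.5, rng.standard_normal(D)),
          (2.0, rng.standard_normal(D)),
          (2.1, rng.standard_normal(D))]
h_final = encode_trajectory(events)
print(h_final.shape)
```

Note how variable gaps (1.5 vs. 0.1 time units) change only the integration interval, not the model's structure, so missingness is never conflated with content.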