Markovian Representation Learning for Longitudinal Electronic Health Records


Abstract

Deep learning models for electronic health records are typically trained with token-level reconstruction or task-specific supervision, encouraging contextual predictability rather than explicit modeling of temporal dynamics. We propose a Markovian representation learning framework that treats longitudinal EHR data as realizations of a latent stochastic process and trains embeddings to admit low-complexity transition operators in latent space. By optimizing transition likelihood alongside information-preserving objectives, the learned representations approximate sufficient statistics of the underlying clinical state, yielding compact dynamical modes with interpretable time scales. Across multiple prediction tasks and external-hospital transfer settings, Markov-structured pretraining improves discrimination, temporal forecasting performance, and robustness relative to supervised and masked-event baselines. We further demonstrate that the learned transition structure supports LLM-assisted summarization and agentic cohort exploration, enabling clinician-friendly interpretation of latent progression patterns. These results suggest that explicitly modeling temporal transition structure provides a principled complement to token-centric foundation modeling for longitudinal medical data.
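The core training idea in the abstract, an encoder whose latent codes admit a low-complexity (here, linear) transition operator, fit jointly with an information-preserving reconstruction term, can be sketched on synthetic data. Everything below (the dimensions, the linear encoder/decoder, the plain gradient-descent loop, and the toy data generator) is an illustrative assumption, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Toy stand-in for longitudinal EHR data (illustrative assumption) ------
# One long sequence of d_obs-dimensional "visit" feature vectors driven by a
# stable latent linear process.
T, d_obs, d_lat = 400, 8, 3
A_true = 0.9 * np.linalg.qr(rng.normal(size=(d_lat, d_lat)))[0]  # stable dynamics
C_true = rng.normal(size=(d_obs, d_lat))
z = np.zeros((T, d_lat))
for t in range(1, T):
    z[t] = A_true @ z[t - 1] + 0.1 * rng.normal(size=d_lat)
x = z @ C_true.T + 0.05 * rng.normal(size=(T, d_obs))

# --- Learnable pieces: encoder E, decoder D, latent transition A -----------
E = 0.1 * rng.normal(size=(d_lat, d_obs))
D = 0.1 * rng.normal(size=(d_obs, d_lat))
A = np.eye(d_lat)

def total_loss(E, D, A):
    h = x @ E.T                                       # latent codes h_t = E x_t
    l_trans = np.mean((h[:-1] @ A.T - h[1:]) ** 2)    # one-step transition fit
    l_rec = np.mean((h @ D.T - x) ** 2)               # information preservation
    return l_trans + l_rec

loss_init = total_loss(E, D, A)

lr = 0.01
for _ in range(3000):
    h = x @ E.T
    r_t = h[:-1] @ A.T - h[1:]        # transition residuals, shape (T-1, d_lat)
    r_r = h @ D.T - x                 # reconstruction residuals, shape (T, d_obs)
    # Hand-derived gradients of the two mean-squared terms.
    gA = 2 * r_t.T @ h[:-1] / (T - 1)
    gE = 2 * (A.T @ (r_t.T @ x[:-1]) - r_t.T @ x[1:]) / (T - 1) \
         + 2 * D.T @ (r_r.T @ x) / T
    gD = 2 * r_r.T @ h / T
    A -= lr * gA
    E -= lr * gE
    D -= lr * gD

loss_final = total_loss(E, D, A)

# Eigenvalues of the learned A are the "dynamical modes" mentioned in the
# abstract; a mode with |lambda| < 1 decays, with a characteristic time scale
# of roughly -1 / log|lambda| steps.
modes = np.linalg.eigvals(A)
```

In this reading, "compact dynamical modes with interpretable time scales" corresponds to the spectrum of the learned transition operator: each eigenvalue's modulus sets how quickly its mode decays, giving a clinically inspectable notion of progression speed.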