Large language models improve transferability of electronic health record-based predictions across countries and coding systems

Abstract

Variation in medical practices and reporting standards across healthcare systems limits the transferability of prediction models based on structured electronic health record (EHR) data. We introduce GRASP, a novel transformer-based architecture that enhances the generalizability of EHR-based prediction by embedding medical codes into a unified semantic space using a large language model. We applied GRASP to predict the onset of 21 diseases and all-cause mortality in over one million individuals from UK Biobank (UK), FinnGen (Finland) and Mount Sinai (USA), all harmonized to the OMOP common data model. Trained on UK Biobank and evaluated in FinnGen and Mount Sinai, GRASP achieved an average ΔC-index that was 83% and 35% higher, respectively, than that of language-unaware models. GRASP also showed significantly higher correlations with polygenic risk scores for 62% of diseases. Notably, GRASP maintained robust performance even when datasets were not harmonized to the same data model, accurately predicting disease risk from ICD-10-CM codes without direct mappings to OMOP. GRASP enables accurate and transferable disease predictions across heterogeneous healthcare systems with minimal resource requirements.
