GERBEHRT: A BERT-based Model Tailored for German Electronic Health Records – Potential in Chronic Kidney Disease Prediction

Abstract

Routinely collected electronic health records (EHRs) contain rich longitudinal information that enables the prediction of patient outcomes at scale. We developed GERBEHRT, a transformer model adapted from BEHRT and specifically tailored to German EHRs. GERBEHRT was pretrained on outpatient claims from more than 9 million statutorily insured patients and fine-tuned on nearly 1 million additional patients to predict chronic kidney disease (CKD), a serious condition whose progression can be delayed by early detection. GERBEHRT incorporates EHR features not previously explored in BERT-based approaches and introduces an efficient method to represent multiple attributes per medical concept, such as diagnoses and medications. In a test cohort of 3.7 million patients with 1.5% CKD positives, GERBEHRT achieved an area under the receiver operating characteristic curve (AUROC) of 87.9 and an average precision (AVPR) of 11.4 for a three-year prediction of incident moderate-to-severe CKD, outperforming risk-factor-based models (AUROC/AVPR: 83.6/6.4) and more traditional algorithms using the full EHR (AUROC/AVPR: 86.9/10.1). Although CKD risk prediction remains challenging, GERBEHRT’s superior performance underscores the importance of comprehensive EHR utilization and highlights the potential of tailored deep learning models for personalized CKD risk prediction and targeted patient screening.
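
To help read the two reported metrics, the following is a minimal sketch of how AUROC and average precision can be computed from labels and risk scores, and of how an AVPR value compares with the cohort prevalence. It is an illustration only: the abstract does not state how the authors computed their metrics, and the function names (`auroc`, `average_precision`, `lift_over_prevalence`) and the toy data are assumptions introduced here. The comparison relies on the general fact that a scorer with no predictive signal has an expected average precision roughly equal to the prevalence, which is 1.5% in the test cohort.

```python
def auroc(labels, scores):
    """AUROC via the rank-sum (Mann-Whitney U) formulation; ties get average ranks."""
    pairs = sorted(zip(scores, labels))
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum = 0.0
    i = 0
    while i < len(pairs):
        j = i
        # Group all entries that share the same score.
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1..j
        rank_sum += avg_rank * sum(label for _, label in pairs[i:j])
        i = j
    return (rank_sum - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


def average_precision(labels, scores):
    """Mean of precision-at-k over the ranks k where a positive is found.

    Simplification: tied scores are kept in input order rather than grouped.
    """
    order = sorted(range(len(labels)), key=lambda k: -scores[k])
    n_pos = sum(labels)
    hits = 0
    total = 0.0
    for rank, k in enumerate(order, start=1):
        if labels[k]:
            hits += 1
            total += hits / rank
    return total / n_pos


def lift_over_prevalence(avpr_percent, prevalence_percent):
    """Ratio of an average precision to the no-signal baseline (the prevalence)."""
    return avpr_percent / prevalence_percent


# Toy data (hypothetical, not from the study): 1 = CKD positive, 0 = negative.
toy_labels = [1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.3, 0.1]
print(round(auroc(toy_labels, toy_scores), 4))
print(round(average_precision(toy_labels, toy_scores), 4))

# AVPR values from the abstract, compared with the 1.5% prevalence.
for name, avpr in [("GERBEHRT", 11.4), ("full-EHR baseline", 10.1), ("risk-factor model", 6.4)]:
    print(name, round(lift_over_prevalence(avpr, 1.5), 2))
```

Under this reading, an AVPR of 11.4 at 1.5% prevalence is about 7.6 times the no-signal baseline, which shows why average precision is informative alongside AUROC when positives are rare.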
