Pretrained Patient Trajectories for Adverse Drug Event Prediction Using Common Data Model-based Electronic Health Records
Abstract
Background
Pretraining electronic health record (EHR) data with language models, by treating patient trajectories as natural language sentences, has enhanced performance across various medical tasks. However, EHR pretraining models have not yet been applied to adverse drug event (ADE) prediction.
Methods
A retrospective study was conducted on Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM)-based EHR data from two separate tertiary hospitals. The data included patient information across domains such as diagnosis, prescription, measurement, and procedure. For pretraining, codes were randomly masked, and the model was trained to infer each masked token from its preceding and following history. In this process, we introduced domain embedding (DE) to provide information about the domain of the masked token, preventing the model from predicting codes from irrelevant domains. For qualitative analysis, we identified important features using the attention matrix from each fine-tuned model.
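The masked pretraining with domain embedding described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the toy vocabulary, domain names, and the additive combination of code and domain embeddings are all assumptions chosen to show the idea that a masked position still receives the embedding of its true domain.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy vocabulary: each medical code belongs to one CDM domain.
# Code names and domains here are illustrative, not real OMOP concepts.
CODE_DOMAIN = {
    "dx_hypertension": "diagnosis",
    "rx_vancomycin": "drug",
    "meas_creatinine": "measurement",
    "proc_dialysis": "procedure",
}
CODES = list(CODE_DOMAIN) + ["[MASK]"]
DOMAINS = sorted(set(CODE_DOMAIN.values()))

DIM = 8  # embedding dimension (toy size)
code_emb = {c: rng.normal(size=DIM) for c in CODES}
domain_emb = {d: rng.normal(size=DIM) for d in DOMAINS}

def embed_trajectory(codes, mask_idx):
    """Mask one code in the trajectory and build input embeddings.

    With domain embedding (DE), the masked position is still given the
    embedding of its *original* code's domain, hinting to the model which
    domain the missing token should come from.
    """
    vecs = []
    for i, code in enumerate(codes):
        token = "[MASK]" if i == mask_idx else code
        # DE: the true domain is added even when the code itself is masked.
        vecs.append(code_emb[token] + domain_emb[CODE_DOMAIN[code]])
    return np.stack(vecs)

trajectory = ["dx_hypertension", "rx_vancomycin",
              "meas_creatinine", "proc_dialysis"]
X = embed_trajectory(trajectory, mask_idx=1)
print(X.shape)  # (4, 8): one embedding per code in the trajectory
```

A transformer encoder would then consume `X` and be trained to recover the masked drug code at position 1; because the drug-domain embedding is present at that position, the model can restrict its prediction to drug codes rather than, say, diagnoses.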
Results
The internal and external datasets comprised 510,879 and 419,505 adult inpatients, respectively, from two separate tertiary hospitals. The EHR pretraining model with DE outperformed all other baselines in all cohorts. In the feature importance analysis, we demonstrated that the results were consistent with previously reported clinical knowledge. In addition to cohort-level interpretation, patient-level interpretation was also available.
Conclusions
The CDM-based EHR pretraining model with DE is well suited to various ADE prediction tasks. The results of the feature importance-based qualitative analysis were consistent with background clinical knowledge.
Plain language summary
Patient history is like natural language: each medical code corresponds to a word, and a sequence of medical codes corresponds to a sentence. Language models learn from sentences by inferring masked words from the remaining unmasked words, and language model-based EHR pretraining models comprehend medical context in the same way. Because several studies using EHR pretraining models have achieved great success on various tasks, we applied EHR pretraining models to adverse drug event (ADE) prediction. To improve the model's inference, we introduced domain embedding (DE), which provides a hint about the domain of each masked code. The model pretrained with DE performed best on various ADE tasks, and background clinical knowledge was well reflected in our feature importance-based qualitative analysis.