Advancing event log preparation with quality optimization for hospital process mining

Ruihua Guo
Angus Richie
Yang Lu
S. T. Boris Choy
Ross Smith
Haeri Min
Qifan Chen
Simon K. Poon

Abstract

Background: Time-dependent clinical events collected from electronic health records (EHR), known as event logs, provide rich information, yet a systematic approach to addressing their quality problems is lacking. A multi-layer approach is proposed to enhance the quality of hospital event logs used to assess unplanned readmission risk in patients with heart failure (HF).

Method: Eligible patients were identified from DREAM, a multi-site hospital dataset encompassing routinely collected EHR data within a large metropolitan health system in Australia. At the source level, the Weiskopf and Weng framework was adopted to evaluate quality across five dimensions (currency, correctness, completeness, concordance, and plausibility), in alignment with the study objective and at multiple analytical levels. Results were benchmarked against the publicly available Medical Information Mart for Intensive Care IV (MIMIC-IV) hospital database. At the log level, a biodiversity framework was employed to assess quality, and the results were compared with those for MIMIC-IV logs.

Results: DREAM provided a timely, area-specific source of information with superior currency and source completeness compared to the benchmark database. Correctness and plausibility were comparable for both sources. Both datasets showed higher coverage than log completeness, with MIMIC-IV logs demonstrating greater complexity, reflected by higher diversity across all subpopulation groups.

Conclusion: This multi-layer approach aligned closely with the study objective, enabling domain-specific contextual awareness and mitigating bias at multiple levels. By incorporating an additional layer of event log quality evaluation based on biodiversity theory, the approach enhanced external validity and internal fairness, improving log comparability across data sources and within subpopulation groups.
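To make the log-level terms concrete, the sketch below computes two ecology-style measures over the trace variants of a toy event log. The abstract does not say which estimators the authors used, so the choices here are assumptions for illustration only: diversity is taken as the Hill number of order 1 (the exponential of Shannon entropy), and coverage as the Good-Turing sample coverage estimate. The function names and the toy data are hypothetical.

```python
from collections import Counter
from math import exp, log


def trace_variants(events):
    """Group (case_id, timestamp, activity) rows into one activity sequence per case
    and count how many cases follow each distinct sequence (variant)."""
    by_case = {}
    for case_id, _timestamp, activity in sorted(events, key=lambda e: (e[0], e[1])):
        by_case.setdefault(case_id, []).append(activity)
    return Counter(tuple(seq) for seq in by_case.values())


def hill_shannon(variants):
    """Effective number of variants: exp(Shannon entropy), the Hill number of order 1.
    ASSUMPTION: one possible diversity measure, not necessarily the authors'."""
    n = sum(variants.values())
    return exp(-sum((c / n) * log(c / n) for c in variants.values()))


def good_turing_coverage(variants):
    """Good-Turing sample coverage, 1 - f1/n, where f1 is the number of variants
    seen exactly once and n is the number of cases.
    ASSUMPTION: one possible coverage estimate, not necessarily the authors'."""
    n = sum(variants.values())
    singletons = sum(1 for c in variants.values() if c == 1)
    return 1 - singletons / n


# Toy log (fabricated for illustration): four admissions, one with an ICU stay.
events = [
    ("p1", 1, "admit"), ("p1", 2, "discharge"),
    ("p2", 1, "admit"), ("p2", 2, "discharge"),
    ("p3", 1, "admit"), ("p3", 2, "icu"), ("p3", 3, "discharge"),
    ("p4", 1, "admit"), ("p4", 2, "discharge"),
]
variants = trace_variants(events)
print("effective variants:", round(hill_shannon(variants), 3))
print("sample coverage:", good_turing_coverage(variants))
```

In this reading, a log with more distinct care pathways spread more evenly across patients scores higher on diversity, while coverage falls as a larger share of cases follow a pathway seen only once.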

Version published to 10.21203/rs.3.rs-7796031/v1 on Research Square
Nov 20, 2025

Related articles

Machine learning models for predicting severe clinical events in hospitalized patients with coronary artery disease

This article has 16 authors:
1. Hao Liu
2. Meijun Liu
3. Xinmiao Guan
4. Feng Cao
5. Changhao Liang
6. Zhongwen Qi
7. Jiaqi Hui
8. Junnan Zhao
9. Jingli Xing
10. Jianguo Zhou
11. Dong Zhang
12. Lei Liu
13. Xiaoliang Hao
14. Minjing Luo
15. Fengqin Xu
16. Yutong Fei
This article has no evaluations
Latest version Jan 12, 2026

A novel pipeline for realistic synthetic longitudinal EHR data generation

This article has 3 authors:
1. Gabrielle Josling
2. Ibrahima Diouf
3. Sankalp Khanna
This article has no evaluations
Latest version Jan 29, 2026

Personalized Disease Risk Prediction from Multimodal Health Data Using Large Language Models

This article has 2 authors:
1. Hanieh Arjmand
2. Alexandre Tomberg
This article has no evaluations
Latest version Jan 25, 2026
