Personalized Disease Prediction Framework based on Genomic Variants and Disease Histories using Deep Embeddings and Alignment-based Process Conformance Checking
Abstract
This study proposes a novel personalized disease prediction framework that integrates heterogeneous biomedical information, including structured genomic variant annotations, deep semantic embeddings of longitudinal disease histories, and process conformance metrics derived from historical disease pathways. Disease history embeddings were generated using BioClinicalBERT, while alignment-based process conformance checking was applied to quantify how closely an individual’s disease trajectory conforms to typical progression patterns observed in the population. The results demonstrate that incorporating conformance-based fitness features significantly improves prediction performance across all disease categories and classifiers, yielding consistently higher AUROC values and lower Brier scores. These findings indicate that process-level disease pathway conformity captures critical temporal and behavioral information not fully represented by genomic or deep semantic features alone, highlighting the importance of integrating genetic, semantic, and process-based signals for personalized disease prediction in precision medicine.
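To make the idea of a conformance-based fitness feature concrete, the following is a minimal sketch, not the authors' implementation. It assumes that the "typical progression patterns" can be represented as a small set of reference disease sequences rather than a discovered process model, and that every non-synchronous move (a diagnosis present only in the patient's trace or only in the reference path) has unit cost. The function names `alignment_cost` and `trace_fitness` and the disease labels in the usage note are hypothetical. Fitness follows the usual alignment-based normalization: one minus the optimal alignment cost divided by the worst-case cost, which is the trace length plus the length of the shortest reference path.

```python
from typing import Sequence


def alignment_cost(trace: Sequence[str], reference: Sequence[str]) -> int:
    """Cost of an optimal alignment between a patient trace and one reference path.

    Synchronous moves (same diagnosis on both sides) cost 0; a move on the
    trace only or on the reference only costs 1 (assumed unit costs).
    """
    n, m = len(trace), len(reference)
    # dp[i][j] = minimal cost of aligning trace[:i] with reference[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i  # only trace-side moves remain
    for j in range(1, m + 1):
        dp[0][j] = j  # only reference-side moves remain
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best = min(dp[i - 1][j] + 1, dp[i][j - 1] + 1)
            if trace[i - 1] == reference[j - 1]:
                best = min(best, dp[i - 1][j - 1])  # synchronous move
            dp[i][j] = best
    return dp[n][m]


def trace_fitness(trace: Sequence[str],
                  reference_paths: Sequence[Sequence[str]]) -> float:
    """Alignment-based fitness in [0, 1]; 1.0 means the trace replays perfectly."""
    if not reference_paths:
        raise ValueError("at least one reference path is required")
    best_cost = min(alignment_cost(trace, ref) for ref in reference_paths)
    # Worst case: every trace event is unmatched, plus the shortest reference path.
    worst_cost = len(trace) + min(len(ref) for ref in reference_paths)
    if worst_cost == 0:
        return 1.0  # empty trace against an empty reference path
    return 1.0 - best_cost / worst_cost
```

For example, with the hypothetical reference paths `["hypertension", "type2_diabetes", "ckd", "heart_failure"]` and `["obesity", "type2_diabetes"]`, the trace `["hypertension", "type2_diabetes", "ckd"]` needs one reference-only move against the first path, so its fitness is 1 - 1/(3 + 2) = 0.8. A value like this could then be appended to the genomic and embedding features before classification.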