Construction of Personal Health Knowledge Graphs for Clinical Data Harmonization in Breast Cancer
Abstract
Background
Personal health data contain valuable information for breast cancer management, but integrating heterogeneous data and maintaining clinical registries are time-consuming and labor-intensive. We aimed to leverage an Artificial Intelligence (AI)-powered virtual assistant that supports the semi-automated curation (including data quality enhancement) and publishing of personal health data, so that identifiable data of breast cancer patients can be transformed into interoperable personal health knowledge graphs for secondary use.

Methods
With patients' informed consent, breast cancer patient data were extracted from the hospital systems and ingested into the virtual assistant. Data items were mapped to target concepts and transformed into a knowledge graph compliant with a reference ontology. Integrated classic and AI tools supported the transformation of each patient's data into a personal health knowledge graph (PHKG). Each graph was checked by a Shapes Constraint Language (SHACL)-based validator to ensure data quality. A SPARQL (SPARQL Protocol and RDF Query Language) query was then executed over the validated PHKGs of multiple patients to extract the relevant data elements and generate a local breast cancer registry; registries generated in the same way at the three participating hospitals are interoperable with one another.

Results
The first version of the AI-powered virtual assistant prototype was developed and deployed in our hospital, as well as in two other hospitals, in Austria and Estonia. Twelve tables with 184 data items, comprising structured, semi-structured, and fully narrative elements, were extracted from the local hospital systems. Data categories included demographics, diagnosis, medical history, pathology reports, laboratory tests, surgical records, therapy, and follow-up after previous treatments. After data transformation, personal health knowledge graphs incorporating the data elements required for the breast cancer (BC) registry were constructed. A SPARQL query was subsequently developed to build a local BC registry by automatically retrieving these elements. The same approach was applied in the other two hospitals.

Conclusion
The proposed workflow for semi-automated curation and quality enhancement of health data, from heterogeneous sources to interoperable and reusable output, is feasible. It offers a potential means to enhance medical data interoperability and facilitate the maintenance of clinical registries.
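To make the map, validate, query workflow concrete, the following is a minimal, stdlib-only Python sketch. It is not the authors' implementation. Triples are plain tuples rather than an RDF library graph. The namespace `http://example.org/phkg#`, the column-to-predicate mapping, the `PATIENT_SHAPE` constraints and the toy patient rows are all hypothetical. `validate` is a hand-rolled stand-in for a SHACL validator that only mimics `sh:minCount`, `sh:maxCount` and `sh:datatype`, and `build_registry` imitates a SPARQL basic graph pattern over the union of patient graphs.

```python
"""Minimal, stdlib-only sketch of a PHKG pipeline: map -> validate -> query.

All field names, predicate IRIs, shapes and data below are hypothetical
placeholders, not the reference ontology or SHACL shapes used in the study.
"""
from typing import Dict, List, Tuple

Triple = Tuple[str, str, object]
EX = "http://example.org/phkg#"  # hypothetical namespace

# Hypothetical mapping from source-table columns to ontology predicates.
COLUMN_TO_PREDICATE = {
    "birth_year": EX + "birthYear",
    "diagnosis_code": EX + "diagnosisCode",
    "tumor_size_mm": EX + "tumorSizeMm",
}


def build_phkg(patient_id: str, row: Dict[str, object]) -> List[Triple]:
    """Transform one patient's extracted row into a list of RDF-like triples."""
    subject = EX + "patient/" + patient_id
    triples: List[Triple] = [(subject, "rdf:type", EX + "Patient")]
    for column, value in row.items():
        predicate = COLUMN_TO_PREDICATE.get(column)
        if predicate is not None and value is not None:  # skip unmapped or empty items
            triples.append((subject, predicate, value))
    return triples


# Hypothetical shape: predicate -> (minCount, maxCount, expected Python type),
# loosely mimicking sh:minCount, sh:maxCount and sh:datatype for a Patient node.
PATIENT_SHAPE = {
    EX + "birthYear": (1, 1, int),
    EX + "diagnosisCode": (1, 1, str),
    EX + "tumorSizeMm": (0, 1, (int, float)),
}


def validate(graph: List[Triple]) -> List[str]:
    """Return violation messages; an empty list means the graph conforms."""
    violations = []
    for predicate, (min_count, max_count, expected) in PATIENT_SHAPE.items():
        values = [o for _, p, o in graph if p == predicate]
        if not min_count <= len(values) <= max_count:
            violations.append(
                f"{predicate}: expected {min_count}..{max_count} values, got {len(values)}"
            )
        for value in values:
            if not isinstance(value, expected):
                violations.append(f"{predicate}: wrong datatype for {value!r}")
    return violations


# Roughly analogous SPARQL over the merged graphs (hypothetical prefix):
#   PREFIX ex: <http://example.org/phkg#>
#   SELECT ?patient ?birthYear ?diagnosisCode WHERE {
#     ?patient a ex:Patient ; ex:birthYear ?birthYear ; ex:diagnosisCode ?diagnosisCode .
#   }
REGISTRY_FIELDS = [EX + "birthYear", EX + "diagnosisCode"]


def build_registry(graphs: List[List[Triple]]) -> List[tuple]:
    """Join required fields per patient across the union of graphs.

    Patients lacking any required field are dropped, as in a SPARQL basic
    graph pattern. Assumes at most one value per field (enforced by validate).
    """
    union = [t for g in graphs for t in g]
    patients = sorted({s for s, p, o in union if p == "rdf:type" and o == EX + "Patient"})
    rows = []
    for s in patients:
        values = {p: o for subj, p, o in union if subj == s and p in REGISTRY_FIELDS}
        if all(f in values for f in REGISTRY_FIELDS):
            rows.append((s,) + tuple(values[f] for f in REGISTRY_FIELDS))
    return rows


# Toy, hypothetical input: p2 lacks a diagnosis code and fails validation.
patients = {
    "p1": {"birth_year": 1968, "diagnosis_code": "C50.4", "tumor_size_mm": 18},
    "p2": {"birth_year": 1975, "diagnosis_code": None},
}
graphs = [build_phkg(pid, row) for pid, row in patients.items()]
valid_graphs = [g for g in graphs if not validate(g)]
registry = build_registry(valid_graphs)
print(validate(graphs[1]))  # ['http://example.org/phkg#diagnosisCode: expected 1..1 values, got 0']
print(registry)  # [('http://example.org/phkg#patient/p1', 1968, 'C50.4')]
```

Running validation before the registry query means a missing required element is reported as a violation rather than silently dropped by the conjunctive match. A production pipeline would more likely rely on an RDF library and a full SHACL engine than on these stand-ins.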