SPHN Connector - A scalable pipeline for generating validated knowledge graphs from federated and semantically enriched health data
Abstract
Background: The integration and reuse of heterogeneous health data, including clinical records, cohort studies, and omics datasets, are essential for advancing modern biomedical research. Knowledge graphs offer a powerful means to semantically link such data, enabling interoperability and reuse. The Swiss Personalized Health Network has developed a comprehensive semantic interoperability framework to implement the FAIR (Findable, Accessible, Interoperable, Reusable) principles at a national level.

Methods: This paper presents the adopted strategy and the resulting tool for building such federated knowledge graphs, marking a shift from centralized approaches to a model where hospitals and research partners semantically enrich and produce their own data locally.

Results: A core component enabling the implementation of this strategy is the SPHN Connector, a tool designed to tackle the technical challenges of this process. It converts diverse data formats into semantically enriched RDF, and offers capabilities for data transformation, de-identification, and validation, particularly for iterative delivery in a federated context.

Conclusion: These generated datasets can then either be integrated centrally or used in a federated way, allowing for the linkage of information from the same patient, for example, clinical routine data and omics metadata, as well as the combination of data from different patients across sites.
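
To make the three stages named in the abstract (conversion to RDF, de-identification, validation) more concrete, the following is a minimal sketch in Python using only the standard library. It is not the SPHN Connector's implementation, which the abstract does not describe. The namespace `https://example.org/`, the property names, the keyed-hash pseudonymization, and the "required properties" check are all assumptions chosen for illustration.

```python
import hashlib
import hmac

# Hypothetical namespace; NOT taken from the SPHN schema.
BASE = "https://example.org/"


def pseudonymize(patient_id: str, secret: bytes) -> str:
    """De-identification step (assumed approach): keyed hash of the identifier."""
    digest = hmac.new(secret, patient_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def escape_literal(value: str) -> str:
    """Escape backslashes, quotes, and newlines for an N-Triples string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def record_to_ntriples(record: dict, secret: bytes) -> list[str]:
    """Conversion step: turn one flat record into N-Triples lines."""
    subject = f"<{BASE}patient/{pseudonymize(record['patient_id'], secret)}>"
    triples = []
    for key, value in record.items():
        if key == "patient_id":
            continue  # the raw identifier never appears in the output
        literal = escape_literal(str(value))
        triples.append(f'{subject} <{BASE}prop/{key}> "{literal}" .')
    return triples


def missing_properties(triples: list[str], required: list[str]) -> list[str]:
    """Validation step (simplified): list required properties that are absent."""
    # Subjects and predicates are IRIs without spaces, so field 1 is the predicate.
    present = {line.split(" ")[1] for line in triples}
    return [p for p in required if f"<{BASE}prop/{p}>" not in present]


# Hypothetical record from a local site, with illustrative field names.
record = {"patient_id": "P-001", "birth_year": 1980, "diagnosis_code": "I10"}
triples = record_to_ntriples(record, secret=b"site-local-secret")
missing = missing_properties(triples, ["birth_year", "diagnosis_code", "sex"])
```

In this sketch each site would keep its own secret, so pseudonyms are stable across repeated deliveries from the same site, which is one simple way to let iteratively delivered data about the same patient link up. A production pipeline would more plausibly rely on an RDF library and a schema-based validation language than on string handling of this kind.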