Extracting Non-Taxonomic and Ternary Relations from Patient-Generated Texts for Semantic Interoperability
Abstract
Patient-generated texts often contain useful information but lack a rigorous internal structure, which makes them unfit for structured databases. Existing research has focused on identifying hierarchical (taxonomic) relations and has largely ignored non-hierarchical and ternary relations, which are equally crucial for comprehensive semantic understanding. This study addresses that gap through semantic alignment based on non-taxonomic and ternary relational components. The work follows a Design Science Research (DSR) approach with a pragmatic research philosophy. We develop and evaluate a knowledge-infused neural framework for cross-domain ontology integration that captures and represents non-taxonomic and ternary relationships beyond general hierarchical relations. The framework adopts a four-layered architecture. A key contribution is a delayed fusion strategy that balances contextual neural learning with interpretable rule-based dictionary knowledge and incorporates BioBERT as a relation validator to keep the integration factually grounded in the domain. The framework was evaluated on 38,115 documents from anxiety and depression datasets, of which 27,183 were key phrases. The hybrid per-class adaptive strategy extracted 113 unique concepts, favoring more conservative predictions. The hybrid union strategy extracted 222 unique concepts, favoring wider coverage of domain concepts for knowledge graph construction. The framework achieved an accuracy of 98.91% and an F1 score of 77.6%, an improvement of about 10 percentage points over the BiLSTM F1 score (67.7%). The framework also validated 384 semantic relations, a validation rate of 92.7%. Of the validated semantic relations, 240 were ternary relations that captured multiple contexts of interaction between concepts.
There were 144 non-taxonomic relations in total, organized into semantic categories including associative, functional, causal, risk-factor, and statistical relations. The framework transforms unstructured patient-generated texts into structured, interoperable knowledge while preserving their different clinical contexts. This advances the semantic interoperability of health data and supports clinical decision making and biomedical knowledge reuse.
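To make the distinction between binary non-taxonomic relations and ternary relations concrete, the following is a minimal Python sketch of how such relations might be represented before loading them into a knowledge graph. It is not the authors' implementation. The `Relation` class, its field names, the `count_by_arity` helper, and the example triples are all hypothetical. The sketch assumes that a ternary relation is a pair of concepts plus a third context argument. Only the category names come from the abstract.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional

# Category names are taken from the abstract; the identifiers are illustrative.
RELATION_CATEGORIES = {
    "associative", "functional", "causal", "risk_factor", "statistical",
}


@dataclass(frozen=True)
class Relation:
    """A non-taxonomic relation between two concepts (hypothetical schema).

    Assumption: a ternary relation adds a third argument, `context`,
    describing the circumstances under which the relation holds.
    """
    head: str
    predicate: str
    tail: str
    category: str
    context: Optional[str] = None

    def __post_init__(self) -> None:
        # Reject categories outside the list named in the abstract.
        if self.category not in RELATION_CATEGORIES:
            raise ValueError(f"unknown relation category: {self.category!r}")

    @property
    def arity(self) -> int:
        # 3 when a context argument is present, otherwise 2.
        return 3 if self.context is not None else 2


def count_by_arity(relations: Iterable[Relation]) -> Counter:
    """Count how many relations are binary (2) and ternary (3)."""
    return Counter(r.arity for r in relations)


# Invented examples for illustration only; they are not drawn from the study data.
relations = [
    Relation("insomnia", "associated_with", "anxiety", "associative"),
    Relation("anxiety", "disrupts", "sleep", "causal", context="work stress"),
    Relation("isolation", "increases_risk_of", "depression", "risk_factor",
             context="after job loss"),
]

counts = count_by_arity(relations)
print(f"binary: {counts[2]}, ternary: {counts[3]}")
```

Keeping the context as an optional third field lets binary and ternary relations share one schema, so the same validation and counting code applies to both. The counts reported in the abstract are consistent with such a partition: 240 ternary plus 144 non-taxonomic relations gives the 384 validated relations.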