Box embeddings for extending ontologies: a data-driven and interpretable approach


Abstract

Deriving symbolic knowledge from trained deep learning models is challenging due to the lack of transparency in such models. A promising approach to address this issue is to couple a semantic structure with the model outputs and thereby make the model interpretable. In prediction tasks such as multi-label classification, labels tend to form hierarchical relationships. Therefore, we propose enforcing a taxonomical structure on the model’s outputs throughout the training phase. In vector space, a taxonomy can be represented using axis-aligned hyper-rectangles, or boxes, which may overlap or nest within one another. The boundaries of a box determine the extent of a particular category. Thus, we use box-shaped embeddings of ontology classes to learn and transparently represent logical relationships that are only implicit in multi-label datasets. We assessed our model by measuring its ability to approximate the full set of inferred subclass relations in the ChEBI ontology, an important knowledge base in the life sciences. We demonstrate that our model captures implicit hierarchical relationships among labels, ensuring consistency with the underlying ontological conceptualization, while also achieving state-of-the-art performance in multi-label classification. Notably, this is accomplished without requiring an explicit taxonomy during training.
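
To make the geometric idea concrete, the following is a minimal, illustrative sketch (not the authors’ implementation) of how axis-aligned boxes can encode subclass relations: a class A is treated as a subclass of B when A’s box nests inside B’s box. The `BoxEmbedding` module, its lower-corner/offset parameterization, and the soft containment score are assumptions introduced here purely for illustration.

```python
import torch

class BoxEmbedding(torch.nn.Module):
    """Hypothetical minimal box embedding: each class is an axis-aligned
    hyper-rectangle parameterized by a lower corner and a positive offset."""

    def __init__(self, num_classes: int, dim: int):
        super().__init__()
        self.lower = torch.nn.Parameter(torch.rand(num_classes, dim))
        # Offsets are passed through softplus so box sides stay positive.
        self.offset = torch.nn.Parameter(torch.rand(num_classes, dim))

    def corners(self, idx):
        lo = self.lower[idx]
        hi = lo + torch.nn.functional.softplus(self.offset[idx])
        return lo, hi

    def contains(self, child_idx, parent_idx):
        """Soft score for 'child box lies inside parent box' (child ⊑ parent).
        Returns values near 1 when containment holds on every dimension."""
        c_lo, c_hi = self.corners(child_idx)
        p_lo, p_hi = self.corners(parent_idx)
        # Margin by which each face of the child box stays inside the parent box.
        margins = torch.minimum(c_lo - p_lo, p_hi - c_hi)
        return torch.sigmoid(margins).prod(dim=-1)

boxes = BoxEmbedding(num_classes=10, dim=4)
score = boxes.contains(torch.tensor([2]), torch.tensor([5]))
print(score)  # close to 1 only if class 2's box nests inside class 5's box
```

In a setting like the one described in the abstract, a molecule would additionally be embedded as a point (or small box) and scored against each class box during multi-label training, so that label predictions and the implied class hierarchy arise from the same geometry.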

Scientific contribution

Our proposed approach advances chemical classification by enabling interpretable outputs through a structured and geometrically expressive representation of molecules and their classes.
