Defining Peptides in ChEBI
Abstract
Modern biochemistry is producing vast amounts of chemical knowledge. Ontologies, such as the Chemical Entities of Biological Interest (ChEBI) ontology, can help organise this knowledge. With manual classification alone, however, ontologies cannot keep up with the growth of their domain. In this work, we propose a novel taxonomy of 67 classes related to peptides, a large branch of ChEBI with nearly 15,000 compounds. We expand the existing natural-language definitions in ChEBI and specify them more precisely. These natural-language definitions are accompanied by a logical axiomatisation in monadic second-order logic (MSOL). To use this axiomatisation for automated classification, we developed a methodology that translates the monadic second-order definitions first into partial first-order definitions and finally into an algorithmic classification. This connects three aspects that are important for ontological definitions: they reflect expert opinion, they are unambiguous, and they can be checked automatically. In our evaluation, we compare the results of our classification to the current taxonomy of ChEBI. This reveals both potential inconsistencies in ChEBI and areas that might benefit from automated extension. We also evaluate our natural-language definitions in an expert survey.

Scientific Contribution: This work provides precise natural-language definitions of 14 existing ChEBI classes as well as 53 new peptide-related classes. These definitions are formalised in MSOL and come with an efficient implementation that allows for large-scale molecule classification, including a full classification of ChEBI and PubChem.
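The abstract says that logical definitions are ultimately turned into an algorithmic classification over molecules. As a rough illustration of what such a last step can look like for a simple first-order pattern, here is a toy sketch. It is not the authors' implementation or their definition of a peptide. The `Molecule` class, the helper functions and the backbone pattern are all hypothetical. The sketch assumes a heavy-atom molecular graph with implicit hydrogens. It treats an amino-acid residue as an N–Cα–C(=O) backbone triple and a peptide bond as a single bond from one residue's carbonyl carbon to another residue's nitrogen.

```python
from itertools import product


class Molecule:
    """Hypothetical minimal molecular graph: heavy atoms only, hydrogens implicit."""

    def __init__(self, elements, bonds):
        self.elements = elements  # atom index -> element symbol
        # Store bonds as unordered atom pairs -> bond order.
        self.bonds = {frozenset(pair): order for pair, order in bonds.items()}

    def order(self, a, b):
        # Bond order between a and b, or 0 if they are not bonded.
        return self.bonds.get(frozenset((a, b)), 0)

    def neighbours(self, a):
        return [b for pair in self.bonds if a in pair for b in pair if b != a]


def is_carbonyl_carbon(mol, c):
    # A carbon carrying a double-bonded oxygen.
    return mol.elements[c] == "C" and any(
        mol.elements[o] == "O" and mol.order(c, o) == 2 for o in mol.neighbours(c)
    )


def residues(mol):
    """Find (N, C-alpha, carbonyl C) triples, a simplified amino-acid backbone pattern."""
    found = []
    for ca, element in mol.elements.items():
        if element != "C":
            continue
        nbrs = mol.neighbours(ca)
        for n, c in product(nbrs, nbrs):
            if (mol.elements[n] == "N" and mol.order(ca, n) == 1
                    and is_carbonyl_carbon(mol, c) and mol.order(ca, c) == 1):
                found.append((n, ca, c))
    return found


def count_residues(mol):
    # Count distinct alpha carbons that match the backbone pattern.
    return len({ca for _, ca, _ in residues(mol)})


def peptide_bonds(mol):
    """Single bonds from one residue's carbonyl C to a different residue's N."""
    res = residues(mol)
    return {(r1[2], r2[0]) for r1 in res for r2 in res
            if r1[1] != r2[1] and mol.order(r1[2], r2[0]) == 1}


def is_peptide(mol):
    # Toy criterion: at least two residues joined by at least one peptide bond.
    return len(peptide_bonds(mol)) >= 1


# Glycylglycine, heavy atoms: N0-C1-C2(=O3)-N4-C5-C6(=O7)-O8
glygly = Molecule(
    {0: "N", 1: "C", 2: "C", 3: "O", 4: "N", 5: "C", 6: "C", 7: "O", 8: "O"},
    {(0, 1): 1, (1, 2): 1, (2, 3): 2, (2, 4): 1,
     (4, 5): 1, (5, 6): 1, (6, 7): 2, (6, 8): 1},
)
# Glycine, heavy atoms: N0-C1-C2(=O3)-O4
glycine = Molecule(
    {0: "N", 1: "C", 2: "C", 3: "O", 4: "O"},
    {(0, 1): 1, (1, 2): 1, (2, 3): 2, (2, 4): 1},
)

print(is_peptide(glygly), count_residues(glygly))    # True 2
print(is_peptide(glycine), count_residues(glycine))  # False 1
```

The check above only searches for a fixed local pattern, which is a first-order property. Properties that quantify over sets of atoms, such as connectivity of a whole chain, are what monadic second-order logic adds. A toy checker like this one would need extra graph algorithms to handle them.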