Predicting The Pathway Involvement For All Pathway and Associated Compound Entries Defined in the Kyoto Encyclopedia of Genes and Genomes
Abstract
Background/Objectives
Predicting the biochemical pathway involvement of a compound could facilitate the interpretation of biological and biomedical research. Prior prediction approaches have largely focused on metabolism, training machine learning models to predict metabolic pathways only. However, cells and organisms contain many other types of pathways that are of interest to biologists.
Methods
While several publications have made use of the metabolites and metabolic pathways available in the Kyoto Encyclopedia of Genes and Genomes (KEGG), we downloaded all the compound entries with pathway annotations available in KEGG. From these data, we constructed a dataset in which each entry contained features representing a compound combined with features representing a pathway, followed by a binary label indicating whether the given compound is associated with the given pathway. We trained multi-layer perceptron binary classifiers on variations of this dataset.
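The pairing scheme described above can be illustrated with a minimal sketch. It assumes that each compound and each pathway has already been reduced to a fixed-length numeric feature vector and that the two vectors are joined by simple concatenation; the abstract does not specify how features are derived or combined. The identifiers, feature values, associations, and the helper name `build_pair_dataset` are all hypothetical placeholders, not KEGG data or the authors' code.

```python
from itertools import product

# Hypothetical toy feature vectors (placeholders, not real KEGG-derived features).
compound_features = {
    "compound_A": [1.0, 0.0, 2.0],
    "compound_B": [0.0, 3.0, 1.0],
}
pathway_features = {
    "pathway_X": [0.5, 1.5],
    "pathway_Y": [2.0, 0.0],
}

# Hypothetical known compound-pathway associations (the positive pairs).
associations = {("compound_A", "pathway_X"), ("compound_B", "pathway_Y")}


def build_pair_dataset(compound_features, pathway_features, associations):
    """Return one (pair, features, label) row per compound-pathway pair.

    Features are the compound vector followed by the pathway vector
    (concatenation is an assumption); the label is 1 if the pair is a
    known association and 0 otherwise.
    """
    rows = []
    for cid, pid in product(sorted(compound_features), sorted(pathway_features)):
        features = compound_features[cid] + pathway_features[pid]
        label = 1 if (cid, pid) in associations else 0
        rows.append(((cid, pid), features, label))
    return rows


dataset = build_pair_dataset(compound_features, pathway_features, associations)
for pair, features, label in dataset:
    print(pair, features, label)
```

Each row produced this way could serve as one training example for a binary classifier such as a multi-layer perceptron, with the concatenated vector as input and the label as target.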
Results
The model trained on all KEGG compounds and pathways scored an overall mean performance of 0.847, median of 0.848, and standard deviation of 0.0098.
Conclusions
The mean performance of 0.847 with a standard deviation of 0.0098 for all KEGG pathways, compared to the performance of 0.800 with a standard deviation of 0.021 for metabolic KEGG pathways only, demonstrates the capability to effectively predict biochemical pathways in general, in addition to those specifically related to metabolism. Moreover, the improvement in performance demonstrates additional transfer learning with the inclusion of non-metabolic pathways.