Predicting the Pathway Involvement for All Pathway and Associated Compound Entries Defined in the Kyoto Encyclopedia of Genes and Genomes

Abstract

Background/Objectives

Predicting the biochemical pathway involvement of a compound could facilitate the interpretation of biological and biomedical research. Prior prediction approaches have largely focused on metabolism, training machine learning models to predict metabolic pathways only. However, many other types of pathways in cells and organisms are of interest to biologists.

Methods

While several publications have used the metabolites and metabolic pathways available in the Kyoto Encyclopedia of Genes and Genomes (KEGG), we downloaded all compound entries with pathway annotations available in KEGG. From these data, we constructed a dataset in which each entry contained features representing a compound combined with features representing a pathway, followed by a binary label indicating whether the given compound is associated with the given pathway. We trained multi-layer perceptron binary classifiers on variations of this dataset.
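
To make the dataset layout concrete, the following is a minimal sketch of how such compound-pathway entries could be assembled. It is not the authors' code: the function name `build_pair_dataset`, the toy identifiers, and the feature values are hypothetical, and it assumes that "combined" means simple concatenation and that every compound is paired with every pathway.

```python
from itertools import product


def build_pair_dataset(compound_features, pathway_features, associations):
    """Build one entry per (compound, pathway) pair.

    Each entry is the compound feature vector concatenated with the
    pathway feature vector (an assumption about how features are combined).
    The label is 1 if the pair appears in `associations`, otherwise 0.
    """
    rows, labels = [], []
    pairs = product(compound_features.items(), pathway_features.items())
    for (compound_id, c_vec), (pathway_id, p_vec) in pairs:
        rows.append(list(c_vec) + list(p_vec))
        labels.append(1 if (compound_id, pathway_id) in associations else 0)
    return rows, labels


# Toy, made-up inputs for illustration only (not KEGG data).
compound_features = {"cpd_A": [1.0, 0.0], "cpd_B": [0.0, 2.0]}
pathway_features = {"path_X": [1.0, 2.0], "path_Y": [0.0, 0.5]}
associations = {("cpd_A", "path_X"), ("cpd_B", "path_Y")}

rows, labels = build_pair_dataset(compound_features, pathway_features, associations)
for row, label in zip(rows, labels):
    print(row, label)
```

The resulting rows and labels could then be passed to any binary classifier, such as a multi-layer perceptron. How the actual compound and pathway features are derived is not specified here, so the vectors above are placeholders.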

Results

The model trained on all KEGG compounds and pathways scored an overall mean performance of 0.847, median of 0.848, and standard deviation of 0.0098.

Conclusions

The mean performance of 0.847 with a standard deviation of 0.0098 for all KEGG pathways, compared to the performance of 0.800 with a standard deviation of 0.021 for metabolic KEGG pathways only, demonstrates the capability to effectively predict biochemical pathways in general, in addition to those specifically related to metabolism. Moreover, the improvement in performance demonstrates additional transfer learning from the inclusion of non-metabolic pathways.
