KGML-xDTD: A Knowledge Graph-based Machine Learning Framework for Drug Treatment Prediction and Mechanism Description

Abstract

Background

Computational drug repurposing is a cost- and time-efficient approach that aims to identify new therapeutic targets or diseases (indications) of existing drugs/compounds. It is especially critical for emerging and/or orphan diseases due to its cheaper investment and shorter research cycle compared with traditional wet-lab drug discovery approaches. However, the underlying mechanisms of action (MOAs) between repurposed drugs and their target diseases remain largely unknown, which is still a main obstacle for computational drug repurposing methods to be widely adopted in clinical settings.

Results

In this work, we propose KGML-xDTD: a Knowledge Graph-based Machine Learning framework for explainably predicting Drugs Treating Diseases. It is a two-module framework that not only predicts the treatment probabilities between drugs/compounds and diseases but also biologically explains them via knowledge graph (KG) path-based, testable mechanisms of action (MOAs). We leverage knowledge-and-publication based information to extract biologically meaningful “demonstration paths” as the intermediate guidance in the Graph-based Reinforcement Learning (GRL) path-finding process. Comprehensive experiments and case study analyses show that the proposed framework can achieve state-of-the-art performance in both predictions of drug repurposing and recapitulation of human-curated drug MOA paths.

Conclusions

KGML-xDTD is the first model framework that can offer KG-path explanations for drug repurposing predictions by leveraging the combination of prediction outcomes and existing biological knowledge and publications. We believe it can effectively reduce “black-box” concerns and increase prediction confidence for drug repurposing based on predicted path-based explanations, and further accelerate the process of drug discovery for emerging diseases.

Article activity feed

  1. Background

    Michel Dumontier: This paper describes KGML-xDTD, a knowledge graph-based ML framework to predict and explain potential applications of drugs. The main approach is the use of graph reinforcement learning to predict drug-disease pairs and provide a knowledge-based path as a potential mechanism of action. The method is evaluated against other approaches and under various data partitioning strategies, by comparison with a manually curated database of mechanisms of action, and in two use cases. The paper is well written, easy to read, and makes a contribution to the scientific literature. Accurate prediction of drug uses remains an important and challenging problem in biomedical informatics. The novelty of the approach is to use graph reinforcement learning to achieve state-of-the-art performance for the problem, and it is also able to generate plausible paths within a knowledge graph to serve as mechanistic explanations. There are some limitations to the work that should be addressed. These include:

    1. The baseline models (GAT & GraphSAGE+SVM) only use a small subset of drug-disease replacements. The authors indicate that the smaller subset is necessary owing to time performance constraints. However, there is no discussion of the possible impact of the reduced subset on any aspect of the comparison with their method.
    2. The approach only evaluates 3-hop KG paths, which is 1/7 of what is available in DrugMechDB. What is the quality/performance impact of choosing longer paths? Wouldn't the number of biologically reasonable paths to explain a prediction be substantially reduced? I worry that this is cherry-picking the dataset to show good performance for the only case (3-hop) that the method is capable of, while criticizing other methods as not being performant.
    3. The authors use RepoDB as one of their sources, and specifically use the "withdrawn" set as true negatives. However, most withdrawn tags are linked to reasons other than safety or efficacy of the clinical trial. As such it is not clear that this set is a good true negative set.
    4. The authors use MyChem as a resource for drug indications/contraindications. However, MyChem is not an original source - it aggregates other resources. The authors should properly identify the source of "human curated annotations".
    5. I commend the authors for their evaluation, which uses a number of different train/test strategies and compares against different methods. However, as far as I can see, the train/test strategy does not adequately remove similar true drug-disease pairs from the training/test sets. That is to say, many drugs are approved for very similar conditions, so it becomes somewhat trivial to predict these (this problem is highlighted in the 2011 PREDICT paper by Assaf Gottlieb). More work should be done here to report an accuracy based on more stringent evaluation criteria.
    6. It's unclear to me that the 124k diseases are real (diagnosable) diseases that could be prescribed for. Inflating the number of possible (but implausible) diseases might augment the performance, but contribute nothing to medicine. Elaborate.
    7. Figures 5 and 6 are difficult to read.
    8. It's nice to see the 2 use cases in the paper. However, the extracted subgraphs are quite different from the DrugMechDB MOA paths. So there's something to be said about the succinctness of the DrugMechDB MOA paths, which might prove to be a better training set for some explanation algorithm than one that is independently generated. Overall, this is a nice paper with an interesting approach.
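
The concern in point 5 above can be illustrated with a minimal sketch of a group-aware split. It assumes a hypothetical mapping `disease_group` from each disease to a cluster of similar conditions (the review does not say how such clusters should be defined), and it holds out whole clusters so that no cluster contributes pairs to both the training and the test set. All function names, variable names, and data below are invented for illustration and are not taken from the KGML-xDTD code.

```python
import random
from collections import defaultdict


def split_by_disease_group(pairs, disease_group, test_fraction=0.2, seed=0):
    """Split (drug, disease) pairs so each disease group is entirely in train or test.

    pairs: list of (drug, disease) tuples.
    disease_group: hypothetical dict mapping disease -> label of a group of similar diseases.
    """
    # Collect pairs by the group of their disease; unmapped diseases form their own group.
    by_group = defaultdict(list)
    for drug, disease in pairs:
        by_group[disease_group.get(disease, disease)].append((drug, disease))

    # Shuffle group labels reproducibly, then hold out whole groups until the
    # test set reaches roughly the requested share of pairs.
    groups = sorted(by_group)
    random.Random(seed).shuffle(groups)
    target = test_fraction * len(pairs)

    train, test = [], []
    for g in groups:
        bucket = test if len(test) < target else train
        bucket.extend(by_group[g])
    return train, test


# Toy data, invented for illustration only.
pairs = [
    ("drug_a", "type 2 diabetes"),
    ("drug_b", "type 2 diabetes"),
    ("drug_a", "diabetes mellitus"),
    ("drug_c", "asthma"),
    ("drug_d", "hypertension"),
]
disease_group = {
    "type 2 diabetes": "diabetes",
    "diabetes mellitus": "diabetes",
    "asthma": "asthma",
    "hypertension": "hypertension",
}
train, test = split_by_disease_group(pairs, disease_group, test_fraction=0.4)
train_groups = {disease_group[d] for _, d in train}
test_groups = {disease_group[d] for _, d in test}
print(train_groups & test_groups)  # empty by construction: groups are never divided
```

Because groups are assigned whole, the realized test share can overshoot `test_fraction` when a large group is drawn; a stricter evaluation might also group by drug, which this sketch does not attempt.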
  2. ABSTRACT

    Yuansheng Liu: The paper entitled "KGML-xDTD: A Knowledge Graph-based Machine Learning Framework for Drug Treatment Prediction and Mechanism Description" proposes KGML-xDTD, a two-module, knowledge graph-based machine learning framework. The authors construct a large knowledge graph for training the model. The model is divided into two modules, one for drug repurposing prediction and the other for mechanism of action prediction. Both modules achieve good results compared with the existing baselines. Here are my specific points:

    1. It is mentioned on page 6 that the data are classified into three categories, while other data are classified into two categories. How did you exclude the "unknown" category and adjust the results?
    2. The drug repurposing prediction model and the mechanism of action prediction model seem to be two separately trained models. I cannot find evidence of multi-task training in the text. If the models are trained separately, which model do the evaluation metrics refer to? If they are trained together, the model section should be written more clearly.
    3. The introduction only discusses drug repurposing prediction models and does not describe existing mechanism of action prediction models.
    4. The baselines seem to be SOTA drug repurposing prediction models, but the best performance of the work is in mechanism of action prediction.
    5. The dataset appears to selectively include drug-disease pairs with intermediate paths. If a drug and a disease are not connected in the network, how does the drug repurposing prediction model perform?
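
The last point above concerns drug-disease pairs with no connecting path. As a minimal sketch of how such pairs could be identified before evaluation, assume the KG is available as a plain adjacency dictionary (a hypothetical representation; the node names below are invented). A breadth-first search bounded by a hop limit then reports whether a disease is reachable from a drug:

```python
from collections import deque


def within_hops(adjacency, source, target, max_hops=3):
    """Return True if target is reachable from source in at most max_hops edges."""
    if source == target:
        return True
    seen = {source}
    frontier = deque([(source, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue  # do not expand nodes that already sit at the hop limit
        for neighbor in adjacency.get(node, ()):
            if neighbor == target:
                return True
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return False


# Toy directed graph, invented for illustration only.
toy_kg = {
    "drug_x": ["protein_p"],
    "protein_p": ["pathway_q"],
    "pathway_q": ["disease_y"],
    "drug_z": [],
}
print(within_hops(toy_kg, "drug_x", "disease_y"))              # True: a 3-hop path exists
print(within_hops(toy_kg, "drug_x", "disease_y", max_hops=2))  # False: 2 hops are not enough
print(within_hops(toy_kg, "drug_z", "disease_y"))              # False: drug_z has no edges
```

Pairs for which this check fails could be reported as a separate evaluation stratum, which would show how the prediction module behaves when no path is available.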