Tissue-Specific Carcinogenicity Prediction Using Multi-Task Learning on Attention-based Graph Neural Networks
Abstract
Cancer arises from the uncontrolled growth and division of abnormal cells. In industrialized societies, chemical exposure is one of the leading causes of cancer. Because certain compounds can induce cancer by damaging genes or disrupting cellular metabolism, studying carcinogens is essential. However, previous studies have not considered that a compound's carcinogenicity may differ from one tissue to another. This study therefore developed a multi-task learning framework to predict tissue-specific carcinogenicity in liver, lung, stomach, and breast tissue. The framework consists of a shared layer that extracts common features and task-specific layers that make the prediction for each tissue. The shared layer contains two parts. The first is a graph attention network (GAT) layer that generates atom representations weighted by the importance of neighboring atoms. The second is a set of parallel fully connected layers, one for each task combination. These shared representations are then passed to the task-specific layers to predict tissue-specific carcinogenicity. Training proceeded stepwise. In the first step, the model was trained on data labeled for only some tissues, and this step determined its initial weights. In the second step, the model was trained on data labeled for all tissues to complete training for carcinogenicity prediction. The proposed multi-task model achieved superior overall performance. The best performance was observed in the stomach task (AUROC: 0.825; AUPR: 0.867). This result outperformed both single-task models (AUROC: 0.800; AUPR: 0.840) and previous studies (AUROC: 0.743–0.791; AUPR: 0.788–0.827). We further analyzed molecules with high predicted carcinogenicity in each tissue and used the attention mechanism to identify the substructures critical to the prediction.
This research can help predict the tissue-specific carcinogenicity of candidate chemicals in the early stages of drug development, thereby reducing research cost and time.
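The abstract does not give implementation details. What follows is therefore only a minimal pure-Python sketch of two ideas it names, written under stated assumptions.

The first idea is a single-head graph attention layer in the standard GAT form. Its per-neighbor attention weights are the kind of signal that can be inspected to highlight substructures.

The second idea is a masked binary cross-entropy over the four tissue tasks. This is one plausible way to train on partially labeled data: tissues without a label, marked `None`, are simply skipped.

Several parts of the sketch are hypothetical and not taken from the paper:

- the mean-pooling readout,
- the ReLU nonlinearity,
- the single logistic head per tissue,
- all function names (`gat_layer`, `readout`, `task_heads`, `masked_bce`).

The parallel fully connected layers for each task combination are omitted.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical task order: the four tissues named in the abstract.
TISSUES = ["liver", "lung", "stomach", "breast"]


def leaky_relu(x: float, slope: float = 0.2) -> float:
    return x if x > 0 else slope * x


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def matvec(W: List[List[float]], h: Sequence[float]) -> List[float]:
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def gat_layer(
    features: List[List[float]],
    adjacency: List[List[int]],
    W: List[List[float]],
    a: List[float],
) -> Tuple[List[List[float]], List[Dict[int, float]]]:
    """Single-head graph attention over one molecule (standard GAT form).

    features[i] is the feature vector of atom i, adjacency[i] lists its bonded
    neighbors, W projects features to d dimensions, and a has length 2 * d.
    Returns the updated atom vectors and, per atom, its attention weights.
    """
    z = [matvec(W, h) for h in features]
    d = len(z[0])
    updated, attention = [], []
    for i, neighbors in enumerate(adjacency):
        nbrs = [i] + list(neighbors)  # self-loop plus bonded atoms
        # e_ij = LeakyReLU(a^T [z_i || z_j])
        scores = [
            leaky_relu(sum(a[k] * z[i][k] + a[d + k] * z[j][k] for k in range(d)))
            for j in nbrs
        ]
        # Softmax over the neighborhood gives the attention coefficients alpha_ij.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        alpha = [e / total for e in exps]
        attention.append(dict(zip(nbrs, alpha)))
        # Attention-weighted sum of projected neighbor features, then ReLU (assumed).
        agg = [sum(al * z[j][k] for al, j in zip(alpha, nbrs)) for k in range(d)]
        updated.append([max(0.0, v) for v in agg])
    return updated, attention


def readout(atom_vectors: List[List[float]]) -> List[float]:
    # Mean pooling over atoms (assumed readout) gives one molecule vector.
    n = len(atom_vectors)
    return [sum(col) / n for col in zip(*atom_vectors)]


def task_heads(
    graph_vec: Sequence[float], heads: Dict[str, Tuple[List[float], float]]
) -> Dict[str, float]:
    # One hypothetical logistic head (weights, bias) per tissue.
    return {
        t: sigmoid(sum(w * x for w, x in zip(wts, graph_vec)) + b)
        for t, (wts, b) in heads.items()
    }


def masked_bce(probs: Dict[str, float], labels: Dict[str, Optional[int]]) -> float:
    """Mean binary cross-entropy over labeled tissues; None marks a missing label."""
    eps = 1e-7
    terms = []
    for tissue, y in labels.items():
        if y is None:
            continue  # an unlabeled tissue contributes nothing to the loss
        p = min(max(probs[tissue], eps), 1 - eps)
        terms.append(-(y * math.log(p) + (1 - y) * math.log(1 - p)))
    return sum(terms) / len(terms) if terms else 0.0


if __name__ == "__main__":
    # Toy three-atom chain with 2-dimensional atom features.
    h, att = gat_layer([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [[1], [0, 2], [1]],
                       [[1.0, 0.0], [0.0, 1.0]], [0.1, 0.2, 0.3, 0.4])
    probs = task_heads(readout(h), {t: ([0.5, -0.5], 0.0) for t in TISSUES})
    print(probs)
    print(masked_bce(probs, {"liver": 1, "lung": None, "stomach": 0, "breast": None}))
```

Under this assumption, the same `masked_bce` covers both training steps. In the first step many tissue labels are `None`. In the second every tissue is labeled, so the loss reduces to the ordinary mean binary cross-entropy.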