Intelligent QSAR Approaches: Harnessing Machine Learning for Early Detection of Carcinogenic Agents
Abstract
Traditional carcinogenicity testing is costly and time-consuming, and existing Quantitative Structure-Activity Relationship (QSAR) models suffer from low accuracy and narrow applicability domains. This study addresses these limitations by developing enhanced QSAR models with machine learning (ML) algorithms. A dataset of 805 compounds from the Carcinogenic Potency Database (CPD) was used to train classification and regression models with Bayesian classifiers, recursive partitioning, kernel-based partial least squares (KPLS), random forests, and deep learning (neural networks). An independent external validation set of 105 compounds was used to assess model performance. The DeepChem-based classification model achieved 81% accuracy on the test set and 72% on the external validation set, while the AutoQSAR regression model reached an R² of 0.58 and a Q² of 0.51, outperforming benchmarks reported in the literature. These models cover a broad region of chemical space and offer a robust, cost-effective alternative for carcinogenicity prediction.
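The R² and Q² values quoted above are both coefficients of determination, 1 − SS_res/SS_tot. They differ only in which compounds are scored. The Python sketch below uses only the standard library and rests on several assumptions. It uses a single hypothetical descriptor, invented toy activity values, and an ordinary least-squares line in place of the study's AutoQSAR models. It computes R² on the training compounds and Q² on held-out compounds. That is one common convention; other QSAR work defines Q² from leave-one-out cross-validation instead.

```python
from statistics import fmean


def coefficient_of_determination(observed, predicted):
    """Return 1 - SS_res / SS_tot for paired observed and predicted values."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need at least two paired values")
    mean_obs = fmean(observed)
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    if ss_tot == 0:
        raise ValueError("observed values have zero variance")
    return 1.0 - ss_res / ss_tot


def fit_line(x, y):
    """Ordinary least-squares fit of y = a + b*x (a stand-in for a real QSAR model)."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    return a, b


# Hypothetical toy data: one descriptor value and one activity value per compound.
train_x = [1.0, 2.0, 3.0, 4.0, 5.0]
train_y = [1.1, 1.9, 3.2, 3.8, 5.0]
test_x = [1.5, 3.5, 4.5]   # held-out compounds, never used for fitting
test_y = [1.6, 3.3, 4.6]

a, b = fit_line(train_x, train_y)

# R²: goodness of fit on the compounds the model was trained on.
r2 = coefficient_of_determination(train_y, [a + b * x for x in train_x])
# Q² (held-out convention): the same statistic, scored on unseen compounds.
q2 = coefficient_of_determination(test_y, [a + b * x for x in test_x])

print(f"R2 = {r2:.3f}, Q2 = {q2:.3f}")
```

On real data, Q² is usually lower than R², as in the 0.58 versus 0.51 reported above, because the held-out compounds did not influence the fit.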