DCPM-ADMET: Fusion of Dual-channel Pre-trained Model and Molecular Fingerprints to enhance Drug ADMET Properties Prediction

Abstract

The absorption, distribution, metabolism, excretion, and toxicity (ADMET) properties of drugs are critical to their efficacy and safety in clinical trials. However, traditional machine learning methods generalize poorly in ADMET prediction because training data are insufficient. To address this issue, we developed DCPM-ADMET, a pre-trained model whose dual-channel architecture (an XLNet-based module that captures the semantics of molecular sequences and an RNN-based component that extracts small-molecule properties) is fused with ECFP fingerprints that capture molecular substructures. After pre-training, the model outperforms traditional methods in prediction accuracy on multiple molecular property benchmark datasets. We then fine-tuned it on a self-constructed database of 465,470 entries covering 97 ADMET properties. By integrating these 97 prediction models with 36 computed properties, we developed a free online ADMET prediction tool with 133 endpoints (available at http://admet.bioai-global.com/), designed to help researchers carry out comprehensive ADMET predictions for their molecules.

Scientific contribution

The development of DCPM-ADMET represents a notable advance in computational pharmacology. This pre-trained model addresses a fundamental limitation of traditional machine learning approaches to predicting ADMET (absorption, distribution, metabolism, excretion, and toxicity) properties: poor generalization stemming from insufficient data. Our architecture employs a dual-channel system, consisting of an XLNet-based module that captures molecular sequence semantics and an RNN-based component that efficiently extracts small-molecule properties, combined with ECFP fingerprints that encode structural features.
After extensive pre-training, DCPM-ADMET achieves higher predictive accuracy than traditional methods across multiple molecular property benchmark datasets. We further fine-tuned the model on a proprietary, large-scale database of 465,470 entries covering 97 ADMET endpoints. By integrating the resulting 97 prediction models with 36 calculated physicochemical properties, we deployed a free, high-throughput online ADMET prediction tool with 133 endpoints, intended to guide early-stage drug discovery and safety assessment.
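The architecture described above combines two learned channels with ECFP fingerprints, but the abstract does not state the fingerprint settings or how the three feature sources are fused. The following is a minimal sketch under explicit assumptions, not the authors' implementation. It computes a simplified ECFP/Morgan-style circular fingerprint (radius 2, i.e. diameter 4 as in ECFP4, folded to a bit vector), then concatenates it with two stand-in channel embeddings and applies a logistic head for one binary endpoint. The names `circular_fingerprint`, `fuse`, and `predict_probability`, the toy embedding values, concatenation as the fusion step, and the logistic head are all hypothetical. The fingerprint is also simplified: real ECFP uses richer atom invariants (charge, attached hydrogens, ring membership), includes bond orders, and removes duplicate environments.

```python
import math
import zlib
from typing import Dict, List, Sequence, Tuple


def _stable_hash(value: object) -> int:
    # CRC32 of repr() is reproducible across runs, unlike the built-in hash(),
    # which is salted for strings.
    return zlib.crc32(repr(value).encode("utf-8"))


def circular_fingerprint(
    atoms: Sequence[str],
    bonds: Sequence[Tuple[int, int]],
    radius: int = 2,
    n_bits: int = 2048,
) -> List[int]:
    """Hypothetical, simplified ECFP/Morgan-style fingerprint as a bit vector.

    atoms: element symbols of heavy atoms, indexed by position.
    bonds: undirected (i, j) pairs of atom indices; bond orders are ignored.
    """
    neighbors: Dict[int, List[int]] = {i: [] for i in range(len(atoms))}
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)

    # Iteration 0: identifier from element symbol and heavy-atom degree only.
    ids = [_stable_hash((atoms[i], len(neighbors[i]))) for i in range(len(atoms))]
    features = set(ids)

    # Iterations 1..radius: combine each atom's identifier with the sorted
    # identifiers of its neighbors, so it describes a wider circular environment.
    for _ in range(radius):
        ids = [
            _stable_hash((ids[i], tuple(sorted(ids[n] for n in neighbors[i]))))
            for i in range(len(atoms))
        ]
        features.update(ids)

    # Fold the set of substructure identifiers into a fixed-length bit vector.
    bits = [0] * n_bits
    for feature in features:
        bits[feature % n_bits] = 1
    return bits


def fuse(
    xlnet_embedding: Sequence[float],
    rnn_embedding: Sequence[float],
    fingerprint: Sequence[int],
) -> List[float]:
    # Assumed fusion: plain concatenation of both channel outputs and the fingerprint.
    return list(xlnet_embedding) + list(rnn_embedding) + [float(b) for b in fingerprint]


def predict_probability(
    features: Sequence[float], weights: Sequence[float], bias: float = 0.0
) -> float:
    # Assumed logistic head for a single binary ADMET endpoint.
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


# Example: ethanol (C-C-O) with toy values standing in for the channel embeddings.
fp = circular_fingerprint(["C", "C", "O"], [(0, 1), (1, 2)], radius=2, n_bits=64)
fused = fuse([0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7], fp)
print(len(fused), predict_probability(fused, [0.01] * len(fused)))
```

Concatenation is the simplest way to let a downstream head see learned sequence features and explicit substructure bits side by side. Gated or attention-based fusion are common alternatives, and the method used in DCPM-ADMET may differ from this sketch.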
