Cross-chemical and cross-species toxicity prediction: benchmarking and a novel 3D-structure-based deep learning model


Abstract

Prediction of a compound’s toxicity is a key step toward realizing animal-free testing of chemical compounds. Recent advances have yielded significant progress in computational toxicity prediction, including machine learning methods that utilize chemical fingerprints and deep-learning-based latent representations. However, challenges remain, primarily due to the lack of clean training datasets and inconsistent model performance. To address these challenges, we curated a comprehensive dataset of aquatic toxicity from seven data sources, which contains 50,603 records for 5,889 compounds across 2,285 different species, much larger than similar datasets used in previous studies. We also developed tox-learn, a Python library featuring tools for automated dataset cleaning, machine learning methods, and performance evaluation. The library places special emphasis on avoiding overestimation of prediction accuracy caused by improper train-test data splitting. Based on this toolbox, we benchmarked various predictive models using different train-test splitting strategies on the curated dataset. Our results showed that the choice of machine learning method, molecular fingerprint, and train-test splitting strategy all significantly affect performance. We demonstrated that incorporating species information generally improved predictions, although the degree of improvement depended on how this information was represented. In addition, we developed a new 3D-structure-based deep-learning model, 3DMol-Tox, which achieves regression accuracy comparable to the best 2D-structure-based model (GPBoost) while exhibiting consistently higher within-one-bin (W1B) classification accuracy. Finally, we analyzed the impact of different train-test splitting strategies and provide recommendations based on our benchmarking, such as using structure-aware splitting to mitigate information leakage, a common issue that inflates reported model performance.
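The structure-aware splitting recommended above can be illustrated in its simplest form as group-aware splitting: all records for a given compound are kept on the same side of the train-test boundary, so a model is never evaluated on a compound it has already seen during training. The sketch below uses only the Python standard library; the record fields and function name are illustrative assumptions, not the tox-learn API.

```python
import random

def compound_aware_split(records, test_frac=0.2, seed=0):
    """Split toxicity records so no compound appears in both train
    and test sets (a minimal, group-aware form of structure-aware
    splitting that prevents per-compound information leakage)."""
    compounds = sorted({r["compound"] for r in records})
    rng = random.Random(seed)
    rng.shuffle(compounds)
    n_test = max(1, int(len(compounds) * test_frac))
    test_compounds = set(compounds[:n_test])
    train = [r for r in records if r["compound"] not in test_compounds]
    test = [r for r in records if r["compound"] in test_compounds]
    return train, test

# Toy records: each compound is measured in two species, so a naive
# random split over records would likely leak compounds across sets.
records = [
    {"compound": c, "species": s, "toxicity": 1.0}
    for c in ["benzene", "phenol", "toluene", "aniline", "xylene"]
    for s in ["D. magna", "O. mykiss"]
]
train, test = compound_aware_split(records)
# No compound appears on both sides of the split.
assert {r["compound"] for r in train}.isdisjoint(
    {r["compound"] for r in test}
)
```

Stricter variants group by molecular scaffold rather than by exact compound identity, which further reduces leakage between structurally similar molecules.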
