Comparing Massively-Multitask Regression Algorithms for Drug Discovery
Abstract
Massively-multitask regression models (MMRMs) have revolutionized activity prediction for drug discovery. MMRMs trained on millions of compounds and many thousands of assays can predict bioactivity with accuracy comparable to 4-concentration IC50 experiments. This report compares six MMRMs: pQSAR, Alchemite, MT-DNN, MetaNN, Macau and IMC. Models were trained by experts in each method on identical sets of 159 kinase and 4276 diverse ChEMBL assays, employing the same, realistically novel, training/test-set splits.

MMRMs performed much better than single-task random forest regression (ST-RFR) models for our use case of imputing full bioactivity profiles for the very sparse compound collection on which the models were trained. Five of the MMRMs train all assay models simultaneously, so they must leave out test-set measurements for every assay to avoid leakage (i.e. 25% of the data). One method trains models one at a time, and trains on all data except the test set for that single assay (< 1% of the data). All algorithms were compared using 75/25 splits and, where possible, 99+/<1% splits. Most methods achieved similar accuracy when tested on the same split. However, all MMRMs performed much worse when evaluated on 75/25 splits than on 99+/<1% splits. Thus, while many methods produce comparably accurate final production models (trained on all the data), methods that require 75/25 splits cannot evaluate the accuracy of those final models.

While outstanding for imputation, MMRMs proved little better than ST-RFR for compounds very unlike the training collection. MMRMs are therefore best suited to hit-finding, off-target, promiscuity, mechanism-of-action (MoA), polypharmacology and drug-repurposing applications within the training collection. Besides accuracy, other pros and cons of each method are discussed.
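The split-size argument above can be illustrated with a small numeric sketch. All numbers here (matrix size, sparsity, seed) are hypothetical, not taken from the study: a simultaneous multitask fit must withhold the test compounds from every assay column at once, whereas a one-assay-at-a-time fit withholds only that single assay's test measurements, leaving far more data available for training.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sparse activity matrix: 1000 hypothetical compounds x 50 hypothetical
# assays, with ~10% of (compound, assay) pairs actually measured.
n_cmpd, n_assay = 1000, 50
measured = rng.random((n_cmpd, n_assay)) < 0.10

# Strategy 1 (simultaneous multitask training): one global 75/25 compound
# split; the 25% test compounds are withheld from EVERY assay to avoid leakage.
test_cmpd = rng.random(n_cmpd) < 0.25
withheld_global = measured[test_cmpd].sum() / measured.sum()

# Strategy 2 (one-assay-at-a-time training): when modelling assay j, hold out
# only assay j's own test measurements, so only a tiny fraction of the whole
# matrix is ever excluded from any single model's training data.
j = 0
test_rows_j = rng.random(n_cmpd) < 0.25
withheld_per_assay = measured[test_rows_j, j].sum() / measured.sum()

print(f"global 75/25 split withholds ~{withheld_global:.0%} of all measurements")
print(f"per-assay split withholds ~{withheld_per_assay:.2%} of all measurements")
```

With these toy settings the global split removes roughly a quarter of all measurements from training, while the per-assay split removes well under 1%, which is why the latter can be evaluated on splits that closely approximate the final production model.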