Comparing Massively-Multitask Regression Algorithms for Drug Discovery


Abstract

Massively-multitask regression models (MMRMs) have revolutionized activity prediction for drug discovery. MMRMs trained on millions of compounds and many thousands of assays can predict bioactivity with accuracy comparable to 4-concentration IC50 experiments. This report compares six MMRMs: pQSAR, Alchemite, MT-DNN, MetaNN, Macau and IMC. Models were trained by experts in each method on identical sets of 159 kinase and 4276 diverse ChEMBL assays, employing the same, realistically novel, training/test-set splits.

MMRMs performed much better than single-task random forest regression (ST-RFR) models for our use case of imputing full bioactivity profiles for the very sparse compound collection on which the models were trained. Five of the MMRMs train all models simultaneously and therefore must leave out test-set measurements for all assays to avoid leakage (i.e., 25% of the data). One method trains models one at a time and can train on all but the test data for that single assay (<1% of the data). All algorithms were compared on 75/25 splits and, where possible, on 99+/<1 splits. Many methods achieved similar accuracy when tested on the same split, but all MMRMs performed much worse when evaluated on 75/25 splits than on 99+/<1 splits. Thus, while many of the methods produce comparably accurate final production models (trained on all the data), methods that require 75/25 splits cannot evaluate the accuracy of those final models.

While outstanding for imputation, MMRMs proved little better than ST-RFR for compounds very unlike the training collection. MMRMs are therefore best suited to hit-finding, off-target, promiscuity, mechanism-of-action (MoA), polypharmacology or drug-repurposing applications within the training collection. Besides accuracy, other pros and cons of each method are discussed.
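The split distinction at the heart of the abstract can be illustrated with a short sketch. On a sparse compound-by-assay activity matrix, a global 75/25 holdout must mask a fraction of all measured entries at once (required when every assay model is trained jointly, since any held-out compound's measurement in any assay would otherwise leak), whereas a per-assay 99+/<1 scheme masks only the test compounds of a single assay and keeps every other measurement available for training. The matrix dimensions, sparsity, function names and fractions below are illustrative assumptions, not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sparse activity matrix: rows = compounds, columns = assays.
# NaN marks unmeasured compound/assay pairs (most of the matrix in practice).
n_compounds, n_assays = 1000, 50
activity = np.full((n_compounds, n_assays), np.nan)
measured = rng.random((n_compounds, n_assays)) < 0.05  # ~5% of pairs measured
activity[measured] = rng.normal(6.0, 1.0, measured.sum())  # e.g. pIC50 values

def global_split(activity, test_frac=0.25):
    """75/25 scheme: mask test_frac of ALL measured entries at once.

    Needed when all assay models are trained simultaneously, so that no
    test compound's measurement in any assay leaks into training.
    """
    rows, cols = np.where(~np.isnan(activity))
    in_test = rng.random(rows.size) < test_frac
    train = activity.copy()
    train[rows[in_test], cols[in_test]] = np.nan
    return train, (rows[in_test], cols[in_test])

def per_assay_split(activity, assay, test_frac=0.25):
    """99+/<1 scheme: mask test compounds of ONE assay only.

    Models are trained one at a time, so measurements in all other
    assays stay available; well under 1% of the data is withheld overall.
    """
    rows = np.where(~np.isnan(activity[:, assay]))[0]
    test_rows = rows[rng.random(rows.size) < test_frac]
    train = activity.copy()
    train[test_rows, assay] = np.nan
    return train, (test_rows, np.full(test_rows.size, assay))

total = measured.sum()
_, held_global = global_split(activity)
_, held_assay = per_assay_split(activity, assay=0)
print(f"global split withholds   {held_global[0].size / total:.1%} of measurements")
print(f"per-assay split withholds {held_assay[0].size / total:.2%} of measurements")
```

Run on this toy matrix, the global scheme withholds roughly 25% of all measurements while the per-assay scheme withholds a fraction of a percent, which is why only the latter can validate a model trained on essentially all of the data.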
