Transfer learning applied in predicting small molecule bioactivity
Abstract
Despite more than half a century of effort by computational chemists, accurate empirical QSAR (Quantitative Structure-Activity Relationship) models that predict bioactivity directly from chemical structure have remained elusive. The difficulties have been especially pronounced for virtual screening, that is, finding new active compounds that differ substantially from the known chemical matter used to train the models. Recent breakthroughs have come from transfer learning across huge numbers of bioactivity assays, which greatly increases the amount and diversity of chemical and biological information that informs each model. An early example was Profile-QSAR (pQSAR), a 2-level stacked model in which level-2 PLS (Partial Least Squares) models characterize compounds by their profile of bioactivity predictions from individual level-1 random forest regression QSAR models built on up to 10,000+ other assays. This study introduces metaNN, a meta-learner that trains a deep neural network (DNN) for each individual assay, initialized from a well-generalized consensus DNN optimized across all assays. Comparison of the results suggested that, while pQSAR and metaNN perform similarly overall, metaNN works slightly better for smaller assays that were well predicted by the consensus DNN, whereas pQSAR struggled more with smaller assays because of the large number of level-1 models, but was less sensitive to similarity to an overall consensus. An ensemble average of the two methods combined the strengths of each and worked better than either alone. The similar performance of the 2 largely orthogonal algorithms raises the question of whether we are approaching a limit of prediction accuracy for transfer learning in this application scenario.
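The data flow of the two approaches and their ensemble can be illustrated with a toy sketch. This is not the authors' implementation: to stay dependency-free it substitutes small linear models fit by gradient descent for the random forests, PLS models, and DNNs, it approximates the consensus model by pooling all auxiliary assays, and it uses synthetic data. Every name in it (`fit_linear`, `make_assay`, `profile`, the descriptor length, the assay sizes) is hypothetical.

```python
import random

random.seed(0)
N_FEATURES = 3  # hypothetical descriptor length, chosen only for illustration


def predict(w, x):
    """Linear prediction: dot product of weights and features."""
    return sum(wi * xi for wi, xi in zip(w, x))


def mse(w, X, y):
    """Mean squared error of a linear model on (X, y)."""
    return sum((predict(w, x) - t) ** 2 for x, t in zip(X, y)) / len(y)


def fit_linear(X, y, w_init, steps=500, lr=0.02):
    """Plain gradient descent on mean squared error, starting from w_init."""
    w = list(w_init)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x, t in zip(X, y):
            err = predict(w, x) - t
            for j, xj in enumerate(x):
                grad[j] += 2.0 * err * xj / len(y)
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


def make_assay(n_compounds):
    """Synthetic assay: a shared trend plus a small assay-specific shift."""
    true_w = [b + random.uniform(-0.1, 0.1) for b in (0.5, -0.3, 0.2)]
    X = [[random.uniform(-1, 1) for _ in range(N_FEATURES)]
         for _ in range(n_compounds)]
    y = [predict(true_w, x) for x in X]
    return X, y


aux_assays = [make_assay(30) for _ in range(4)]  # "other" assays
X_tgt, y_tgt = make_assay(8)                     # small target assay
zeros = [0.0] * N_FEATURES

# Stacked, pQSAR-like flow: one level-1 model per auxiliary assay
# (linear stand-ins for the random forest models).
level1 = [fit_linear(X, y, zeros) for X, y in aux_assays]


def profile(x):
    """Describe a compound by its predicted activity in every auxiliary assay."""
    return [predict(w, x) for w in level1]


# The level-2 model (linear stand-in for PLS) is trained on profiles.
P_tgt = [profile(x) for x in X_tgt]
level2 = fit_linear(P_tgt, y_tgt, [0.0] * len(level1))

# metaNN-like flow: a consensus model (here simply pooled over all auxiliary
# assays, which is an assumption) initializes a short per-assay fine-tuning run.
X_all = [x for X, _ in aux_assays for x in X]
y_all = [t for _, y in aux_assays for t in y]
consensus = fit_linear(X_all, y_all, zeros)
fine_tuned = fit_linear(X_tgt, y_tgt, consensus, steps=20)

# Ensemble: unweighted average of the two predictions for a new compound.
x_new = [0.2, -0.4, 0.7]
pred_stacked = predict(level2, profile(x_new))
pred_finetuned = predict(fine_tuned, x_new)
pred_ensemble = 0.5 * (pred_stacked + pred_finetuned)
print(pred_stacked, pred_finetuned, pred_ensemble)
```

The structural difference is in what the target-assay model sees: the stacked model receives only the profile of level-1 predictions, whereas the fine-tuned model receives the raw descriptors but starts from weights shared across assays.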