Machine Learning-Based Prediction of SARS-CoV-2 Bioactivity: Integrating IC50 Regression and Activity Classification Using Multi-Task Neural Networks
Abstract
Accurate prediction of compound bioactivity is essential for accelerating antiviral drug discovery and reducing experimental costs. Machine learning (ML) methods have shown considerable promise in modeling structure–activity relationships and compound potency. In this study, we present an integrated ML framework for predicting the IC50 and pIC50 values, key indicators of antiviral potency, of compounds active against SARS-CoV-2. The proposed framework comprises three complementary approaches: (i) a regression model for quantitative IC50 prediction, validated against experimental data; (ii) a classification model that categorizes compounds as active or inactive to support compound prioritization; and (iii) a multi-task neural network that jointly performs IC50 regression and activity classification, enhancing predictive performance and interpretability. A distinctive feature of this work is the use of ligand efficiency (LE) as a criterion for activity classification, an alternative perspective on compound prioritization that has not previously been explored in SARS-CoV-2 bioactivity modeling. The proposed models demonstrate strong predictive capability: a neural network with feature selection achieves a coefficient of determination (R²) of 0.77, while the Random Forest classifier attains accuracy, precision, and recall of approximately 0.92 each. These results highlight the potential of integrated regression, classification, and multi-task learning approaches as scalable and cost-effective tools for SARS-CoV-2 bioactivity prediction and antiviral drug discovery.
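The quantities named in the abstract can be made concrete with a short sketch. The abstract does not give the exact formulas or cutoffs used, so the following relies on common conventions that are assumptions here: pIC50 = -log10(IC50 in mol/L), LE approximated as 1.37 × pIC50 divided by the heavy-atom count (in kcal/mol per heavy atom, near 298 K), and an illustrative activity threshold of LE ≥ 0.3. The function names and the threshold value are hypothetical and are not taken from the study.

```python
import math


def pic50_from_ic50_nm(ic50_nm: float) -> float:
    """Convert an IC50 given in nanomolar to pIC50 = -log10(IC50 in mol/L)."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    # 1 nM = 1e-9 mol/L, so -log10(ic50_nm * 1e-9) = 9 - log10(ic50_nm).
    return 9.0 - math.log10(ic50_nm)


def ligand_efficiency(pic50: float, heavy_atoms: int) -> float:
    """Approximate LE in kcal/mol per heavy atom.

    The factor 1.37 approximates 2.303 * R * T near 298 K (an assumed
    convention, not a value confirmed by the abstract).
    """
    if heavy_atoms <= 0:
        raise ValueError("heavy-atom count must be positive")
    return 1.37 * pic50 / heavy_atoms


def is_active_by_le(pic50: float, heavy_atoms: int, le_threshold: float = 0.3) -> bool:
    """Label a compound active when its LE meets a threshold.

    The default of 0.3 is an illustrative assumption, not the study's cutoff.
    """
    return ligand_efficiency(pic50, heavy_atoms) >= le_threshold


if __name__ == "__main__":
    # Hypothetical compound: IC50 = 1000 nM (1 uM) with 25 heavy atoms.
    p = pic50_from_ic50_nm(1000.0)
    print(p, ligand_efficiency(p, 25), is_active_by_le(p, 25))
```

Under these assumptions, two compounds with the same pIC50 can receive different labels: the smaller molecule has the higher LE, which is what makes LE a different prioritization criterion from a fixed potency cutoff.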