Rethinking Evaluation in Compound Potency Prediction
Abstract
Regression tasks are essential in many fields, including chemistry, where property prediction models are used to prioritize chemical compounds for experimental testing. In this context, the goal is commonly to maximize a property such as potency, which measures the extent to which a substance achieves its intended biological effect. Conventional evaluation metrics and loss functions used to train prediction models for these tasks prioritize average performance, implicitly assuming that all values in the target domain are equally relevant. This assumption does not match real-world expectations. In this paper, we argue that there is an urgent need to reassess current evaluation and optimization practices in drug discovery, with implications for tasks beyond the chemistry domain where non-uniform domain preferences are observed. We use data on ten potency classes to compare models optimized and selected using traditional loss functions with models built using distribution-specific methods: a feature space design method and a recent function that accounts for non-uniform domain preferences. Our empirical results show that models using the latter methods identify more unique and better-performing compounds than models optimized with traditional tools. While still identifying the relevant compounds detected by other models, models that account for non-uniform domain preferences achieve better predictive performance in the most relevant cases and distinguish more effectively between less and more relevant instances. Our results underscore the importance of reevaluating the optimization and evaluation methods used in critical domains such as chemistry, as well as their impact on other natural and physical domains.
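To make the contrast between average-case and preference-aware evaluation concrete, the following is a minimal sketch. It is not the function evaluated in the paper: the sigmoid `relevance` function, its `midpoint` and `steepness` parameters, and the toy potency values are all assumptions introduced purely for illustration. It shows two hypothetical sets of predictions that a conventional mean squared error cannot tell apart, although only one of them gets the highly potent compound right.

```python
import numpy as np


def mse(y_true, y_pred):
    """Conventional mean squared error: every compound counts equally."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))


def relevance(y, midpoint=8.0, steepness=2.0):
    """Hypothetical relevance in (0, 1) that rises with potency.

    The sigmoid shape and its parameters are illustrative assumptions,
    not values taken from the paper.
    """
    y = np.asarray(y, dtype=float)
    return 1.0 / (1.0 + np.exp(-steepness * (y - midpoint)))


def relevance_weighted_mse(y_true, y_pred, **relevance_kwargs):
    """Squared error averaged with weights given by the relevance of the true value."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    weights = relevance(y_true, **relevance_kwargs)
    return float(np.sum(weights * (y_true - y_pred) ** 2) / np.sum(weights))


# Toy potency values (made up): four weakly potent compounds, one highly potent.
y_true = np.array([5.0, 5.5, 6.0, 6.5, 9.5])

# Predictions A: exact on the weak compounds, off by 2 on the potent one.
pred_a = np.array([5.0, 5.5, 6.0, 6.5, 7.5])

# Predictions B: off by 1 on every weak compound, exact on the potent one.
pred_b = np.array([6.0, 6.5, 7.0, 7.5, 9.5])

mse_a, mse_b = mse(y_true, pred_a), mse(y_true, pred_b)
weighted_a = relevance_weighted_mse(y_true, pred_a)
weighted_b = relevance_weighted_mse(y_true, pred_b)

print(f"MSE           A={mse_a:.3f}  B={mse_b:.3f}")
print(f"Weighted MSE  A={weighted_a:.3f}  B={weighted_b:.3f}")
```

Under these assumptions, both sets of predictions have the same mean squared error, while the relevance-weighted variant penalizes A much more heavily because its only mistake falls on the compound that matters most for prioritization.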