Conformal Prediction of Molecule-induced Cancer Cell Growth Inhibition Challenged by Strong Distribution Shifts
Abstract
The drug discovery process often employs phenotypic and target-based virtual screening to identify potential drug candidates. Despite the longstanding dominance of target-based approaches, phenotypic virtual screening is undergoing a resurgence as its potential becomes better understood. In cancer cell lines, a well-established experimental system for phenotypic screens, molecules are tested for their whole-cell activity, summarized by their half-maximal inhibitory concentrations. Machine learning has emerged as a potent tool for computationally guiding such screens, yet important research gaps persist, including generalization and uncertainty quantification. To address these gaps, we leverage a clustering-based validation approach called Leave Dissimilar Molecules Out (LDMO), which enables a more rigorous assessment of model generalization to structurally novel compounds. This study applies Conformal Prediction (CP), a model-agnostic framework, to predict the activities of novel molecules on specific cancer cell lines. A total of 4320 independent models were evaluated across 60 cell lines, 5 CP variants, 2 feature sets, and multiple training-test splits, providing strong and consistent results. From this comprehensive evaluation, we conclude that, regardless of the cell line or model, novel molecules with smaller CP-calculated confidence intervals tend to have smaller prediction errors once their measured activities are revealed. It was also possible to anticipate the activities of dissimilar test molecules across 50 or more cell lines. These outcomes demonstrate the robust efficacy that LDMO-based models can achieve in realistic and challenging scenarios, thereby providing valuable insights for enhancing decision-making in drug discovery.
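To make the role of CP concrete, the following is a minimal sketch of generic split (inductive) conformal regression. It is not the authors' pipeline: the abstract does not specify the five CP variants, the underlying models, or the features, so the synthetic data, the least-squares model, and the function names `conformal_quantile` and `split_conformal_interval` are all assumptions made for illustration.

```python
import math
import numpy as np

def conformal_quantile(scores, alpha):
    """Finite-sample conformal quantile of nonconformity scores (hypothetical helper)."""
    scores = np.sort(np.asarray(scores, dtype=float))
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))  # rank of the calibration score to use
    if k > n:
        return float("inf")  # too few calibration points for this alpha
    return float(scores[k - 1])

def split_conformal_interval(y_cal, pred_cal, pred_test, alpha=0.1):
    """Symmetric prediction intervals built from absolute calibration residuals."""
    q = conformal_quantile(np.abs(y_cal - pred_cal), alpha)
    return pred_test - q, pred_test + q

# Synthetic stand-in for descriptors (X) and activities (y); assumption, not real data.
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 5))
y = X @ np.array([1.0, -0.5, 0.3, 0.0, 2.0]) + rng.normal(scale=0.5, size=600)

# Disjoint training, calibration, and test subsets.
X_tr, y_tr = X[:200], y[:200]
X_cal, y_cal = X[200:400], y[200:400]
X_te, y_te = X[400:], y[400:]

# Any regressor can be used here (CP is model-agnostic); least squares keeps it short.
w, *_ = np.linalg.lstsq(X_tr, y_tr, rcond=None)

lower, upper = split_conformal_interval(y_cal, X_cal @ w, X_te @ w, alpha=0.1)
coverage = float(np.mean((y_te >= lower) & (y_te <= upper)))
print(f"empirical coverage at alpha=0.1: {coverage:.2f}")
```

In this plain variant every test molecule receives an interval of the same width, so it cannot by itself rank molecules by reliability. The abstract's observation that narrower intervals go with smaller errors presupposes a variant whose interval width varies per molecule, for example one that scales residuals by a per-sample difficulty estimate. Note also that the coverage guarantee assumes calibration and test data are exchangeable, which is precisely the assumption a dissimilarity-based split such as LDMO puts under stress.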