Scalable and universal prediction of cellular phenotypes enables in silico experiments
Abstract
Biological systems can be interrogated by perturbing individual components and observing the consequences across molecular, cellular, and phenotypic levels. The vast combinatorial space of possible perturbations and responses makes exhaustive experimentation infeasible. Recent advances in machine learning have shown that training on diverse datasets enables transfer learning across tasks, capturing patterns that generalize and improving performance on previously unseen problems. Inspired by this principle, we present Prophet, a transformer-based model pretrained on a large, heterogeneous collection of perturbation experiments. This pretraining allows Prophet to predict the outcomes of untested genetic or chemical perturbations in novel cellular contexts, spanning phenotypes such as gene expression, viability, and morphology. By leveraging shared structure across apparently disconnected assays, Prophet provides a scalable framework for large-scale virtual screening and prioritization of informative experiments. Prophet consistently outperforms baseline models, including those trained on single phenotypes, showing that transfer learning between phenotypes is not only possible but also improves predictive accuracy. Its capabilities extend to in vivo developmental systems, where it recapitulates known lineage biology and proposes new candidates. In a large-scale in silico screen for melanoma, Prophet identified compounds, subsequently validated experimentally, with selective activity that mirrored clinically approved therapies, demonstrating its ability to transform perturbation biology into a predictive and scalable engine for therapeutic discovery.