Predicting cellular responses to perturbation across diverse contexts with State
Abstract
Cellular responses to perturbations are a cornerstone for understanding biological mechanisms and selecting drug targets. While machine learning models offer tremendous potential for predicting perturbation effects, they currently struggle to generalize to unobserved cellular contexts. Here, we introduce State, a transformer model that predicts perturbation effects while accounting for cellular heterogeneity within and across experiments. State predicts perturbation effects across sets of cells and is trained using gene expression data from over 100 million perturbed cells. State improved discrimination of effects on large datasets by more than 30% and identified differentially expressed genes across genetic, signaling and chemical perturbations with significantly improved accuracy. Using its cell embedding trained on observational data from 167 million cells, State identified strong perturbations in novel cellular contexts where no perturbations were observed during training. We further introduce Cell-Eval, a comprehensive evaluation framework that highlights State’s ability to detect cell type-specific perturbation responses, such as cell survival. Overall, the performance and flexibility of State set the stage for scaling the development of virtual cell models.