A Pooled Cell Painting CRISPR Screening Platform Enables de novo Inference of Gene Function by Self-supervised Deep Learning

Abstract

Pooled CRISPR screening has emerged as a powerful method of mapping gene function thanks to its scalability, affordability, and robustness against the well- or plate-specific confounders present in array-based screening 1–6. Most pooled CRISPR screens assay low-dimensional phenotypes (e.g. fitness, fluorescent markers). Higher-dimensional assays such as Perturb-seq are available but costly and only applicable to transcriptomic readouts 7–11. Recently, pooled optical screening, which combines pooled CRISPR screening with microscopy-based assays, has been demonstrated in studies of the NFkB pathway, essential human genes, cytoskeletal organization, and the antiviral response 12–15. While the pooled optical screening methodology is scalable and information-rich, the applications thus far have employed hypothesis-specific assays. Here, we enable hypothesis-free reverse genetic screening for generic morphological phenotypes by re-engineering the Cell Painting 16 technique to be compatible with pooled optical screening. We validated this technique using well-defined morphological gene sets (124 genes), compared classical image analysis and self-supervised learning methods using a mechanism-of-action (MoA) library (300 genes), and performed discovery screening with a druggable genome library (1640 genes) 17. Across these three experiments we show that the combination of rich morphological data and deep learning allows gene networks to emerge without the need for target-specific biomarkers, leading to better discovery of gene function.

Article activity feed

  1. We trained a ViT-small model with patch size = 8, number of global crops = 2, and number of local crops = 8 on 4 nodes x 8 NVIDIA V100 GPUs per node (32 GPUs) for 100 epochs

    would it be possible (and meaningful) to mention how many GPU-hours this required? Also, some more details would be helpful for non-ML experts; e.g., why the choice of 100 epochs, whether a stopping criterion was used, and which epoch was used for the final analysis/results.
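
    For concreteness, the GPU-hour accounting being asked about is simple arithmetic once the wall-clock time is known; a back-of-envelope sketch (the wall-clock duration is not reported, so it is left as a free variable here):

    ```python
    # Back-of-envelope accounting for the run described above:
    # 4 nodes x 8 NVIDIA V100s = 32 GPUs, trained for 100 epochs.
    N_GPUS = 4 * 8

    def gpu_hours(wall_clock_hours: float) -> float:
        """Total GPU-hours = number of GPUs * wall-clock duration."""
        return N_GPUS * wall_clock_hours

    # Illustrative only: if one epoch took ~0.5 h of wall-clock time,
    # the full run would be gpu_hours(100 * 0.5) = 1600 GPU-hours.
    ```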

  2. we re-parameterized the first layer of the model as:

    This equation is a bit opaque; it would be helpful to explain what the superscripts and subscripts of theta mean.
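
    For readers unfamiliar with this step, one common form such a re-parameterization takes (a sketch under assumptions, not necessarily the authors' exact scheme) is adapting a patch-embedding layer pretrained on 3-channel images to multi-channel microscopy input:

    ```python
    import torch
    import torch.nn as nn

    def adapt_patch_embed(conv_rgb: nn.Conv2d, in_channels: int) -> nn.Conv2d:
        """Re-parameterize a 3-channel ViT patch-embedding conv for
        `in_channels`-channel input by tiling the mean RGB filter and
        rescaling by 3/in_channels to roughly preserve activation scale."""
        conv_new = nn.Conv2d(in_channels, conv_rgb.out_channels,
                             kernel_size=conv_rgb.kernel_size,
                             stride=conv_rgb.stride,
                             bias=conv_rgb.bias is not None)
        with torch.no_grad():
            mean_w = conv_rgb.weight.mean(dim=1, keepdim=True)  # (out, 1, k, k)
            conv_new.weight.copy_(mean_w.repeat(1, in_channels, 1, 1)
                                  * (3.0 / in_channels))
            if conv_rgb.bias is not None:
                conv_new.bias.copy_(conv_rgb.bias)
        return conv_new
    ```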

  3. (both ~1–1.5 million cell tile images)

    Does the 1–1.5 million figure refer to single-cell images, or FOVs? It would also be super helpful to comment on how this dataset size was chosen. Was it the minimum amount of data required for this level of performance? More generally, did you do any experiments varying the quantity or diversity of the training data?

  4. The superior performance of CP-DINO 1640 is unlikely a result of trivial memorization, as the 1640-gene druggable genome library and the 300-gene MoA library share similar numbers of overlapping genes with the 124 PoC library (30 and 26 genes, respectively).

    I think to make this claim more convincing, it would be important to show how many genes in the 1640 library are very similar to (rather than merely identical to) genes in the 124 PoC library ("very similar" is obviously subjective but I'm thinking of homologs/paralogs or genes that are components of the same complex or pathway)
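
    One way to run this check would be to count library genes that are high-confidence StringDB neighbours of any PoC gene, not just exact matches. A sketch with hypothetical file names (the column layout and score cutoff are assumptions):

    ```python
    import pandas as pd

    # Hypothetical inputs: one-column gene lists and a StringDB edge
    # table with columns gene_a, gene_b, score (0-1000 scale).
    poc = set(pd.read_csv("poc_124_genes.csv")["gene"])
    lib = set(pd.read_csv("druggable_1640_genes.csv")["gene"])
    edges = pd.read_csv("string_links.tsv", sep="\t")
    strong = edges[edges["score"] >= 700]  # high-confidence interactions only

    exact = lib & poc
    near = {g for g in lib - poc
            if (((strong["gene_a"] == g) & strong["gene_b"].isin(poc)) |
                ((strong["gene_b"] == g) & strong["gene_a"].isin(poc))).any()}
    print(f"exact overlap: {len(exact)}; close StringDB neighbours: {len(near)}")
    ```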

  5. Anti-phospho-S6 (pS6) antibody with AlexaFluor 750-conjugated secondary antibody was used in the 6th channel as an established biomarker

    it would be helpful to mention here what cellular structures or features the pS6 antibody labels, and also (for the non-biologists among us) what mTORC1 is

  6. Nevertheless, CP-DINO 300 trained on bioimaging data yielded a more informative embedding that has higher median prediction accuracy than the other two models (Fig. S4a-b), and correctly classified more perturbations with better accuracy (Fig. 4c). CP-DINO 300 also recovered more known biological relationships from StringDB as measured by cosine similarity of the aggregate gene KO embeddings (Methods) than the other two models (Fig. 4d)

    It's awesome to see such an explicit and direct comparison of classic feature engineering with modern unsupervised ML models!

    If possible it would be great to quantify how much better the DINO-based approach is; Figures 4a-d are a bit hard to understand at first and obscure the relative differences; Fig 4d in particular doesn't give the impression that DINO is that much better than the CellStats approach (even though the 0.12 of DINO vs the 0.09 of CellStats is actually a ~33% improvement!). Also, some measure of statistical significance would be helpful; in particular, how likely is it that the 0.09 vs 0.12 in Fig 4d is reproducible? See the bootstrap sketch below for one way to check this.
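
    A simple reproducibility check would be to bootstrap over the gene set and put a confidence interval on the DINO-vs-CellStats difference. A sketch; `recall_fn` is a hypothetical stand-in for the paper's StringDB-recall computation:

    ```python
    import numpy as np

    def bootstrap_diff(genes, recall_fn, n_boot=1000, seed=0):
        """Bootstrap CI for the difference in a recall metric between two
        embedding methods; the same resampled gene set is scored by both
        methods in each replicate."""
        rng = np.random.default_rng(seed)
        genes = np.asarray(genes)
        diffs = []
        for _ in range(n_boot):
            sample = rng.choice(genes, size=len(genes), replace=True)
            diffs.append(recall_fn(sample, "CP-DINO 300")
                         - recall_fn(sample, "CellStats"))
        lo, hi = np.percentile(diffs, [2.5, 97.5])
        return float(np.mean(diffs)), (float(lo), float(hi))  # mean, 95% CI
    ```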

  7. phenotypic clustering of genes by their annotated mechanism of action,

    It feels like there's a typo here somewhere, since genes don't really have a "mechanism of action" and the screen here does not involve compounds but rather gene KOs. Is the idea to use the phenotype of the KOs to cluster genes by the MoA of the compounds that target them? In any case, the reference to MoAs here is doubly confusing because the clustering shown in Fig 4E appears to capture cellular localization (and also pathway membership?), but I couldn't see any discussion of the clustering relative to the MoAs of the compounds used to select the 300 genes

  8. a-b. Comparison of feature embedding methodologies based on median AUC of binary classification of KO from WT for each genetic perturbation.

    It looks like the majority of the genes have AUC ~= 0.5; what is the interpretation of that? Does that mean that most gene KOs tested do not exhibit a phenotype distinguishable from wild-type?
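
    For reference, the readout in question is presumably of this form (a minimal sketch assuming per-cell embeddings; the paper's exact classifier and cross-validation may differ). AUC ≈ 0.5 is chance level, i.e. the classifier cannot separate that KO's cells from wild-type under this readout:

    ```python
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    def ko_vs_wt_auc(ko_emb: np.ndarray, wt_emb: np.ndarray) -> float:
        """Cross-validated AUC for separating KO cell embeddings from WT
        ones; 0.5 = indistinguishable from chance, 1.0 = fully separable."""
        X = np.vstack([ko_emb, wt_emb])
        y = np.concatenate([np.ones(len(ko_emb)), np.zeros(len(wt_emb))])
        clf = LogisticRegression(max_iter=1000)
        return cross_val_score(clf, X, y, cv=5, scoring="roc_auc").mean()
    ```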

  9. The field of view images are then cropped around the centroids of each of the segmented nuclei and masked by the corresponding cell segmentation mask to create tiles with a single cell in context

    are pixels that are outside the mask painted with zeros on all channels?
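
    For concreteness, the masking step presumably looks something like the following sketch; whether out-of-mask pixels are zeroed is exactly the question above, so the zeroing shown here is an assumption, not the authors' confirmed behaviour:

    ```python
    import numpy as np

    def single_cell_tile(fov: np.ndarray, cell_mask: np.ndarray,
                         centroid: tuple, size: int = 128) -> np.ndarray:
        """Crop a (C, H, W) field of view around a nucleus centroid and
        zero every pixel outside that cell's segmentation mask on all
        channels. Border handling (padding) is omitted for brevity."""
        cy, cx = (int(round(c)) for c in centroid)
        half = size // 2
        y0, x0 = max(cy - half, 0), max(cx - half, 0)
        crop = fov[:, y0:y0 + size, x0:x0 + size]
        mask = cell_mask[None, y0:y0 + size, x0:x0 + size]
        return crop * mask  # zero outside the cell on all channels
    ```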