Image3C, a multimodal image-based and label-independent integrative method for single-cell analysis

Curation statements for this article:
  • Curated by eLife


This article has been reviewed by the following groups


Abstract

Image-based cell classification has become a common tool to identify phenotypic changes in cell populations. However, this methodology is limited to organisms possessing well-characterized species-specific reagents (e.g., antibodies) that allow cell identification, clustering, and convolutional neural network (CNN) training. In the absence of such reagents, the power of image-based classification has remained mostly off-limits to many research organisms. We have developed an image-based classification methodology we named Image3C (Image-Cytometry Cell Classification) that requires neither species-specific reagents nor pre-existing knowledge about the sample. Image3C combines image-based flow cytometry with an unbiased, high-throughput cell clustering pipeline and CNN integration. Image3C exploits intrinsic cellular features and non-species-specific dyes to perform de novo cell composition analysis and detect changes between different conditions. Therefore, Image3C expands the use of image-based analyses of cell population composition to research organisms in which detailed cellular phenotypes are unknown or for which species-specific reagents are not available.

Article activity feed

  1. Author Response

    Reviewer #1 (Public Review):

    Key issues:

    • The main claim of the versatility of Image3C comes from the idea that it can extract image features even without reagents such as antibodies. The authors seem to have omitted a large body of work in the field of label-free imaging. There are many optical or computational methods to obtain useful cellular features without any chemical labels.

    We agree with the reviewer that there are many label-free imaging tools already published. We listed several of them in the new table (Table 1), where we compare label-free phenotyping and cell clustering approaches. We took into consideration the samples on which each tool was tested, the need for prior knowledge of the sample and/or species-specific reagents at any point of the process, and the hardware and software required. As far as we know, our tool is the only one that requires neither a priori knowledge of the sample nor species-specific reagents at any point of the pipeline or for training the neural network. With this work, we provide the community with a tool that is able to perform de novo clustering and that does not require custom-made pieces of equipment (lines 137-142 and 618-627).

    • One core issue is that the entire pipeline is strongly dependent on the use of the ImageStream Mark II system. In particular, the cell image feature extraction step is performed by the IDEAS software that is associated with the imaging system. This limits the general applicability of the Image3C tool.

    We agree that the ImageStream is necessary for acquiring the images. This piece of equipment is already present in many institutes/departments, and we hope that in the coming years researchers will have increasing access to this type of equipment. We selected the ImageStream for our study because of the high reproducibility of image acquisition, which allows comparison between multiple experiments performed on different days. Image3C could also be applied to microscopy images, but, in this case, users will have to control for batch effects if images are acquired from different slides or on different days. This is now discussed in the text (lines 137-142).

    • Image3C actually contains many separate pieces of computer code. The R code is available as many separate R scripts, and dependent R packages had to be installed manually prior to running the scripts. The clustering step requires a different Java tool called Vortex, which was developed by another group. The clustering results are then analysed by an R script. For data exploration, we also need a different tool called FCS Express Plus. The setup of the CNN classifier requires another set of Python scripts. To ensure the Image3C pipeline is indeed as robust as it claims, the authors could consider converting their code into R and Python packages, with streamlined installation instructions and example code for users to test.

    We thank the reviewer for this comment. To make our tool more accessible, we have: 1) thoroughly revised the readme document on the GitHub page, making it clearer and more detailed; 2) added tutorial videos to the GitHub page to show how to use some of the key software mentioned in the manuscript; 3) saved the R code as a Markdown page to make it more user friendly; 4) renamed the example files that can be used to test our pipeline to make them more self-explanatory; 5) saved the Python code in the more user-friendly Jupyter notebooks; and 6) transformed Figure 1-supplement figure 1 into an interactive map where it is possible to click on the different steps of the Image3C pipeline and be automatically directed to the corresponding section of our GitHub repository page (lines 662-667). We also included in the Materials and Methods a more detailed explanation of the rationale behind our choice of software/algorithms (lines 618-627 and 650-654).

    • In the software pipeline, cell phenotype feature extraction and clustering are performed by other existing tools. The other main new contribution appears to be training of a CNN model to perform cell-type classification, in which the cell type labels were defined by clusters identified in an unsupervised analysis. From a technical point of view, there is little novelty as all the methods are quite standard.

    We agree that some of the individual components were already available. However, when applying existing tools to our datasets, we did not obtain satisfactory and reproducible results. We therefore developed original R code for clustering events based on the features extracted from IDEAS, and we created a pipeline that allows users to cluster cells in a given sample de novo, without the need for cell-type markers or a priori knowledge of the tissue composition. Our pipeline allows users to take full advantage of such tools to characterize new cell populations coming from less established research organisms.
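As an illustrative sketch of this de novo clustering step, the snippet below partitions a per-event feature table without any cell-type labels. This is not the actual Image3C code: the real pipeline clusters features exported from IDEAS with the X-shift algorithm in Vortex, for which scikit-learn's KMeans stands in here, and the synthetic data and feature count are purely hypothetical.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

def cluster_events(features, n_clusters=2, seed=0):
    """Z-score each feature, then partition events into clusters.

    `features` is an (n_events, n_features) array of per-cell
    measurements (e.g. area, circularity, texture, dye intensity).
    """
    scaled = StandardScaler().fit_transform(features)
    return KMeans(n_clusters=n_clusters, n_init=10,
                  random_state=seed).fit_predict(scaled)

# Toy example: two well-separated synthetic "cell populations".
rng = np.random.default_rng(0)
pop_a = rng.normal(0.0, 0.5, size=(100, 4))
pop_b = rng.normal(5.0, 0.5, size=(100, 4))
events = np.vstack([pop_a, pop_b])
labels = cluster_events(events, n_clusters=2)
```

Because no label ever enters the clustering, the same call applies to tissues from any organism, which is the point the response makes.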

    • The authors claim that the CNN classifier was trained in an 'unsupervised' way. I have a strong reservation about this claim. In machine learning, classification by definition is supervised. The fact that cells are labelled by the cluster label does not make it an unsupervised task with respect to the classification task. I think what the authors really mean is that the cell types (class labels) do not have to be known prior to analysing the sample.

    We thank the reviewer for pointing this out and we apologize for the mistake. The reviewer is correct, and we did not use the word “unsupervised” in the appropriate way. In the text, we rephrased all the sentences in which the CNN was mistakenly defined as “unsupervised”.
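The corrected terminology can be made concrete with a minimal two-step sketch: labels are first derived without supervision by clustering, and those cluster labels then serve as targets for a conventional supervised classifier. A small scikit-learn MLP stands in for the CNN, and the synthetic feature vectors are hypothetical; only the two-step logic mirrors the workflow discussed above.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
# Synthetic feature vectors for two morphologically distinct populations.
X = np.vstack([rng.normal(0, 0.4, (150, 6)), rng.normal(3, 0.4, (150, 6))])

# Step 1 (unsupervised): derive class labels de novo by clustering.
y = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Step 2 (supervised): train a classifier on the cluster-derived labels.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000,
                    random_state=0).fit(X_tr, y_tr)
accuracy = clf.score(X_te, y_te)
```

Step 2 is supervised in the strict machine-learning sense, even though no prior biological knowledge of the cell types was needed to produce the labels in step 1.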

  2. Evaluation Summary:

    This manuscript develops new software tools to analyze and classify single cells with high throughput based on single cell phenotyping using an existing imaging system. The authors show that tissues can be reproducibly decomposed into clusters of cells based on their feature space and that cell composition dynamics can be reliably detected. The main impact is to make single cell phenotyping more tractable, including for samples and organisms for which sequencing-based or fluorescent-labeling-based approaches are not readily available. Applicability was demonstrated in two research model organisms, zebrafish and freshwater snail.

    (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. The reviewers remained anonymous to the authors.)

  3. Reviewer #1 (Public Review):

    This paper presents a protocol and a set of open-source computer scripts for performing high-throughput imaging-based single-cell phenotyping using the ImageStream Mark II system. Their contributions include new open-source computer code to streamline data analysis, and a deep-learning-based system to classify cell types based on previously generated single-cell phenotype profiles. The authors demonstrated the applicability of their tools by performing single-cell phenotyping of zebrafish and the freshwater snail P. canaliculata. The core claim is that their tool, termed Image3C, expands the use of image-based analyses of cell population composition to research organisms in which detailed cellular phenotypes are unknown or for which specific reagents are not available (abstract).

    Key strength:
    - The source code is made available via github.

    - The demonstration on the zebrafish and freshwater snail data provides a convincing case for their applicability.

    Key issues:
    - The main claim of the versatility of Image3C comes from the idea that it can extract image features even without reagents such as antibodies. The authors seem to have omitted a large body of work in the field of label-free imaging. There are many optical or computational methods to obtain useful cellular features without any chemical labels.

    - One core issue is that the entire pipeline is strongly dependent on the use of the ImageStream Mark II system. In particular, the cell image feature extraction step is performed by the IDEAS software that is associated with the imaging system. This limits the general applicability of the Image3C tool.

    - Image3C actually contains many separate pieces of computer code. The R code is available as many separate R scripts, and dependent R packages had to be installed manually prior to running the scripts. The clustering step requires a different Java tool called Vortex, which was developed by another group. The clustering results are then analysed by an R script. For data exploration, we also need a different tool called FCS Express Plus. The setup of the CNN classifier requires another set of Python scripts. To ensure the Image3C pipeline is indeed as robust as it claims, the authors could consider converting their code into R and Python packages, with streamlined installation instructions and example code for users to test.

    - In the software pipeline, cell phenotype feature extraction and clustering are performed by other existing tools. The other main new contribution appears to be training of a CNN model to perform cell-type classification, in which the cell type labels were defined by clusters identified in an unsupervised analysis. From a technical point of view, there is little novelty as all the methods are quite standard.

    - The authors claim that the CNN classifier was trained in an 'unsupervised' way. I have a strong reservation about this claim. In machine learning, classification by definition is supervised. The fact that cells are labelled by the cluster label does not make it an unsupervised task with respect to the classification task. I think what the authors really mean is that the cell types (class labels) do not have to be known prior to analysing the sample.

  4. Reviewer #2 (Public Review):

    The authors present an image-based classification scheme for single cells imaged on a specific instrument (the ImageStream Mk2 imaging flow cytometer, a high-throughput imaging analyzer by Amnis). Using a layered set of computational tools, sample data are extracted, scaled, normalized and clustered. The discretized representation can then be further classified by convolutional neural networks, which can be the basis for efficient and quantitative inter-sample comparisons.
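The extract-scale-normalize sequence summarized above can be sketched as below. The arcsinh variance-stabilizing transform and per-feature z-scoring are generic flow-cytometry conventions shown for illustration only; the cofactor value and the exact transforms Image3C applies are assumptions here, not taken from the manuscript.

```python
import numpy as np

def preprocess(raw, cofactor=150.0):
    """Variance-stabilize intensities with arcsinh, then z-score each feature.

    `raw` is an (n_events, n_features) matrix of per-cell measurements.
    """
    x = np.arcsinh(raw / cofactor)   # compress the heavy right tail
    mu = x.mean(axis=0)
    sd = x.std(axis=0)
    sd[sd == 0] = 1.0                # guard against constant features
    return (x - mu) / sd

# Synthetic right-skewed intensities, as raw cytometry data typically are.
rng = np.random.default_rng(2)
raw = rng.lognormal(mean=6.0, sigma=1.0, size=(500, 5))
z = preprocess(raw)
```

After this step each feature has zero mean and unit variance, so downstream clustering is not dominated by the features with the largest raw dynamic range.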

    When applied to whole kidney marrow (WKM) extracted from zebrafish and hemolymph from a non-model organism (the apple snail), clusters emerge that can be associated with specific cell types based on prior information.

    Importantly, experimental intervention (infection with S. aureus) allows the observation of cluster emergence and shifting of cellular proportions. Professional phagocytes are identified and functional studies demonstrate validity.

    The authors convincingly demonstrate that Image3C is capable of reliably and reproducibly clustering cells based on single cell imaging data and that a tissue's cell type composition can be effectively assessed using alignment to previously (or newly) trained classification models.

    The authors show that this analysis scheme will aid in the quantitation of tissue compositions and is applicable to model as well as non-model organisms. The likely impact of this manuscript is that it will become more tractable for researchers to quantitatively assess the cellular composition of complex samples.