Comprehensive and unbiased multiparameter high-throughput screening by compaRe finds effective and subtle drug responses in AML models

Curation statements for this article:
  • Curated by eLife

    Evaluation Summary:

    This paper aims to address the current gap in the efficient analysis of large-scale multiparameter flow cytometry and other datasets. The authors offer a software toolkit with an efficient algorithm for comparing numerous samples at once. The study is well presented and is relevant to single-cell analysis research.

    (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #2 agreed to share their name with the authors.)

Abstract

Large-scale multiparameter screening has become increasingly feasible and straightforward to perform thanks to developments in technologies such as high-content microscopy and high-throughput flow cytometry. Automated toolkits for analyzing similarities and differences between large numbers of tested conditions have not kept pace with these technological developments. Thus, effective analysis of multiparameter screening datasets has become a bottleneck and a limiting factor in the unbiased interpretation of results. Here we introduce compaRe, a toolkit for large-scale multiparameter data analysis, which integrates quality control, data bias correction, and data visualization methods with a similarity analysis based on a mass-aware gridding algorithm, providing much faster and more robust analyses than existing methods. Using mass and flow cytometry data from acute myeloid leukemia (AML) and myelodysplastic syndrome (MDS) patients, we show that compaRe can reveal interpatient heterogeneity and recognizable phenotypic profiles. By applying compaRe to high-throughput flow cytometry drug response data in AML models, we robustly identified multiple types of both deep and subtle phenotypic response patterns, highlighting how this analysis could be used for therapeutic discoveries. In conclusion, compaRe is a toolkit that uniquely allows for automated, rapid, and precise comparisons of large-scale multiparameter datasets, including high-throughput screens.

Article activity feed

  2. Reviewer #1 (Public Review):

    Comprehensive and unbiased multiparameter high-throughput screening by compaRe finds effective and subtle drug responses in AML models by Hajkarim et al. introduces a pipeline for pre-processing and analyzing data from multiplex flow cytometry and other technologies. Preprocessing steps include algorithms for correcting common sources of bias in such data. Another key feature is a robust approach to measuring cell similarity across samples. Among the strengths are that the manuscript is well written and that the analysis pipeline is well motivated and illustrated with apt examples. The similarity measure is very interesting as well.

    There are a few weaknesses as well. It is not completely clear to me how this pipeline agrees and disagrees with common practice in the field. References 1-3, cited to document ongoing analytic challenges, are all at least 5 years old. Comparisons to other approaches, including the use of Jensen-Shannon divergence for similarity, make a convincing case that the proposed method is both effective and computationally efficient, but it is not clear whether the comparators represent the true standard of practice or mere straw men. The methodology is complex and can be difficult to follow, especially the similarity measure.
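    The Jensen-Shannon comparator mentioned above can be sketched as follows. This is a minimal illustration, not the benchmark pipeline used in the manuscript: it assumes each sample's cells have already been placed on a shared 2-D embedding (for example t-SNE or UMAP coordinates), bins them on a regular grid, and computes the base-2 Jensen-Shannon divergence between the resulting densities, which is 0 for identical distributions and 1 for non-overlapping ones. The function names, grid bounds, and bin count are hypothetical.

```python
import math
from collections import Counter


def grid_distribution(points, lo, hi, n_bins):
    # Bin 2-D embedding coordinates on a shared n_bins x n_bins grid over
    # [lo, hi] in both axes; out-of-range points are clamped to edge bins.
    width = (hi - lo) / n_bins
    counts = Counter()
    for x, y in points:
        i = min(max(int((x - lo) / width), 0), n_bins - 1)
        j = min(max(int((y - lo) / width), 0), n_bins - 1)
        counts[(i, j)] += 1
    total = sum(counts.values())
    return {cell: n / total for cell, n in counts.items()}


def jensen_shannon_divergence(p, q):
    # Base-2 JSD between two discrete distributions (dicts bin -> probability).
    bins = set(p) | set(q)
    m = {b: 0.5 * (p.get(b, 0.0) + q.get(b, 0.0)) for b in bins}

    def kl_to_m(d):
        # KL(d || m); bins with zero probability in d contribute nothing.
        return sum(v * math.log2(v / m[b]) for b, v in d.items() if v > 0)

    return 0.5 * kl_to_m(p) + 0.5 * kl_to_m(q)


# Usage: identical samples give 0; samples in disjoint grid cells give 1.
s1 = grid_distribution([(0.1, 0.1), (0.9, 0.9)], 0.0, 1.0, n_bins=2)
s2 = grid_distribution([(0.1, 0.9), (0.9, 0.1)], 0.0, 1.0, n_bins=2)
print(jensen_shannon_divergence(s1, s1), jensen_shannon_divergence(s1, s2))
```

    In this kind of comparison the result depends on the embedding and on the grid resolution chosen, so both would need to be fixed across all samples being compared.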

  3. Reviewer #2 (Public Review):

    In this manuscript, Hajkarim et al. developed compaRe, a user-friendly software suite (written in R) for analyzing high-throughput, multiparameter screening data. The compaRe toolkit includes several modules, which can be invoked individually to perform specific tasks such as quality control, bias correction, pairwise comparisons, clustering, and data visualization.

    Strengths:

    1. All of these modules are available in both a command-line version and a GUI version for data analysis, visualization, and interpretation of results.

    2. The authors showed the utility of their toolkit in analyzing multiparameter mass and flow cytometric data from AML and MDS patient samples. Using compaRe, they were able to identify interpatient heterogeneity and drug response profiles.

    3. Overall, this is a well-organized and well-written manuscript describing the development of the new compaRe toolkit. The method is clearly described, and the user manual/tutorial is easy to follow.

    4. compaRe seems likely to be a useful toolkit for the research community, which is eager for a one-stop pipeline for analyzing high-throughput multiparameter screening data.

    Weakness:

    1. The current manuscript lacks comparisons with other existing tools and methods for analyzing mass and flow cytometric data.

  4. Reviewer #3 (Public Review):

    Hajkarim et al. implement in their toolkit, compaRe, an algorithm that compares samples based on their pairwise similarities, which is distinct from more commonly used approaches such as meta-clustering (e.g., PhenoGraph) or dimensionality reduction followed by Jensen-Shannon divergence analysis. Similarities among samples are calculated from the proportions of each sample's cells that fall into n-dimensional "hypercubes", which are stratified by expression levels of the n markers (a "hypergridding" that is mass-aware rather than blind). The authors demonstrate that this method is much more time-efficient, obviates subsampling, and is robust to batch effects. The method is particularly appropriate for large-scale datasets, facilitating the comparison of numerous samples, which would be helpful in screening efforts. The manuscript is well written and well presented.
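    The hypercube-based comparison described above can be illustrated with a short sketch. It is not compaRe's implementation: the review does not specify the mass-aware gridding rule or the similarity formula, so this sketch assumes, hypothetically, that each marker is cut at quantiles of the pooled data (so every expression level holds roughly the same number of cells) and that two samples are compared by histogram intersection, i.e. the summed minimum of their per-hypercube cell proportions (1 for identical occupancy, 0 for disjoint). All function names are illustrative.

```python
from bisect import bisect_right
from collections import Counter


def quantile_edges(values, n_levels):
    # Hypothetical stand-in for "mass-aware" gridding: interior cut points at
    # pooled quantiles, so each level holds roughly equal numbers of cells.
    s = sorted(values)
    return [s[(len(s) * k) // n_levels] for k in range(1, n_levels)]


def hypercube_proportions(cells, edges):
    # Each cell (tuple of marker values) maps to one hypercube: the tuple of
    # its level index on every marker. Return the fraction of cells per cube.
    counts = Counter(
        tuple(bisect_right(e, v) for e, v in zip(edges, cell)) for cell in cells
    )
    return {cube: c / len(cells) for cube, c in counts.items()}


def similarity(p, q):
    # Assumed measure: histogram intersection over shared hypercubes, in [0, 1].
    return sum(min(frac, q.get(cube, 0.0)) for cube, frac in p.items())


def compare_samples(samples, n_levels=3):
    # One grid shared by all samples, built from the pooled cells.
    n_markers = len(samples[0][0])
    pooled = [cell for sample in samples for cell in sample]
    edges = [quantile_edges([c[m] for c in pooled], n_levels)
             for m in range(n_markers)]
    props = [hypercube_proportions(s, edges) for s in samples]
    # Pairwise similarity matrix between samples.
    return [[similarity(a, b) for b in props] for a in props]


# Usage: two identical samples and one in a different expression range.
a = [(0.1, 0.1), (0.2, 0.2), (0.3, 0.3)]
c = [(10.0, 10.0), (11.0, 11.0), (12.0, 12.0)]
matrix = compare_samples([a, list(a), c])
print(matrix[0][1], matrix[0][2])
```

    Because every sample is reduced to a sparse table of hypercube proportions, each pairwise comparison only touches occupied cubes and uses all cells without subsampling, which is in line with the efficiency argument the reviewer summarizes.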

    Major strengths:

    1. Figure 2 provides sufficiently strong support for the toolkit's ability to determine similarity across samples and for its computational efficiency, an important advantage of this tool.
    2. Compared to other approaches, the method is advantageous for identifying groups of similar samples within a very large-scale dataset. compaRe does not require (or make use of) manual expert annotation of meta-clusters. The workflow is efficient and unbiased.

    Major weakness:

    While the toolkit is clearly useful for evaluating similarities across many samples, the manuscript does not clearly demonstrate its utility for exploring specific phenotypes in depth within a high-parameter dataset.