A systematic comparison of novel and existing differential analysis methods for CyTOF data
This article has been reviewed by the following groups
Listed in
- Evaluated articles (ScreenIT)
Abstract
Cytometry techniques are widely used to discover cellular characteristics at single-cell resolution. Many data analysis methods for cytometry data focus solely on identifying subpopulations via clustering and testing for differential cell abundance. For differential expression analysis of markers between conditions, only a few tools exist. These tools either reduce the data distribution to medians, discarding valuable information, or rely on assumptions that may not hold for all expression patterns. Here, we systematically evaluated existing and novel approaches for differential expression analysis on real and simulated CyTOF data. We found that methods using median marker expressions produce fast and reliable results when the data are not strongly zero-inflated. Methods using all data detect changes in strongly zero-inflated markers, but some of them suffer from overprediction or cannot handle big datasets. We present a new method, CyEMD, which calculates the earth mover’s distance between expression distributions and can handle strong zero-inflation without being too sensitive. Additionally, we developed CYANUS (CYtometry ANalysis Using Shiny), a user-friendly R Shiny App that allows the user to analyze cytometry data with state-of-the-art tools, including well-performing methods from our comparison. A public web interface is available at https://exbio.wzw.tum.de/cyanus/.
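The contrast the abstract draws between median-based comparisons and a distance between whole distributions can be illustrated with a minimal Python sketch. This is not the CyEMD implementation: the function name `emd_1d` and the expression values are hypothetical, and the sketch only computes the one-dimensional earth mover's distance as the area between two empirical cumulative distribution functions.

```python
from bisect import bisect_right
from statistics import median


def emd_1d(a, b):
    """Earth mover's distance (1-Wasserstein) between two 1-D samples.

    Computed as the area between the two empirical CDFs.
    """
    a_sorted, b_sorted = sorted(a), sorted(b)
    # Both CDFs are constant between neighbouring observed values.
    grid = sorted(set(a_sorted) | set(b_sorted))
    total = 0.0
    for left, right in zip(grid, grid[1:]):
        cdf_a = bisect_right(a_sorted, left) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, left) / len(b_sorted)
        total += abs(cdf_a - cdf_b) * (right - left)
    return total


# Hypothetical, strongly zero-inflated expression values (not from the paper).
control = [0, 0, 0, 0, 0, 0, 0, 0, 1, 2]
treated = [0, 0, 0, 0, 0, 0, 3, 4, 5, 6]

median_shift = median(treated) - median(control)
distance = emd_1d(control, treated)
print(median_shift, distance)
```

In this toy example both medians are zero, so a median-based comparison sees no shift, while the distance between the two distributions is positive. How such a distance is turned into a significance call is not covered by this sketch.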
Article activity feed
SciScore for 10.1101/2021.08.09.455609: (What is this?)
Please note, not all rigor criteria are appropriate for all manuscripts.
Table 1: Rigor
NIH rigor criteria are not applicable to paper type.
Table 2: Resources
Software and Algorithms
Sentences: The LMM method builds a linear mixed-effects model and can therefore handle random effects in contrast to the limma method where a grouping variable can be included only as an additional fixed effect (Weber et al. (2019)).
Resources: limma; suggested: (LIMMA, RRID:SCR_010943)
Results from OddPub: Thank you for sharing your code and data.
Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study: A limitation on the level of dataset evaluation is that we could not interpret the results obtained on the PBMC dataset biologically. We could therefore not describe which markers were falsely classified as differentially expressed and which markers were overlooked. Additionally, we did not include a dataset with batch effects but assumed that the data had already been corrected for them. In theory, a batch effect can be included as a random effect or an additional term in a model. This is possible for all the approaches we evaluated except the statistical tests and CyEMD. Finally, the downsampling of the spiked and the CytoGLMM datasets was not repeated multiple times. Repeating it would have made the results more reliable and robust, but repeating the evaluations that many times was not feasible in this study because of the high runtime requirements of BEZI and ZAGA. All in all, the diffcyt methods run fast and yield good, trustworthy results when the median of the differentially expressed marker is not zero. Nevertheless, they did not outperform a simple Wilcoxon signed-rank test or t-test on the medians, meaning that a more complicated model is not necessarily required to detect significant differences in CyTOF marker medians. The comparison with the Kruskal-Wallis test on marker medians shows that the clear advantage of the Wilcoxon/t-test is the ability to compute a paired test statistic. Regarding the cytoGLMM methods, we observe that smal...
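The point about paired test statistics in the quoted limitations can be made concrete with a small Python sketch. The per-sample marker medians below are hypothetical, and the helper names `paired_t` and `welch_t` are introduced only for illustration; the sketch compares a paired t statistic with an unpaired (Welch) one on the same values and does not reproduce any analysis from the study.

```python
from math import sqrt
from statistics import mean, stdev, variance


def paired_t(before, after):
    """t statistic of a paired t-test on matched samples."""
    diffs = [y - x for x, y in zip(before, after)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


def welch_t(group_a, group_b):
    """t statistic of an unpaired Welch t-test (ignores the pairing)."""
    se = sqrt(variance(group_a) / len(group_a)
              + variance(group_b) / len(group_b))
    return (mean(group_b) - mean(group_a)) / se


# Hypothetical marker medians for four patients, measured in two conditions.
medians_before = [1.0, 2.0, 3.0, 4.0]
medians_after = [1.5, 2.4, 3.6, 4.5]

t_paired = paired_t(medians_before, medians_after)
t_unpaired = welch_t(medians_before, medians_after)
print(t_paired, t_unpaired)
```

Because the patients differ strongly in their baseline medians, the unpaired statistic is dominated by between-patient variance, whereas the paired statistic only looks at within-patient changes and is much larger for the same data.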
Results from TrialIdentifier: No clinical trial numbers were referenced.
Results from Barzooka: We did not find any issues relating to the usage of bar graphs.
Results from JetFighter: We did not find any issues relating to colormaps.
Results from rtransparent:
- Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
- No funding statement was detected.
- No protocol registration statement was detected.
Results from scite Reference Check: We found no unreliable references.