DISCERN: deep single-cell expression reconstruction for improved cell clustering and cell subtype and state detection

Abstract

Background

Single-cell sequencing provides detailed insights into biological processes including cell differentiation and identity. While providing deep cell-specific information, the method suffers from technical constraints, most notably a limited number of expressed genes per cell, which leads to suboptimal clustering and cell type identification.

Results

Here, we present DISCERN, a novel deep generative network that precisely reconstructs missing single-cell gene expression using a reference dataset. DISCERN outperforms competing algorithms in expression inference, resulting in greatly improved cell clustering, cell type and activity detection, and insights into the cellular regulation of disease. We show that DISCERN is robust against differences between batches and is able to preserve biological differences between batches, which is a common problem for imputation and batch correction algorithms. We use DISCERN to detect two unseen COVID-19-associated T cell types, cytotoxic CD4⁺ and CD8⁺ Tc2 T helper cells, with a potential role in adverse disease outcome. We utilize T cell fraction information from patient blood to classify mild or severe COVID-19 with an AUROC of 80%, which can serve as a biomarker of disease stage. DISCERN can be easily integrated into existing single-cell sequencing workflows.

Conclusions

Thus, DISCERN is a flexible tool for reconstructing missing single-cell gene expression using a reference dataset and can easily be applied to a variety of data sets yielding novel insights, e.g., into disease mechanisms.

SciScore for 10.1101/2022.03.09.483600:

Please note, not all rigor criteria are appropriate for all manuscripts.

Table 1: Rigor

NIH rigor criteria are not applicable to paper type.

Table 2: Resources

Software and Algorithms
Sentences	Resources
covid-lung & covid-blood: The COVID-19 data set we have previously published consists of blood and bronchoalveolar lavage (BAL) samples from four patients with bacterial pneumonia and eight patients with SARS-CoV-2 infection[16].	SARS-CoV-2 suggested: (Active Motif Cat# 91351, RRID:AB_2847848)
MAGIC is provided as a Python package.	Python suggested: (IPython, RRID:SCR_001658)
By modeling gene distributions as a noise model and also computing dropout probabilities of each gene, DCA is able to denoise and impute the missing counts by identifying and correcting dropout events. scImpute. [12]: Similarly to MAGIC, scImpute focuses on identifying cells that are similar, which is challenging due to the high sparsity of single-cell count matrices.	MAGIC suggested: (Magic, RRID:SCR_006406)
Clustering was performed using PARC [67] with default parameters except dist_std_local=1.5 and small_pop=300.	PARC suggested: (Antibiotic Resistance Genes Database, RRID:SCR_007040)
The area under the receiver operating characteristic (AUROC) curve is computed with scikit-learn.	scikit-learn suggested: (scikit-learn, RRID:SCR_002577)
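As a rough illustration of the AUROC named in the last row, the sketch below computes it from first principles as the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, with ties counted as one half. The `auroc` helper, the labels, and the scores are hypothetical and are not taken from the study. The study itself reports using scikit-learn, whose `sklearn.metrics.roc_auc_score(y_true, y_score)` should return the same value for binary labels.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve.

    labels: 1 for the positive class, 0 for the negative class.
    scores: higher values should indicate the positive class.
    """
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one sample of each class")

    # Count positive/negative pairs ranked correctly; a tie counts as half.
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: 1 = severe, 0 = mild; the score stands in for
# a per-patient cell-fraction feature (made-up numbers).
toy_labels = [0, 0, 0, 1, 1, 1]
toy_scores = [0.05, 0.10, 0.30, 0.20, 0.35, 0.50]
toy_auroc = auroc(toy_labels, toy_scores)
print(f"toy AUROC = {toy_auroc:.3f}")
```

The pairwise form is quadratic in the number of samples, which is fine for a handful of patients. For larger inputs a rank-based or library implementation is the more sensible choice.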

Results from OddPub: Thank you for sharing your code and data.

Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:

The basic concept of utilizing a high-quality reference to improve lower quality data might be applied to many other research areas where technological limitations restrict biological insights. The usage of deep generative networks and other artificial intelligence methodology to infer information beyond what is technically measurable could be transformative in future biomedical research.

Results from TrialIdentifier: No clinical trial numbers were referenced.

Results from Barzooka: We did not find any issues relating to the usage of bar graphs.

Results from JetFighter: We did not find any issues relating to colormaps.

Results from rtransparent:

Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
No protocol registration statement was detected.

Results from scite Reference Check: We found no unreliable references.
