Improved marker detection for rare population in single-cell transcriptomics through text mining-inspired scoring approach

Abstract

Accurate identification of cell types and states is a crucial step when analysing single-cell RNA-seq data. Inaccurate cell type annotation leads to spurious results and misleading biological interpretations in subsequent analyses. Cell types are traditionally identified using established marker genes for each cell population. However, many existing methods perform poorly on imbalanced cell populations, an imbalance that often arises when rare cell types are present. Most existing methods tend to be biased towards the dominant cell populations or highly expressed genes, leading to inaccurate results. Here, we introduce the package smartid, which accurately identifies markers even from imbalanced data across batches by combining a modified Term Frequency-Inverse Document Frequency (TF-IDF) approach with a Gaussian Mixture Model (GMM). smartid also functions as a gene-set scoring method that can distinguish the target group of interest. smartid is implemented in R and is freely available on Bioconductor at https://bioconductor.org/packages/release/bioc/html/smartid.html.
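The abstract does not spell out smartid's modified TF-IDF formula or how its GMM is applied, so the following is only a minimal Python sketch of the general text-mining analogy, under stated assumptions. Cells are treated as documents and genes as terms. TF is a gene's share of its cell's counts, and IDF is a standard smoothed log(1 + N / (1 + detected)). Each gene gets a per-group contrast score (mean TF-IDF inside the target group minus mean outside), and a two-component 1-D Gaussian mixture fitted by EM keeps the genes in the higher-scoring component. All of these are generic choices assumed for illustration, not smartid's actual method. The function names `tf_idf`, `group_scores`, `gmm_top_component`, and `gene_set_score` are hypothetical and are not part of the smartid R API.

```python
import math
from statistics import mean, pvariance


def tf_idf(counts):
    """Generic (assumed) TF-IDF for a genes x cells count matrix (list of rows).

    TF: a gene's share of its cell's total counts (a word's share of a document).
    IDF: log(1 + n_cells / (1 + number of cells in which the gene is detected)).
    """
    n_cells = len(counts[0])
    # Per-cell totals; `or 1` guards against cells with zero counts.
    totals = [sum(row[c] for row in counts) or 1 for c in range(n_cells)]
    out = []
    for row in counts:
        detected = sum(1 for v in row if v > 0)
        idf = math.log(1 + n_cells / (1 + detected))
        out.append([row[c] / totals[c] * idf for c in range(n_cells)])
    return out


def group_scores(tfidf, labels, target):
    """Per-gene score: mean TF-IDF inside the target group minus mean outside.

    Averaging within each group separately means a small group's signal is not
    diluted by the size of a large group.
    """
    inside = [i for i, lab in enumerate(labels) if lab == target]
    outside = [i for i, lab in enumerate(labels) if lab != target]
    return [mean(row[i] for i in inside) - mean(row[i] for i in outside)
            for row in tfidf]


def _responsibilities(values, w, mu, var):
    """E-step computed in log space: posterior of each component per value."""
    resp = []
    for x in values:
        logp = [math.log(w[k]) - 0.5 * math.log(2 * math.pi * var[k])
                - (x - mu[k]) ** 2 / (2 * var[k]) for k in range(2)]
        m = max(logp)
        p = [math.exp(lp - m) for lp in logp]  # shift by max to avoid underflow
        s = sum(p)
        resp.append([pk / s for pk in p])
    return resp


def gmm_top_component(values, n_iter=200, floor=1e-12):
    """Fit a two-component 1-D Gaussian mixture by EM and return the indices of
    values assigned to the component with the larger mean."""
    mu = [min(values), max(values)]              # start at the extremes
    var = [max(pvariance(values), floor)] * 2
    w = [0.5, 0.5]
    for _ in range(n_iter):
        resp = _responsibilities(values, w, mu, var)
        for k in range(2):                       # M-step
            nk = sum(r[k] for r in resp) or floor
            w[k] = max(nk / len(values), floor)
            mu[k] = sum(r[k] * x for r, x in zip(resp, values)) / nk
            var[k] = max(sum(r[k] * (x - mu[k]) ** 2
                             for r, x in zip(resp, values)) / nk, floor)
    resp = _responsibilities(values, w, mu, var)  # final E-step, fitted params
    top = 0 if mu[0] > mu[1] else 1
    return [i for i, r in enumerate(resp) if r[top] > r[1 - top]]


def gene_set_score(tfidf, genes):
    """Per-cell score: mean TF-IDF over a gene set (e.g. the selected markers)."""
    return [mean(tfidf[g][c] for g in genes) for c in range(len(tfidf[0]))]


if __name__ == "__main__":
    # Toy data: 2 rare cells vs 8 common cells; genes 0-1 are rare-specific.
    labels = ["rare"] * 2 + ["common"] * 8
    counts = [[10, 10] + [0] * 8, [8, 8] + [0] * 8] + [[5] * 10 for _ in range(4)]
    tfidf = tf_idf(counts)
    markers = gmm_top_component(group_scores(tfidf, labels, "rare"))
    print("markers:", markers)
    print("cell scores:", [round(s, 3) for s in gene_set_score(tfidf, markers)])
```

In this toy example the two rare cells carry the rare-specific genes, so those genes land in the high-mean mixture component even though the rare group is only a fifth of the cells. The mixture is hand-written here only to keep the sketch dependency-free. In practice a library implementation such as scikit-learn's `GaussianMixture` (or, in R, a mixture-modelling package) would be the more robust choice.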

Article activity feed