scValue: value-based subsampling of large-scale single-cell transcriptomic data


Abstract

Motivation

Large single-cell RNA sequencing (scRNA-seq) datasets have the potential to drive significant biological discoveries but present computational and memory challenges for visualisation and analysis. Existing subsampling methods aim to improve efficiency but may not guarantee performance in downstream machine/deep learning tasks.

Results

Here, we introduce scValue, a Python package designed for value-based subsampling of large scRNA-seq datasets. scValue prioritises cells of higher value (indicating greater utility for cell type identification) over cells of lower value, and allocates more representation in subsamples to cell types with greater value variability. Using publicly available datasets ranging from tens of thousands to millions of cells, we show that scValue provides fast computation, enhances cell type separation, and maintains balanced cell type proportions. These capabilities support effective downstream learning tasks, including automatic cell type annotation, label transfer, and label harmonisation across datasets.
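
The two ideas described above (ranking cells by value, and giving more subsample slots to cell types whose values vary more) can be illustrated with a minimal sketch. This is not the scValue API or the authors' implementation. The function name `value_based_subsample`, the toy per-cell values, and the allocation rule (weight = cell count × standard deviation of values within the type) are all assumptions made for illustration, and the abstract does not specify how scValue computes cell values.

```python
import statistics
from collections import defaultdict


def value_based_subsample(values, labels, n_total):
    """Pick roughly n_total cell indices, favouring high-value cells.

    Hypothetical helper for illustration only; not part of scValue.
    """
    # Group cell indices by cell type.
    by_type = defaultdict(list)
    for idx, label in enumerate(labels):
        by_type[label].append(idx)

    # Assumed allocation rule: weight each cell type by
    # (number of cells) x (standard deviation of its cell values).
    weights = {}
    for label, idxs in by_type.items():
        spread = statistics.pstdev([values[i] for i in idxs])
        weights[label] = len(idxs) * spread
    total_weight = sum(weights.values())

    selected = []
    for label, idxs in by_type.items():
        if total_weight > 0:
            share = weights[label] / total_weight
        else:
            # Fall back to proportional allocation if no type shows any spread.
            share = len(idxs) / len(labels)
        # Every type keeps at least one cell and never more than it has.
        quota = min(len(idxs), max(1, round(n_total * share)))
        # Within a type, keep the highest-value cells first.
        ranked = sorted(idxs, key=lambda i: values[i], reverse=True)
        selected.extend(ranked[:quota])
    return sorted(selected)


# Toy data: 60 "T cell" entries with tightly clustered values and
# 40 "B cell" entries with widely spread values (placeholder numbers).
labels = ["T cell"] * 60 + ["B cell"] * 40
values = [0.45 + 0.1 * (i % 2) for i in range(60)] + [i / 39 for i in range(40)]

picked = value_based_subsample(values, labels, n_total=20)
counts = {t: sum(1 for i in picked if labels[i] == t) for t in set(labels)}
print(counts)
```

In this toy example the cell type with the wider spread of values receives the larger share of the subsample even though it has fewer cells, which is the behaviour the abstract describes qualitatively. Because each type's quota is rounded independently, the returned subsample can differ slightly from `n_total` in general.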

Availability

scValue is an open-source Python package available for installation via pip from PyPI (https://pypi.org/project/scvalue/), with the source code freely accessible on GitHub (https://github.com/LHBCB/scvalue).

Contact

cds@ism.pumc.edu.cn
