scValue: value-based subsampling of large-scale single-cell transcriptomic data
Abstract
Motivation
Large single-cell RNA sequencing (scRNA-seq) datasets have the potential to drive significant biological discoveries but present computational and memory challenges for visualisation and analysis. Existing subsampling methods aim to improve efficiency but may not guarantee performance in downstream machine/deep learning tasks.
Results
Here, we introduce scValue, a Python package designed for value-based subsampling of large scRNA-seq datasets. scValue prioritises cells of higher value (indicating greater utility for cell type identification) over cells of lower value, and allocates more representation in subsamples to cell types with greater value variability. Using publicly available datasets ranging from tens of thousands to millions of cells, we show that scValue provides fast computation, enhances cell type separation, and maintains balanced cell type proportions. These capabilities support effective downstream learning tasks, including automatic cell type annotation, label transfer, and label harmonisation across datasets.
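The two ideas stated above (prefer higher-value cells, and give more of the subsample to cell types whose values vary more) can be illustrated with a short sketch. This is not scValue's implementation or API. The function name `value_based_subsample` is hypothetical, per-cell values are assumed to have been computed already by some other means, and weighting each cell type by its cell count multiplied by the standard deviation of its values is an assumption made purely for illustration.

```python
import numpy as np


def value_based_subsample(values, cell_types, n_sub):
    """Hypothetical sketch: return indices of up to n_sub cells.

    Assumption (not taken from scValue): each cell type is weighted by
    (number of cells) x (standard deviation of its values), and the
    highest-value cells are kept within each type.
    """
    values = np.asarray(values, dtype=float)
    cell_types = np.asarray(cell_types)
    types, inverse = np.unique(cell_types, return_inverse=True)

    counts = np.bincount(inverse, minlength=len(types))
    stds = np.array([values[inverse == k].std() for k in range(len(types))])

    # More value variability within a type -> larger share of the subsample.
    weights = counts * stds
    if weights.sum() == 0:
        # All values identical within every type: fall back to proportional.
        weights = counts.astype(float)

    # Floor the shares and cap at the type size, so the result may hold
    # slightly fewer than n_sub cells.
    alloc = np.floor(n_sub * (weights / weights.sum())).astype(int)
    alloc = np.minimum(alloc, counts)

    chosen = []
    for k in range(len(types)):
        members = np.flatnonzero(inverse == k)
        order = np.argsort(values[members])[::-1]  # highest value first
        chosen.append(members[order[: alloc[k]]])
    return np.sort(np.concatenate(chosen))


# Toy example: type "A" has varied values, type "B" has identical values.
vals = [0.9, 0.1, 0.5, 0.5, 0.5, 0.5, 0.2, 0.8]
labels = ["A", "A", "B", "B", "B", "B", "A", "A"]
subsample_idx = value_based_subsample(vals, labels, n_sub=2)
```

In this toy example all of the budget goes to type "A", because type "B" shows no value variability. A practical scheme would likely also enforce a minimum number of cells per type so that no cell type disappears from the subsample.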
Availability
scValue is an open-source Python package available for installation via pip from https://pypi.org/project/scvalue/, with the source code freely accessible at https://github.com/LHBCB/scvalue.
Contact
cds@ism.pumc.edu.cn