CellDiffusion: a generative model to annotate single-cell and spatial RNA-seq using bulk references

Xiaochen Zhang
Jiadong Mao
Kim-Anh Lê Cao

Abstract

Annotating single-cell and spatial RNA-seq data can be greatly enhanced by leveraging bulk RNA-seq, which remains a cost-effective and well-established benchmark for characterising transcriptional activity in immune cell populations. However, a major technical hurdle lies in the contrasting properties of these data types: single-cell and spatial data are inherently sparse because of their cell-level sampling scheme, which yields much lower sequencing depth than bulk RNA-seq.
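
As a toy illustration of this depth gap, the sketch below draws reads from a made-up, Zipf-like bulk expression profile by multinomial sampling. It is not taken from the paper. A simulated cell with a hypothetical 1,000 reads leaves many genes at zero and tracks the profile only loosely. A 200,000-read sample, standing in for bulk or pseudobulk depth, has essentially no zeros. The profile, both depths, and the helper names (`sample_counts`, `pearson`) are assumptions chosen for illustration.

```python
import math
import random


def sample_counts(profile, depth, rng):
    """Draw `depth` reads from gene weights `profile` (one multinomial sample)."""
    counts = [0] * len(profile)
    for gene in rng.choices(range(len(profile)), weights=profile, k=depth):
        counts[gene] += 1
    return counts


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(0)
n_genes = 500

# Hypothetical bulk-like profile: a few highly expressed genes and a long tail
# of lowly expressed ones (Zipf-like), normalised to proportions.
weights = [1.0 / (g + 1) for g in range(n_genes)]
total = sum(weights)
profile = [w / total for w in weights]
log_profile = [math.log(p) for p in profile]

results = {}
# Hypothetical depths: ~1,000 reads for one cell vs 200,000 for a bulk-like sample.
for label, depth in [("single cell", 1_000), ("pseudobulk", 200_000)]:
    counts = sample_counts(profile, depth, rng)
    zero_frac = sum(c == 0 for c in counts) / n_genes
    # Compare on a log scale so lowly expressed genes still contribute.
    r = pearson([math.log1p(c) for c in counts], log_profile)
    results[label] = (zero_frac, r)
    print(f"{label:12s} depth={depth:>7,d} zeros={zero_frac:.0%} r={r:.3f}")
```

Pooling reads in this way closes the depth gap only by giving up cell-level resolution.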

We developed CellDiffusion, a generative machine learning (ML) tool that bridges this gap. CellDiffusion generates realistic virtual cells to augment sparse single-cell and spatial data, strengthening the signal and improving the representation of rare cell types. The augmented data are more comparable to bulk references, which increases the accuracy of cell type annotation with bulk references and automated ML classifiers.
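
The link between per-cell signal and reference-based annotation can be sketched under stated assumptions. This is not CellDiffusion's generative model. Read depth is varied here as a crude stand-in for the stronger per-cell signal that augmentation aims to provide. Each simulated cell is labelled with the bulk reference it is most Spearman-correlated with. That is a simplified form of the idea behind correlation-based annotators such as SingleR, which additionally selects marker genes and fine-tunes its calls. The following are all hypothetical:

- the three cell types,
- the shared log-normal baseline,
- the 5-fold marker blocks,
- the helper names.

```python
import math
import random


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mean = (len(rx) + 1) / 2  # average ranks always have mean (n + 1) / 2
    cov = sum((a - mean) * (b - mean) for a, b in zip(rx, ry))
    var_x = sum((a - mean) ** 2 for a in rx)
    var_y = sum((b - mean) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


def annotate(cell, references):
    """Label a cell with the reference profile it is most Spearman-correlated with."""
    return max(references, key=lambda label: spearman(cell, references[label]))


def sample_counts(profile, depth, rng):
    """Draw `depth` reads from gene weights `profile` (one multinomial sample)."""
    counts = [0] * len(profile)
    for gene in rng.choices(range(len(profile)), weights=profile, k=depth):
        counts[gene] += 1
    return counts


rng = random.Random(1)
n_genes, n_markers = 300, 20
# Hypothetical shared baseline expression for every cell type.
base = [rng.lognormvariate(0.0, 1.0) for _ in range(n_genes)]

# Hypothetical bulk references: each type upregulates its own block of
# n_markers genes 5-fold on top of the shared baseline.
references = {}
for t, label in enumerate(["T cell", "B cell", "monocyte"]):
    profile = list(base)
    for g in range(t * n_markers, (t + 1) * n_markers):
        profile[g] *= 5.0
    references[label] = profile


def accuracy(depth, cells_per_type=50):
    """Fraction of simulated cells at a given read depth assigned their true type."""
    correct = 0
    for label, profile in references.items():
        for _ in range(cells_per_type):
            cell = sample_counts(profile, depth, rng)
            correct += annotate(cell, references) == label
    return correct / (cells_per_type * len(references))


shallow_acc = accuracy(depth=30)
deep_acc = accuracy(depth=3_000)
print(f"accuracy at 30 reads/cell: {shallow_acc:.2f}; at 3,000 reads/cell: {deep_acc:.2f}")
```

At low depth most genes are tied at zero, so the rank correlation has little to distinguish the references. A stronger per-cell signal is what lets the correct reference stand out.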

We benchmarked CellDiffusion on single-cell and spatial datasets from human peripheral blood samples, white adipose tissue, and breast tumours. Our method significantly outperforms state-of-the-art methods such as SingleR, Seurat, and scVI. CellDiffusion also provides critical biological insights:

- the identification of novel cell subtypes and their functions during cell state transitions;
- the discovery of new marker genes for tissue-resident immune cells, revealing functional shifts in myeloid populations;
- the accurate characterisation of cell subtypes in spatial transcriptomics to decipher the tumour microenvironment.

Version published on bioRxiv (doi: 10.1101/2025.10.27.684671), Oct 28, 2025
