Joint probabilistic modeling of pseudobulk and single-cell transcriptomics enables accurate estimation of cell type composition

Simon Grouard
Khalil Ouardini
Yann Rodriguez
Jean-Philippe Vert
Almudena Espin-Perez

Abstract

Bulk RNA sequencing provides an averaged gene expression profile of the numerous cells in a tissue sample, obscuring critical information about cellular heterogeneity. Computational deconvolution methods can estimate cell type proportions in bulk samples, but current approaches can lack precision in key scenarios due to simplistic statistical assumptions, limited modeling of cell-type heterogeneity and poor handling of rare populations. We present MixupVI, a deep generative model that learns representations of single-cell transcriptomic data and introduces a mixup-based regularization to enable reference-free deconvolution of bulk samples. Our method creates a latent representation with an additive property, where the representation of a pseudobulk sample corresponds to the weighted sum of its constituent cell types. We demonstrate how MixupVI enables accurate estimation of cell type proportions through benchmarking on pseudobulks simulated from a large immune single-cell atlas. To support reproducibility and foster progress in the field, we also release PyDeconv, a Python library that implements multiple state-of-the-art deconvolution algorithms and provides a comprehensive benchmark on simulated pseudobulk datasets.
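The additive property described in the abstract can be illustrated with a toy example. The sketch below is not MixupVI or the PyDeconv API. It assumes the additive latent space is already given: each cell type has a hypothetical latent centroid (`signatures`), each cell's latent vector is its centroid plus small noise, and the pseudobulk representation is the average over cells. Under those assumptions the pseudobulk vector is approximately the proportion-weighted sum of the centroids, so proportions can be recovered with ordinary least squares followed by clipping and renormalization. All names and sizes are illustrative.

```python
import numpy as np


def make_pseudobulk(latents, labels, n_types):
    """Average per-cell latent vectors and return the true cell type proportions."""
    pseudobulk = latents.mean(axis=0)
    proportions = np.bincount(labels, minlength=n_types) / len(labels)
    return pseudobulk, proportions


def estimate_proportions(pseudobulk, signatures):
    """Solve signatures.T @ p ~= pseudobulk, then clip to >= 0 and renormalize."""
    p, *_ = np.linalg.lstsq(signatures.T, pseudobulk, rcond=None)
    p = np.clip(p, 0.0, None)
    return p / p.sum()


rng = np.random.default_rng(0)
n_types, latent_dim, n_cells = 3, 8, 500

# Hypothetical latent centroid for each cell type (rows = cell types).
signatures = rng.normal(size=(n_types, latent_dim))

# Sample cell type labels, including a rarer population (10%).
labels = rng.choice(n_types, size=n_cells, p=[0.6, 0.3, 0.1])

# Assumed additive latent space: cell latent = type centroid + small noise.
latents = signatures[labels] + 0.05 * rng.normal(size=(n_cells, latent_dim))

pseudobulk, true_p = make_pseudobulk(latents, labels, n_types)
est_p = estimate_proportions(pseudobulk, signatures)

print("true proportions:     ", np.round(true_p, 3))
print("estimated proportions:", np.round(est_p, 3))
```

In a real model the encoder is a nonlinear network, so additivity does not hold automatically; according to the abstract, the mixup-based regularization is what encourages it. The clip-and-renormalize step is a simple stand-in for a properly constrained solver such as non-negative least squares.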

Version published to bioRxiv on Jun 1, 2025 (DOI: 10.1101/2025.05.28.656123)

Related articles

Discovering cell types and states from reference atlases with heterogeneous single-cell ATAC-seq features

This article has 2 authors:
1. Xiuwei Zhang
2. Yuqi Cheng
This article has no evaluations
Latest version Dec 10, 2025

ST-LDAW: A Topic-Model and Damped Weighted Least-Squares Method for Integrative Deconvolution of Single-Cell and Spatial Transcriptomics

This article has 8 authors:
1. Xiaoyang Wang
2. Dongmei Ai
3. Li C. Xia
4. HuiLing Liu
5. Lulu Chen
6. Zhimin Li
7. Yang Du
8. Yujia Li
This article has no evaluations
Latest version Jan 13, 2026

Microenvironment-aware transcriptome reconstruction in spatial transcriptomics

This article has 7 authors:
1. Shi-Tong Yang
2. Pai Peng
3. Hui-Feng He
4. Meng-Guo Wang
5. Bo-Han Si
6. Xiao-Fei Zhang
7. Luonan Chen
This article has no evaluations
Latest version Jan 13, 2026
