A Diffusion-Based Autoencoder for Learning Patient-Level Representations from Single-Cell Data

Rebecca Boiarsky
Johann Wenckstern
Nicholas J. Haradhvala
Gad Getz
David Sontag

Abstract

Single-cell RNA sequencing (scRNA-seq) offers insights into cellular heterogeneity and tissue composition, yet leveraging this data for patient-level clinical predictions remains challenging due to the set-structured nature of single-cell data, as well as the scarcity of labeled samples. To address these challenges, we introduce scSet, a diffusion-based autoencoder that learns patient-level representations from sets of single-cell transcriptomes. Our method uses a transformer-based encoder to process variably sized and unordered cell inputs, coupled with a conditional diffusion decoder for self-supervised learning on unlabeled data. By pre-training on large-scale unlabeled datasets, scSet generates robust patient representations that can be fine-tuned for downstream clinical prediction tasks. We demonstrate the effectiveness of scSet patient embeddings for clinical prediction across multiple real-world datasets, where they outperform existing patient representations, even with limited labeled data. This work represents an important step toward bridging the gap between single-cell resolution and patient-level insights. Code is available at https://github.com/clinicalml/scset .
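To make the set-structured input concrete, here is a minimal sketch of permutation-invariant attention pooling, which maps a variably sized, unordered set of cell vectors to one fixed-length vector. It is not the scSet implementation (see the linked repository for that): the function `attention_pool`, the fixed `query` vector, and the toy data are all hypothetical, and a single dot-product attention step stands in for a full transformer encoder.

```python
import math
import random


def attention_pool(cells, query):
    """Pool a set of cell vectors into one vector (illustrative only).

    cells: list of equal-length float lists, one per cell (any count, any order).
    query: float list of the same length; stands in for a learned parameter.
    """
    scale = math.sqrt(len(query))
    # Scaled dot-product score of each cell against the query.
    scores = [sum(q * x for q, x in zip(query, cell)) / scale for cell in cells]
    # Numerically stable softmax over the cells in the set.
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    # Weighted average of cells: a sum over the set, so order does not matter.
    dim = len(cells[0])
    return [sum(w * cell[j] for w, cell in zip(weights, cells)) for j in range(dim)]


rng = random.Random(0)
n_genes = 4  # toy feature dimension (assumption)
query = [rng.gauss(0, 1) for _ in range(n_genes)]

# Two hypothetical patients with different numbers of cells.
patient_a = [[rng.gauss(0, 1) for _ in range(n_genes)] for _ in range(5)]
patient_b = [[rng.gauss(0, 1) for _ in range(n_genes)] for _ in range(8)]

emb_a = attention_pool(patient_a, query)
emb_b = attention_pool(patient_b, query)

# Shuffling the cells of a patient leaves the embedding unchanged
# (up to floating-point rounding).
shuffled = patient_a[:]
rng.shuffle(shuffled)
emb_a_shuffled = attention_pool(shuffled, query)
```

Both patients map to vectors of the same length regardless of cell count, and reordering the cells does not change the result. These are the two properties a patient-level encoder over single-cell sets needs.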

Version published to 10.1101/2025.08.21.671613 on bioRxiv
Aug 25, 2025

Related articles

Accurate, scalable, and unified single-cell atlas integration with scBIOT

This article has 2 authors:
1. Haihui Zhang
2. Peiwu Qin
This article has no evaluations. Latest version: Jan 19, 2026

Discovering cell types and states from reference atlases with heterogeneous single-cell ATAC-seq features

This article has 2 authors:
1. Xiuwei Zhang
2. Yuqi Cheng
This article has no evaluations. Latest version: Dec 10, 2025

Attention-Based Hierarchical Graph Autoencoder for Dose-Specific Single-Cell Resistance Dynamics

This article has 3 authors:
1. Sachit Satyal
2. Teng Long
3. Jean Gao
This article has no evaluations. Latest version: Dec 11, 2025
