In Silico Generation of Gene Expression profiles using Diffusion Models

Read the full article See related articles

Listed in

This article is not in any list yet, why not save it to one of your lists.
Log in to save this article

Abstract

Motivation

RNA-seq data is used for precision medicine (e.g., cancer predictions), which benefits from deep learning approaches to analyze complex gene expression data. However, transcriptomics datasets often have few samples compared to deep learning standards. Synthetic data generation is thus being explored to address this data scarcity. So far, only deep generative models such as Variational Autoencoders (VAEs) and Generative Adversarial Networks (GANs) have been used for this aim. Considering the recent success of diffusion models (DM) in image generation, we propose the first generation pipeline that leverages the power of said diffusion models.

Results

This paper presents two state-of-the-art diffusion models (DDPM and DDIM) and achieves their adaptation in the transcriptomics field. DM-generated data of L1000 landmark genes show better predictive performance over TCGA and GTEx datasets. We also compare linear and nonlinear reconstruction methods to recover the complete transcriptome. Results show that such reconstruction methods can boost the performances of diffusion models, as well as VAEs and GANs. Overall, the extensive comparison of various generative models using data quality indicators shows that diffusion models perform best and second-best, making them promising synthetic transcriptomics generators.

Availability and implementation

Data processing and full code available at: https://forge.ibisc.univevry.fr/alacan/rna-diffusion.git

Contact

alice.lacan@univ-evry.fr

Supplementary information

Supplementary data are available at BioRxiv online.

Article activity feed