Learning Genetic Perturbation Effects with Variational Causal Inference

Emily Liu
Jiaqi Zhang
Caroline Uhler

Abstract

Advances in sequencing technologies have enhanced the understanding of gene regulation in cells. In particular, Perturb-seq has enabled high-resolution profiling of the transcriptomic response to genetic perturbations at the single-cell level. This understanding has implications for functional genomics and, potentially, for identifying therapeutic targets. Various computational models have been developed to predict perturbational effects. While deep learning models excel at interpolating observed perturbational data, they tend to overfit and may not generalize well to unseen perturbations. In contrast, mechanistic models, such as linear causal models based on gene regulatory networks, hold greater potential for extrapolation, as they encapsulate regulatory information that can predict responses to unseen perturbations. However, their application has been limited to small studies because their overly simplistic assumptions make them less effective in handling noisy, large-scale single-cell data. We propose a hybrid approach, termed Single Cell Causal Variational Autoencoder (SCCVAE), that combines a mechanistic causal model with variational deep learning. The mechanistic model employs a learned regulatory network to represent perturbational changes as shift interventions that propagate through that network. SCCVAE integrates this mechanistic causal model into a variational autoencoder, generating rich, comprehensive transcriptomic responses. Our results indicate that SCCVAE outperforms current state-of-the-art baselines when extrapolating to predict unseen perturbational responses. Additionally, for the observed perturbations, the latent space learned by SCCVAE allows for the identification of functional perturbation modules and the simulation of single-gene knockdown experiments of varying penetrance, presenting a robust tool for interpreting and interpolating perturbational responses at the single-cell level.
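
As a rough illustration of the shift-intervention idea mentioned in the abstract, the sketch below propagates a shift through a small linear causal model of the form x = Bx + s + e. It is not the SCCVAE implementation: the function name `propagate_shift`, the three-gene chain, and the edge weights in `B` are all hypothetical, chosen only to show how a shift on one gene reaches its downstream genes and how scaling the shift could mimic a weaker knockdown.

```python
import numpy as np


def propagate_shift(B, shift, noise=None):
    """Solve x = B @ x + shift + noise for x in a linear causal model.

    B[i, j] is the (hypothetical) direct effect of gene j on gene i.
    The solution is x = (I - B)^{-1} (shift + noise).
    """
    n = B.shape[0]
    if noise is None:
        noise = np.zeros(n)
    return np.linalg.solve(np.eye(n) - B, shift + noise)


# Toy regulatory chain (assumed for illustration): gene 0 -> gene 1 -> gene 2
B = np.array([
    [0.0, 0.0, 0.0],
    [0.8, 0.0, 0.0],
    [0.0, 0.5, 0.0],
])

# Knockdown-like shift applied to gene 0 only
shift = np.array([-1.0, 0.0, 0.0])
full_effect = propagate_shift(B, shift)

# Scaling the shift is one simple way to mimic a weaker (partial) knockdown
partial_effect = propagate_shift(B, 0.5 * shift)

print(full_effect)
print(partial_effect)
```

In this toy chain the shift on gene 0 reaches gene 2 only through gene 1, so the downstream effect is attenuated by the product of the two edge weights. Because the model is linear, halving the shift halves every downstream effect.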

Author summary

Understanding how genes interact and respond to perturbations is crucial for uncovering the mechanisms of cells and identifying potential ways to treat diseases. Recent advances in sequencing technologies now allow us to measure how individual cells react when specific genes are altered. However, making sense of these complex data requires advanced computational tools. In our work, we address the challenge of predicting how cells respond to new, previously untested genetic perturbations. We noticed that while deep learning models perform well on perturbations that have already been measured, they struggle to make predictions for new cases. On the other hand, models based on biological understanding can, in theory, make better predictions, but they often rely on overly simple assumptions that do not hold for real-world data. We developed a new method that combines the strengths of both approaches. Our model, called SCCVAE, uses knowledge of gene networks together with deep learning to better predict how cells will respond to gene changes. It can simulate new experiments and help identify groups of genes that work together. This tool could be valuable for researchers studying perturbational changes, as well as gene functions and diseases.

Version published to 10.1101/2025.06.05.657988 on bioRxiv
Jun 5, 2025

Related articles

Understanding Pathways in Bioinformatics, Genomics, and Health Applications

This article has 1 author:
1. Diptarup Mallick
This article has no evaluations
Latest version Jan 19, 2026

ST-LDAW: A Topic-Model and Damped Weighted Least-Squares Method for Integrative Deconvolution of Single-Cell and Spatial Transcriptomics

This article has 8 authors:
1. Xiaoyang Wang
2. Dongmei Ai
3. Li C. Xia
4. HuiLing Liu
5. Lulu Chen
6. Zhimin Li
7. Yang Du
8. Yujia Li
This article has no evaluations
Latest version Jan 13, 2026

Discovering cell types and states from reference atlases with heterogeneous single-cell ATAC-seq features

This article has 2 authors:
1. Xiuwei Zhang
2. Yuqi Cheng
This article has no evaluations
Latest version Dec 10, 2025
