Biologically informed variational autoencoders allow predictive modeling of genetic and drug-induced perturbations

Daria Doncevic
Carl Herrmann

This article has been reviewed by the following groups

Listed in

Evaluated articles (Arcadia Science)

Abstract

Motivation

Variational autoencoders (VAEs) have rapidly gained popularity in biological applications and have already been applied successfully to many omics datasets. Their latent space provides a low-dimensional representation of the input data, and VAEs have been used, for example, to cluster single-cell transcriptomic data. However, because VAEs are non-linear, the patterns they learn in the latent space remain obscure, and the lower-dimensional embedding cannot be related directly to the input features.

Results

To shed light on the inner workings of VAEs and enable direct interpretability of the model through its structure, we designed a novel VAE, OntoVAE (Ontology guided VAE), that can incorporate any ontology in its latent space and decoder and thus provide pathway or phenotype activities for the ontology terms. In this work, we demonstrate that OntoVAE can be applied to predictive modeling and show its ability to predict the effects of genetic or drug-induced perturbations using different ontologies and both bulk and single-cell transcriptomic datasets. Finally, we provide a flexible framework that can be easily adapted to any ontology and dataset.
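The idea of wiring an ontology into a decoder can be illustrated with a masked linear layer. The sketch below is not OntoVAE's implementation: the toy ontology, the gene and term names, and the helper functions `build_mask` and `masked_decode` are all hypothetical. It assumes only that each decoder node stands for one ontology term and may connect solely to the genes annotated to that term.

```python
import math
import random

# Hypothetical toy ontology: two terms, each annotated with a subset of four genes.
GENES = ["g1", "g2", "g3", "g4"]
TERM_TO_GENES = {
    "term_A": {"g1", "g2"},
    "term_B": {"g2", "g3", "g4"},
}
TERMS = list(TERM_TO_GENES)


def build_mask(terms, genes, term_to_genes):
    """Binary mask[i][j] = 1 if gene j is annotated to term i, else 0."""
    return [[1 if g in term_to_genes[t] else 0 for g in genes] for t in terms]


def masked_decode(term_activities, weights, mask):
    """Linear decoder in which connections absent from the ontology are zeroed."""
    n_genes = len(mask[0])
    out = [0.0] * n_genes
    for i, activity in enumerate(term_activities):
        for j in range(n_genes):
            out[j] += activity * weights[i][j] * mask[i][j]
    return out


random.seed(0)
mask = build_mask(TERMS, GENES, TERM_TO_GENES)
# Random positive weights stand in for trained decoder weights (assumption).
weights = [[random.uniform(0.1, 1.0) for _ in GENES] for _ in TERMS]

baseline = masked_decode([1.0, 1.0], weights, mask)
# Set the activity of term_A to zero and compare the two reconstructions.
silenced = masked_decode([0.0, 1.0], weights, mask)
changed = [g for g, b, s in zip(GENES, baseline, silenced) if not math.isclose(b, s)]
print(changed)
```

Because the mask removes every connection the ontology does not contain, silencing `term_A` changes only the genes annotated to it. That constraint is what would let a node's activity be read as the activity of its term.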

Availability and implementation

OntoVAE is available as a Python package at https://github.com/hdsu-bioquant/onto-vae.

Version published to 10.1093/bioinformatics/btad387
Jun 1, 2023
Arcadia Science
Apr 14, 2023

The resulting top terms after this trimming define the latent space.

What is the expected distribution of weights in the latent space? Would a discriminator network to impose different distributions be useful here?

Arcadia Science
Apr 14, 2023

To verify the validity of these predictions, we performed a gene-set enrichment analysis (GSEA) using as a ground truth the differentially expressed genes in a recently published dataset of bulk RNA-seq carried out on muscle samples from LGMD patients (n=16) and healthy individuals (n=15)25, where we had determined the genes that were significantly up- (LGMD_up) or downregulated (LGMD_dn) in patients compared to age-matched controls (Supplementary Table 3).

The significance of the difference in gene expression will be related to the size of the effect on the expression, but many genes that influence a phenotype may only show small changes in expression level. How well does this model deal with genes that show small changes in expression? Would this miss genes that show small changes in expression but are nevertheless important?
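As background to this question, the sketch below shows the classic unweighted running-sum statistic that underlies GSEA. It is a simplification rather than the analysis the authors ran: the function name `enrichment_score` and the toy gene names are hypothetical. A gene set chosen by a significance threshold, such as LGMD_up, would enter as `gene_set`, so genes that fall below the threshold cannot contribute hits.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted (Kolmogorov-Smirnov-like) enrichment score.

    Walk down the ranked list, stepping up at each gene in the set and down
    at each gene outside it, and return the largest deviation from zero.
    """
    hits = [g in gene_set for g in ranked_genes]
    n_hits = sum(hits)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene_set must contain some, but not all, ranked genes")

    running, best = 0.0, 0.0
    for is_hit in hits:
        running += 1.0 / n_hits if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


# Toy ranked list (most to least affected); the names are placeholders.
ranked = ["a", "b", "c", "d", "e", "f"]
top_score = enrichment_score(ranked, {"a", "b"})     # set concentrated at the top
bottom_score = enrichment_score(ranked, {"e", "f"})  # set concentrated at the bottom
print(top_score, bottom_score)
```

A set concentrated at the top of the ranking scores close to +1 and one at the bottom close to -1. Which genes count as hits depends entirely on how the set was defined.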

Arcadia Science
Dec 10, 2022

To verify the validity of these predictions, we performed a gene-set enrichment analysis (GSEA) using as a ground truth the differentially expressed genes in a recently published dataset of bulk RNA-seq carried out on muscle samples from LGMD patients (n=16) and healthy individuals (n=15)25, where we had determined the genes that were significantly up- (LGMD_up) or downregulated (LGMD_dn) in patients compared to age-matched controls (Supplementary Table 3).

The significance of the difference in gene expression will be related to the size of the effect on the expression, but many genes that influence a phenotype may only show small changes in expression level. How well does this model deal with genes that show small changes in expression? Would this miss genes that show small changes in expression but are nevertheless important?

Arcadia Science
Dec 10, 2022

The resulting top terms after this trimming define the latent space.

What is the expected distribution of weights in the latent space? Would a discriminator network to impose different distributions be useful here?

Version published to 10.1101/2022.09.20.508703v2 on bioRxiv
Oct 13, 2022
Version published to 10.1101/2022.09.20.508703v1 on bioRxiv
Sep 22, 2022

Biologically Guided Variational Inference for Interpretable Multimodal Single-Cell Integration and Mechanistic Discovery

This article has 7 authors:
1. Lucas Arnoldt
2. Julius Upmeier zu Belzen
3. Luis Herrmann
4. Khue Nguyen
5. Fabian Theis
6. Benjamin Wild
7. Roland Eils
This article has no evaluations
Latest version Jun 12, 2025

10 Years of Variational Autoencoder: Insights from Cancer Temporal Progression Studies, a Systematic Literature Review

This article has 3 authors:
1. Guillermo Prol-Castelo
2. Davide Cirillo
3. Alfonso Valencia
This article has no evaluations
Latest version Jun 5, 2025

Learning Genetic Perturbation Effects with Variational Causal Inference

This article has 3 authors:
1. Emily Liu
2. Jiaqi Zhang
3. Caroline Uhler
This article has no evaluations
Latest version Jun 5, 2025
