Revisiting Reconstruction Likelihood: Variational Autoencoders for Biological and Biomedical Data Clustering

Andrej Korenić
Ufuk Özkaya
Abdulkerim Çapar

Abstract

Background and Objective

Variational Autoencoders (VAEs) offer a powerful framework for unsupervised anomaly detection and data clustering, often surpassing traditional methods. A core strength of VAEs is that they model data distributions probabilistically, which enables robust identification of anomalies and clusters through reconstruction likelihood, a stochastic metric that provides a principled alternative to deterministic error scores.
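To make the contrast with a deterministic error score concrete, the following is a minimal sketch of a Monte Carlo reconstruction log-likelihood, assuming a Gaussian encoder and a Gaussian decoder. The functions `toy_encode` and `toy_decode` are hypothetical stand-ins for a trained model and are not taken from the preprint; the sample count and noise levels are likewise illustrative.

```python
import math
import random


def gaussian_log_pdf(x, mean, std):
    """Log density of a diagonal Gaussian at x, summed over features."""
    return sum(
        -0.5 * math.log(2.0 * math.pi) - math.log(s) - 0.5 * ((xi - m) / s) ** 2
        for xi, m, s in zip(x, mean, std)
    )


def reconstruction_log_likelihood(x, encode, decode, n_samples=64, seed=0):
    """Monte Carlo estimate of E_{z ~ q(z|x)}[log p(x|z)]."""
    rng = random.Random(seed)
    z_mean, z_std = encode(x)
    total = 0.0
    for _ in range(n_samples):
        # Sample a latent code from the approximate posterior q(z|x).
        z = [rng.gauss(m, s) for m, s in zip(z_mean, z_std)]
        # The decoder returns a distribution over x, not a single point.
        x_mean, x_std = decode(z)
        total += gaussian_log_pdf(x, x_mean, x_std)
    return total / n_samples


# Hypothetical stand-ins for a trained encoder and decoder (assumption).
def toy_encode(x):
    return [sum(x) / len(x)], [0.1]


def toy_decode(z):
    return [z[0], z[0]], [0.5, 0.5]


typical_score = reconstruction_log_likelihood([1.0, 1.0], toy_encode, toy_decode)
anomaly_score = reconstruction_log_likelihood([1.0, -3.0], toy_encode, toy_decode)
print(typical_score > anomaly_score)
```

Because the toy decoder can only produce two equal features, the second input is poorly explained and receives a much lower score. Unlike a single squared-error value, the score averages over sampled latent codes and accounts for the decoder's predicted variance.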

Methods

We investigated how different VAE architectures, combining reconstruction likelihood with a learnable or data-driven prior, performed in a clustering task on a toy dataset such as MNIST. Results were verified using two dimensionality reduction techniques, t-distributed Stochastic Neighbor Embedding (t-SNE) and Uniform Manifold Approximation and Projection (UMAP), alongside two clustering algorithms, k-means and Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN).
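The clustering step of such a pipeline can be sketched as follows. This is a plain-Python version of Lloyd's k-means with a deterministic farthest-point initialisation, used here as a stand-in for a library implementation; the `latent_means` values are made-up encoder outputs, and the preprint's actual preprocessing (t-SNE or UMAP) and HDBSCAN settings are not reproduced.

```python
import math


def kmeans(points, k, n_iter=50):
    """Lloyd's algorithm with farthest-point initialisation (deterministic)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Pick the point farthest from every centroid chosen so far.
        farthest = max(
            points, key=lambda p: min(math.dist(p, c) for c in centroids)
        )
        centroids.append(list(farthest))

    labels = None
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids


# Made-up 2-D latent means standing in for encoder outputs (assumption).
latent_means = [
    (0.0, 0.1), (0.2, -0.1), (-0.1, 0.0),
    (5.0, 5.1), (5.2, 4.9), (4.9, 5.0),
]
labels, centroids = kmeans(latent_means, k=2)
print(labels)
```

On real data the latent codes would come from the trained encoder, and k would typically be chosen from prior knowledge of the dataset or replaced by a density-based method that does not require it.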

Results

The VAE’s encoder inherently maps data points into a latent space exhibiting discernible cluster structure, as evidenced by alignment with ground truth labels. While dimensionality reduction techniques (both t-SNE and UMAP) facilitated the application of clustering algorithms (k-means and HDBSCAN), these methods were primarily used to visualize and interpret the latent space organization.
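One simple way to quantify alignment between cluster assignments and ground truth labels is cluster purity, sketched below. Purity is chosen here only for illustration; the abstract does not state which agreement measure the authors used, and the example labels are invented.

```python
from collections import Counter


def cluster_purity(cluster_ids, true_labels):
    """Fraction of samples whose label matches their cluster's majority label."""
    by_cluster = {}
    for cluster, label in zip(cluster_ids, true_labels):
        by_cluster.setdefault(cluster, Counter())[label] += 1
    majority_total = sum(max(counts.values()) for counts in by_cluster.values())
    return majority_total / len(true_labels)


# Invented assignments and digit labels, for illustration only.
cluster_ids = [0, 0, 0, 1, 1, 1, 1]
true_labels = [3, 3, 8, 7, 7, 7, 3]
purity = cluster_purity(cluster_ids, true_labels)
print(round(purity, 3))
```

Purity rises trivially as the number of clusters grows, so it is best read alongside the cluster count rather than on its own.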

Conclusions

This study demonstrates that VAEs effectively cluster data by implicitly encoding assignments in their latent representations. Determining cluster membership from encoder output, combined with reconstruction likelihood using semantic features, offers a principled approach for identifying typical samples and anomalies. Future research should focus on leveraging this inherent clustering capability of VAEs to enhance interpretability and facilitate clinical application.

Version published to 10.64898/2026.04.09.717460 on bioRxiv
Apr 12, 2026

Related articles

CUVAE: Strengthening Latent Representations in Skip-Connection VAEs for High-Fidelity Medical Image Reconstruction

This article has 2 authors:
1. Kailash Kandpal
2. Prabhat Verma
This article has no evaluations
Latest version Mar 28, 2026

Replicability of unsupervised deep learning derived image phenotypes

This article has 5 authors:
1. Tian Xia
2. Sheikh Muhammad Saiful Islam
3. Ziqian Xie
4. Xingzhong Zhao
5. Degui Zhi
This article has no evaluations
Latest version May 19, 2026

Adaptive Cluster-Count Autoencoders with Dirichlet Process Priors for Geometry-Aware Single-Cell Representation Learning

This article has 1 author:
1. Zeyu Fu
This article has no evaluations
Latest version Mar 30, 2026
