Ancestral Sequences Cannot be Accurately Reconstructed via Interpolation in a Variational Autoencoder’s Latent Space

Abstract

Standard methods for ancestral sequence reconstruction (ASR) rely on substitution models for the residues in a biological sequence and assume independent evolution across these sites, ignoring the epistatic interactions that shape molecular evolution. In contrast, deep learning models like variational autoencoders (VAEs) can learn low-dimensional representations (“embeddings”) of sequences in a protein family that may implicitly handle these dependencies, raising the possibility of performing more accurate ASR by interpolating between extant sequence embeddings within the VAE’s latent space. In this study, we test this hypothesis by developing and evaluating a VAE-based ASR pipeline. Benchmarking this approach against established likelihood-based and parsimony methods using various simulations of protein evolution, including scenarios with and without epistasis, we find that the VAE-based approach is consistently and significantly outperformed by standard methods, even in epistatic regimes where it was hypothesized to have an advantage. We further show that this failure is not due to a lack of phylogenetic signal in the latent space, which does recapitulate evolutionary structure. Rather, the primary limitation is the information loss inherent to the autoencoding process: the VAE’s decoder cannot reconstruct sequences with sufficient fidelity for the precise demands of ASR.
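The latent-space interpolation idea tested in this study can be sketched as follows: encode two extant sequences, take a point between their embeddings, and decode it back to a sequence. The decoder below is a toy linear stand-in with random weights, purely illustrative; it is not the paper's trained VAE, and the names (`decode`, `interpolate_ancestor`) are hypothetical.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

rng = np.random.default_rng(0)
SEQ_LEN, LATENT_DIM = 8, 4

# Toy stand-in for a trained VAE decoder: a fixed linear map from the
# latent space to per-site amino-acid logits (hypothetical weights).
W = rng.normal(size=(LATENT_DIM, SEQ_LEN, len(AMINO_ACIDS)))

def decode(z):
    """Decode a latent vector to a sequence via per-site argmax over logits."""
    logits = np.einsum("d,dsa->sa", z, W)  # shape: (SEQ_LEN, 20)
    return "".join(AMINO_ACIDS[i] for i in logits.argmax(axis=1))

def interpolate_ancestor(z_a, z_b, alpha=0.5):
    """Propose an 'ancestor' by decoding a point on the line between
    two extant-sequence embeddings (alpha=0.5 is the midpoint)."""
    return decode((1 - alpha) * z_a + alpha * z_b)

# Embeddings of two extant sequences (here drawn at random for illustration;
# in the pipeline they would come from the VAE encoder).
z_a, z_b = rng.normal(size=LATENT_DIM), rng.normal(size=LATENT_DIM)
ancestor = interpolate_ancestor(z_a, z_b)
```

The study's finding is that even when the latent geometry reflects the phylogeny, the decoding step in a sketch like this loses too much sequence-level information for site-exact reconstruction.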
