Learning the Shape of Evolutionary Landscapes: Geometric Deep Learning Reveals Hidden Structure in Phenotype-to-Fitness Maps
This article has been Reviewed by the following groups
Listed in
- Evaluated articles (Arcadia Science)
Abstract
Elucidating the complex relationships between genotypes, phenotypes, and fitness remains one of the fundamental challenges in evolutionary biology. Part of the difficulty arises from the enormous number of possible genotypes and the lack of understanding of the underlying phenotypic differences driving adaptation. Here, we present a computational method that takes advantage of modern high-throughput fitness measurements to learn a map from high-dimensional fitness profiles to a low-dimensional latent space in a geometry-informed manner. We demonstrate that our approach using a Riemannian Hamiltonian Variational Autoencoder (RHVAE) outperforms traditional linear dimensionality reduction techniques by capturing the nonlinear structure of the phenotype-fitness map. When applied to simulated adaptive dynamics, we show that the learned latent space retains information about the underlying adaptive phenotypic space and accurately reconstructs complex fitness landscapes. We then apply this method to a dataset of high-throughput fitness measurements of E. coli under different antibiotic pressures and demonstrate superior predictive power for out-of-sample data compared to linear approaches. Our work provides a data-driven implementation of Fisher’s geometric model of adaptation, transforming it from a theoretical framework into an empirically grounded approach for understanding evolutionary dynamics using modern deep learning methods.
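For readers unfamiliar with Fisher's geometric model, the framework posits that fitness declines with distance from an environment-specific optimum in a low-dimensional phenotype space, so that a low-dimensional phenotype generates a high-dimensional profile of fitness values across environments. The sketch below illustrates that idea in plain NumPy; the Gaussian fitness function, the 2-D phenotype space, and all parameter values are illustrative assumptions, not the parameterization used in the article.

```python
# Illustrative sketch of Fisher's geometric model (FGM): fitness falls off
# with distance from an environment-specific optimum in phenotype space.
# The Gaussian form, dimensions, and parameters are assumptions for
# illustration only.
import numpy as np

rng = np.random.default_rng(0)

n_genotypes, n_envs, dim = 100, 8, 2
phenotypes = rng.normal(size=(n_genotypes, dim))  # latent phenotypic coordinates
optima = rng.normal(size=(n_envs, dim))           # one phenotypic optimum per environment
sigma = 1.0                                       # width of each fitness peak

# Squared distance from each genotype's phenotype to each environment's optimum
sq_dist = ((phenotypes[:, None, :] - optima[None, :, :]) ** 2).sum(axis=-1)

# High-dimensional fitness profile: one fitness value per genotype per environment
fitness_profiles = np.exp(-sq_dist / (2 * sigma ** 2))
print(fitness_profiles.shape)  # (100, 8)
```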
Article activity feed
-
This result is consistent with the hypothesis that the nonlinear latent space coordinates capture more information about the phenotypic state of the system under study, allowing for more accurate predictions of out-of-sample data. Furthermore, increasing the dimensionality of the nonlinear latent space from two to three dimensions only marginally improves the predictive power of the model, suggesting that the 2D nonlinear latent space captures most of the information about the phenotypic state of the system (see Supplementary Material for a detailed discussion).
It seems worth communicating here that for 5 (6?) of the 8 antibiotics tested, the 2D-VAE outperformed the 2D-RHVAE in out-of-sample prediction. That said, it is noteworthy that, for at least one antibiotic (NQO), the RHVAE's margin of improvement over the VAE is the largest observed.
I should emphasize too - I don't think this is necessarily a problem for the RHVAE - it clearly has significant benefits beyond its capacity to generalize! If anything, I'd say it's reasonable to hypothesize that it doesn't generalize as well precisely because it does a better job of learning where it can and cannot make accurate predictions, leading to sharper boundaries in prediction accuracy.
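As a concrete reading of the out-of-sample comparison discussed above, one can score how well a model of a given latent dimensionality reconstructs held-out fitness profiles. The sketch below uses PCA as a stand-in linear baseline and random data as a placeholder; the split, the error metric, and the data are assumptions for illustration rather than the article's protocol, and the nonlinear VAE/RHVAE models would slot into the same loop.

```python
# Minimal sketch of an out-of-sample reconstruction test as a function of
# latent dimensionality, using PCA as the linear baseline. Data, split, and
# metric are placeholders; the VAE/RHVAE would replace PCA in the same loop.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
fitness_profiles = rng.normal(size=(500, 8))  # placeholder for measured fitness data

train, test = train_test_split(fitness_profiles, test_size=0.2, random_state=0)

for n_latent in (2, 3):
    model = PCA(n_components=n_latent).fit(train)
    recon = model.inverse_transform(model.transform(test))
    mse = np.mean((test - recon) ** 2)
    print(f"{n_latent}D latent space, held-out MSE: {mse:.3f}")
```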
-
a neural network architecture that embeds high-dimensional data into a low-dimensional space while simultaneously learning the geometric transformations that map the learned low-dimensional space back into the high-dimensional space via a metric tensor
I'm very intrigued by this idea. At the very least, this seems like an interesting and effective way to drive VAEs toward learning more biologically meaningful and increasingly expressive prior distributions rather than a simple normal. On the other hand, this suggests the promise of a latent space that may even be interpretable.
I'm wondering if you've looked into this - whether these meaningful distances in the learned latent space are, or could eventually be made to be, interpretable, such that the contributions of different phenotypes to each latent dimension are quantifiable?
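To make the quoted idea concrete: in a Riemannian latent space, distances are measured with a position-dependent metric tensor rather than the Euclidean metric. A generic way to obtain such a metric is the pullback G(z) = J(z)ᵀJ(z), where J is the Jacobian of the decoder, so that short latent paths correspond to small changes in the reconstructed high-dimensional fitness profile. The sketch below implements that construction purely for illustration; the RHVAE itself learns a parameterized metric rather than this exact pullback, and the decoder here is an untrained stand-in.

```python
# Sketch of a position-dependent metric tensor in latent space via the
# decoder's pullback metric G(z) = J(z)^T J(z). Generic construction for
# illustration only; the RHVAE learns a parameterized metric instead, and
# this decoder is an untrained stand-in.
import torch

latent_dim, data_dim = 2, 8
decoder = torch.nn.Sequential(          # maps latent phenotype -> fitness profile
    torch.nn.Linear(latent_dim, 32),
    torch.nn.Tanh(),
    torch.nn.Linear(32, data_dim),
)

def metric_tensor(z: torch.Tensor) -> torch.Tensor:
    """Pullback metric at a single latent point z of shape (latent_dim,)."""
    jac = torch.autograd.functional.jacobian(decoder, z)  # (data_dim, latent_dim)
    return jac.T @ jac                                     # (latent_dim, latent_dim)

z = torch.zeros(latent_dim)
G = metric_tensor(z)
dz = torch.tensor([0.1, 0.0])
# Riemannian squared length of a small latent displacement dz: dz^T G dz
print(G)
print(dz @ G @ dz)
```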