Generative continuous time model reveals epistatic signatures in protein evolution
Abstract
Protein evolution is fundamentally shaped by epistasis, where the effect of a mutation depends on the sequence context. As standard phylogenetic methods assume independently evolving sites, there is a need for more complex models based on accurate estimates of the fitness landscape. Good candidates are modern generative models, such as the Potts model, which successfully capture epistatic effects. However, recent work on generative evolutionary models usually uses discrete time, making these models difficult to integrate with the standard frameworks of evolutionary biology. We introduce a continuous-time sequence evolution model that is simulated with the Gillespie algorithm and parameterized by a generative Potts model. This approach enables us to simulate realistic, family-specific evolutionary trajectories and allows for direct comparison with independent-site models. Surprisingly, we find that while epistasis significantly slows down evolution, it does not change the average evolutionary rates at individual sites. This is explained by the rate heterogeneity caused by context dependence: we show that the rate at some positions varies from zero to high values depending on the context, while other positions are essentially independent of the context. Finally, we show that epistasis leads to a systematic underestimation of the evolutionary distance inferred between sequences. Overall, our work provides a new tool for simulating realistic protein evolution and offers novel insights into the complex interplay between epistasis and evolutionary dynamics.
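To make the idea of a Gillespie simulation driven by a Potts model concrete, here is a minimal sketch under stated assumptions. The abstract does not specify the substitution rate function, so a Metropolis-type rate, mu * min(1, exp(-dE)), is assumed here purely for illustration. The function names (`random_potts`, `energy`, `delta_energy`, `gillespie`), the parameter `mu`, and the random toy parameters are all hypothetical. In practice the fields and couplings would come from a Potts model trained on a protein family.

```python
import math
import random


def random_potts(L, q=21, scale=0.5, seed=0):
    """Toy Potts parameters (hypothetical): fields h[i][a], couplings J[i][j][a][b]."""
    rng = random.Random(seed)
    h = [[rng.gauss(0, scale) for _ in range(q)] for _ in range(L)]
    J = [[None] * L for _ in range(L)]
    for i in range(L):
        for j in range(i + 1, L):
            block = [[rng.gauss(0, scale) for _ in range(q)] for _ in range(q)]
            J[i][j] = block
            # Symmetry: J[j][i][a][b] == J[i][j][b][a]
            J[j][i] = [[block[b][a] for b in range(q)] for a in range(q)]
    return h, J


def energy(seq, h, J):
    """Potts energy E(s) = -sum_i h_i(s_i) - sum_{i<j} J_ij(s_i, s_j)."""
    L = len(seq)
    e = -sum(h[i][seq[i]] for i in range(L))
    for i in range(L):
        for j in range(i + 1, L):
            e -= J[i][j][seq[i]][seq[j]]
    return e


def delta_energy(seq, i, b, h, J):
    """Energy change when site i mutates from seq[i] to b, in the current context."""
    a = seq[i]
    d = -(h[i][b] - h[i][a])
    for j, sj in enumerate(seq):
        if j != i:
            d -= J[i][j][b][sj] - J[i][j][a][sj]
    return d


def gillespie(seq, h, J, t_max, mu=1.0, rng=None):
    """Evolve seq for continuous time t_max; return (final sequence, number of substitutions)."""
    rng = rng or random.Random()
    seq = list(seq)
    q = len(h[0])
    t, n_events = 0.0, 0
    while True:
        moves, rates = [], []
        for i in range(len(seq)):
            for b in range(q):
                if b == seq[i]:
                    continue
                dE = delta_energy(seq, i, b, h, J)
                # Assumed Metropolis-type rate; context dependence enters through dE.
                rates.append(mu * (1.0 if dE <= 0 else math.exp(-dE)))
                moves.append((i, b))
        total = sum(rates)
        if total == 0.0:
            return seq, n_events  # no substitution is possible from this state
        t += rng.expovariate(total)  # exponential waiting time until the next event
        if t >= t_max:
            return seq, n_events
        i, b = rng.choices(moves, weights=rates)[0]  # pick an event proportionally to its rate
        seq[i] = b
        n_events += 1


if __name__ == "__main__":
    h, J = random_potts(L=6, q=4, seed=1)
    final, n = gillespie([0, 1, 2, 3, 0, 1], h, J, t_max=2.0, rng=random.Random(2))
    print(final, n)
```

In this sketch, setting all couplings in `J` to zero makes every site's rates independent of the rest of the sequence, which gives an independent-site model to compare against. Recomputing all rates after each event is the simplest choice. A practical implementation would likely update only the rates affected by the last substitution.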
Significance statement
Understanding how proteins evolve is central to molecular biology and phylogenetics. Traditional evolutionary models assume that mutations act independently at each position in a sequence. This neglects epistasis — the fact that the effect of a mutation depends on the rest of the sequence — which is known to be ubiquitous in proteins. By simulating protein evolution in continuous time using a generative model, our approach produces realistic sequences and reveals how epistasis shapes evolutionary dynamics. We find that epistasis slows down evolution and can mislead common methods for estimating evolutionary timescales. This work bridges modern generative models of proteins and phylogenetics, providing new tools to better understand molecular evolution.