Impact of phylogeny on the inference of functional sectors from protein sequence data

Nicola Dietler
Alia Abbara
Subham Choudhury
Anne-Florence Bitbol

Abstract

Statistical analysis of multiple sequence alignments of homologous proteins has revealed groups of coevolving amino acids called sectors. These groups of amino-acid sites feature collective correlations in their amino-acid usage, and they are associated with functional properties. Modeling showed that nonlinear selection on an additive functional trait of a protein is generically expected to give rise to a functional sector. These modeling results motivated a principled method, called ICOD, which is designed to identify functional sectors, as well as mutational effects, from sequence data. However, a challenge for all methods aiming to identify sectors from multiple sequence alignments is that correlations in amino-acid usage can also arise from the mere fact that homologous sequences share common ancestry, i.e. from phylogeny. Here, we generate controlled synthetic data from a minimal model comprising both phylogeny and functional sectors. We use this data to dissect the impact of phylogeny on sector identification and on mutational effect inference by different methods. We find that ICOD is most robust to phylogeny, but that conservation is also quite robust. Next, we consider natural multiple sequence alignments of protein families for which experimental deep mutational scanning data is available. We show that in this natural data, conservation and ICOD best identify sites with strong functional roles, in agreement with our results on synthetic data. Importantly, these two methods have different premises, since they respectively focus on conservation and on correlations. Thus, their joint use can reveal complementary information.

Version published to 10.1371/journal.pcbi.1012091
Sep 23, 2024
Version published to 10.1101/2024.04.22.590511 on bioRxiv
Apr 26, 2024

Related articles

The heterogeneous selection landscape of genome evolution in prokaryotes

This article has 5 authors:
1. Eugene Koonin
2. Sofiya Garushyants
3. Svetlana Karamycheva
4. Nash Rochman
5. Yuri Wolf
This article has no evaluations. Latest version: Dec 12, 2025

Molecular Evolution of the Fusion (F) Genes in Human Metapneumovirus Genotype B

This article has 10 authors:
1. Tatsuya Shirai
2. Fuminori Mizukoshi
3. Mitsuru Sada
4. Kazuya Shirato
5. Takeshi Saraya
6. Haruyuki Ishii
7. Ryusuke Kimura
8. Toshiyuki Sugai
9. Akihide Ryo
10. Hirokazu Kimura
This article has no evaluations. Latest version: Dec 23, 2025

The Evolution of the AlphaFold Architecture

This article has 1 author:
1. Y.C.B.J. Dissanayaka
This article has no evaluations. Latest version: Jan 9, 2026
