Statistical detection of protein sites associated with continuous traits

Louis Duchemin
Gerard Muntane Medina
Bastien Boussau
Philippe Veber


Abstract

Comparative genomic data can be used to look for substitutions in coding sequences that are associated with the variation of a particular phenotypic trait. A few statistical methods have been proposed to do so for phenotypes represented by discrete values. For continuous traits, no such statistical approach has been proposed, and researchers have resorted to sensible but uncharacterized criteria. Here, we investigate a phylogenetic model for coding sequences where amino acid preferences at a site are given by a continuous function of a quantitative trait. This function is inferred from the amino acids and the trait values in extant species and requires inferred point estimates of ancestral values of the trait at internal nodes. For detecting sites whose evolution is associated with this trait, we use a significance test against the hypothesis that amino acid preference does not depend on the trait.
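To make the idea of trait-dependent amino acid preferences concrete, here is a minimal sketch. It assumes, purely for illustration, that the preferences are a softmax of a linear function of the trait. The abstract does not specify the functional form, so the function `preferences`, its `intercepts` and `slopes` parameters, and the example values are all hypothetical and are not taken from Pelican. Under this assumed parameterization, the null hypothesis of the significance test corresponds to all slopes being zero.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def preferences(trait, intercepts, slopes):
    """Hypothetical trait-dependent amino acid preferences.

    ASSUMPTION: softmax of a linear score per amino acid. The actual
    function used by the authors is not given in the abstract.
    """
    scores = [a + b * trait for a, b in zip(intercepts, slopes)]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


# Illustrative parameters: only leucine (L) responds to the trait.
intercepts = [0.0] * len(AMINO_ACIDS)
slopes = [0.0] * len(AMINO_ACIDS)
slopes[AMINO_ACIDS.index("L")] = 1.5

low = preferences(-1.0, intercepts, slopes)   # preferences at a low trait value
high = preferences(2.0, intercepts, slopes)   # preferences at a high trait value

# Null model: all slopes are zero, so preferences ignore the trait.
no_slopes = [0.0] * len(AMINO_ACIDS)
null_low = preferences(-1.0, intercepts, no_slopes)
null_high = preferences(2.0, intercepts, no_slopes)
```

In this toy setting the preference for leucine rises with the trait, while the null model returns the same distribution at every trait value.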

This procedure is compared to simpler strategies on simulated alignments. It achieves higher recall at low false positive rates, which is especially important for whole-genome scans. This comes, however, at a much higher computational cost, and we suggest first using a simple test to filter promising candidate sites. We then revisit a dataset of alignments for 62 species of mammals, using longevity as a phenotypic trait. We apply our method to three protein families that have previously been proposed to display sites associated with variation in lifespan in mammals. Using a graphical representation extracted from the detailed phylogenetic analysis of candidate sites, we suggest that the evidence for this in the sequence data alone is weak.
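As an illustration of what a cheap pre-filter for candidate sites could look like, the sketch below runs a permutation test on a single alignment column. It asks whether trait values differ between groups of species that carry different amino acids. This is an assumption made for illustration only: the abstract does not say which simple test the authors use, the function names `between_group_ss` and `permutation_pvalue` are hypothetical, and the test ignores the phylogeny entirely, so it is not a substitute for the phylogenetic model.

```python
import random
from collections import defaultdict


def between_group_ss(residues, traits):
    """Between-group sum of squares of the trait, grouping species by residue."""
    groups = defaultdict(list)
    for aa, x in zip(residues, traits):
        groups[aa].append(x)
    grand = sum(traits) / len(traits)
    return sum(len(v) * (sum(v) / len(v) - grand) ** 2 for v in groups.values())


def permutation_pvalue(residues, traits, n_perm=999, seed=0):
    """Permutation p-value for trait/residue association at one site.

    ASSUMPTION: species are treated as exchangeable, which ignores
    shared ancestry. Use only as a rough filter.
    """
    rng = random.Random(seed)
    observed = between_group_ss(residues, traits)
    shuffled = list(traits)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # Small tolerance guards against floating-point ties.
        if between_group_ss(residues, shuffled) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Toy column: species with V have clearly larger trait values than those with A.
site = "AAAAAVVVVV"
trait = [1.0, 1.2, 0.9, 1.1, 1.0, 3.0, 3.2, 2.9, 3.1, 3.0]
p_associated = permutation_pvalue(site, trait)

# A constant column carries no information about the trait.
p_constant = permutation_pvalue("AAAAAAAAAA", trait)
```

With fixed group sizes, the between-group sum of squares ranks permutations in the same order as a one-way ANOVA F statistic, so the sketch avoids computing F explicitly. Sites that pass such a filter would then go on to the full phylogenetic test.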

The proposed method has been added to our Pelican software. It is available at https://gitlab.in2p3.fr/phoogle/pelican and can now be used with both discrete and continuous phenotypes to search for sites associated with phenotypic variation, on data sets with thousands of alignments.

Version published to 10.1101/2025.07.22.665918 on bioRxiv
Jul 26, 2025

Related articles

Genetic estimates of relatedness: Established practices and new opportunities through low coverage whole genome sequencing

This article has 8 authors:
1. Annika Freudiger
2. Natalie Kestel
3. Vladimir Jovanovic
4. Mariana Madruga de Brito
5. Angelina Ruiz-Lambides
6. Katja Nowick
7. Anja Widdig
8. Harald Ringbauer
This article has no evaluations. Latest version: Jan 23, 2026

The heterogeneous selection landscape of genome evolution in prokaryotes

This article has 5 authors:
1. Eugene Koonin
2. Sofiya Garushyants
3. Svetlana Karamycheva
4. Nash Rochman
5. Yuri Wolf
This article has no evaluations. Latest version: Dec 12, 2025

Molecular Evolution of the Fusion (F) Genes in Human Metapneumovirus Genotype B

This article has 10 authors:
1. Tatsuya Shirai
2. Fuminori Mizukoshi
3. Mitsuru Sada
4. Kazuya Shirato
5. Takeshi Saraya
6. Haruyuki Ishii
7. Ryusuke Kimura
8. Toshiyuki Sugai
9. Akihide Ryo
10. Hirokazu Kimura
This article has no evaluations. Latest version: Dec 23, 2025
