Phylogenomic prediction of interaction networks in the presence of gene duplication
Discuss this preprint
Start a discussion What are Sciety discussions?Listed in
This article is not in any list yet, why not save it to one of your lists.Abstract
Assigning gene function from genome sequences is a rate-limiting step in molecular biology research. A protein’s position within an interaction network can potentially provide insights into its molecular mechanisms. Phylogenetic analysis of evolutionary rate covariation (ERC) in protein sequence has been shown to be effective for large-scale prediction of functional relationships and interactions. However, gene duplication, gene loss, and other sources of phylogenetic incongruence are barriers for analyzing ERC on a genome-wide basis. Here, we developed ERCnet , a bioinformatic program designed to overcome these challenges, facilitating efficient all- vs-all ERC analyses for large protein sequence datasets. We simulated proteome datasets and found that ERCnet achieves combined false positive and negative error rates well below 10% and that our novel ‘branch-by-branch’ length measurements outperforms ‘root-to-tip’ approaches in most cases, offering a valuable new strategy for performing ERC. We also compiled a sample set of 35 angiosperm genomes to test the performance of ERCnet on empirical data, including its sensitivity to user-defined analysis parameters such as input dataset size and branch-length measurement strategy. We investigated the overlap between ERCnet runs with different species samples to understand how species number and composition affect predicted interactions and to identify the protein sets that consistently exhibit ERC across angiosperms. Our systematic exploration of the performance of ERCnet provides a roadmap for design of future ERC analyses to predict functional interactions in a wide array of genomic datasets. ERCnet code is freely available at https://github.com/EvanForsythe/ERCnet .
