Inferring the history of gene copy number evolution
Abstract
Gene duplication plays a crucial role in the adaptive evolution and diversification of organisms by creating extra copies of genes that can evolve new functions while the original function is preserved. Duplicated genes can become fixed in populations or persist as copy number variants. However, inferring and dating these duplication events from present-day data is challenging, because a given distribution of gene copy counts could result from either a few ancient duplication events or many recent ones. Sequence-based phylogenetic reconstruction, a common practice, does not account for the history of individuals and can therefore yield inconsistencies that lead to misinterpretation. Here, we introduce a novel model for inferring gene copy number evolution, which describes gene duplications and their evolution over time as a random walk on a coalescent duplication network. This approach is based solely on copy number counts and is hence independent of the inconsistencies of sequence-based inference. Backward in time, we implement structured coalescent simulations in which we reinterpret ‘structure’ as ‘genealogical distance’ based on copy number counts. We apply this model to the NB-ARC domain counts of NLR genes in A. thaliana to infer the number and times of the duplication events that have led to the present-day copy number distribution.
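To make the idea of a copy number random walk on a genealogy concrete, the following is a minimal toy sketch and not the model introduced in this article. It assumes a standard (unstructured) Kingman coalescent in place of the structured coalescent described above, and a simple per-lineage duplication and loss process with a floor of one copy. All function names, parameters, and rate values (`simulate_coalescent`, `walk`, `evolve_copy_numbers`, `dup_rate`, `loss_rate`, `root_copies`) are hypothetical and chosen only for illustration.

```python
import random


def simulate_coalescent(n, rng):
    """Toy Kingman coalescent for n sampled lineages (an assumption,
    not the structured coalescent of the article).
    Returns a list of merges (time, child_a, child_b, parent)."""
    active = list(range(n))
    next_id = n
    t = 0.0
    merges = []
    while len(active) > 1:
        k = len(active)
        # With k lineages, pairs coalesce at total rate k*(k-1)/2.
        t += rng.expovariate(k * (k - 1) / 2)
        a, b = rng.sample(active, 2)
        active.remove(a)
        active.remove(b)
        active.append(next_id)
        merges.append((t, a, b, next_id))
        next_id += 1
    return merges


def walk(copies, duration, dup_rate, loss_rate, rng):
    """Random walk of the copy number along one branch.
    Events arrive at rate dup_rate + loss_rate per lineage (hypothetical
    choice); a loss is ignored if it would leave fewer than one copy."""
    total = dup_rate + loss_rate
    if total <= 0:
        return copies
    t = 0.0
    while True:
        t += rng.expovariate(total)
        if t > duration:
            return copies
        if rng.random() < dup_rate / total:
            copies += 1
        elif copies > 1:
            copies -= 1


def evolve_copy_numbers(n, merges, root_copies, dup_rate, loss_rate, rng):
    """Propagate copy numbers from the root down to the n sampled leaves."""
    node_time = {i: 0.0 for i in range(n)}  # leaves are sampled at time 0
    for t, _, _, parent in merges:
        node_time[parent] = t
    root = merges[-1][3]
    copies = {root: root_copies}
    # Oldest merge first, so each parent is assigned before its children.
    for t, a, b, parent in reversed(merges):
        for child in (a, b):
            branch_length = t - node_time[child]
            copies[child] = walk(copies[parent], branch_length,
                                 dup_rate, loss_rate, rng)
    return [copies[i] for i in range(n)]


if __name__ == "__main__":
    rng = random.Random(1)
    merges = simulate_coalescent(20, rng)
    counts = evolve_copy_numbers(20, merges, root_copies=3,
                                 dup_rate=0.5, loss_rate=0.3, rng=rng)
    print(sorted(counts))
```

In this sketch the genealogy is simulated backward in time and the copy numbers are then evolved forward along its branches. Closely related individuals share most of their branch history and so tend to carry similar copy numbers, which is the intuition behind treating copy number differences as a proxy for genealogical distance.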