Selecting a Window Size for the Analysis of Whole Genome Alignments using AIC

Read the full article See related articles

Listed in

This article is not in any list yet, why not save it to one of your lists.
Log in to save this article

Abstract

The variation of evolutionary histories along the genome presents a challenge for phylogenomic methods to identify the non-recombining regions and reconstruct the phylogenetic tree for each region. To address this problem, many studies used the non-overlapping window approach, often with an arbitrary selection of fixed window sizes that potentially include intra-window recombination events. In this study, we proposed an information theoretic approach to select a window size that best reflects the underlying histories of the alignment. First, we simulated chromosome alignments that reflected the key characteristics of an empirical dataset and found that the AIC is a good predictor of window size accuracy in correctly recovering the tree topologies of the alignment. Due to the issue of missing data in empirical datasets, we then designed a stepwise non-overlapping window approach and applied this method to the genomes of erato - sara Heliconius butterflies and great apes. We found that the best window sizes for the butterflies’ chromosomes ranged from < 125bp to 250bp, which are much shorter than those used in a previous study even though this difference in window size did not significantly change the most common topologies across the genome. On the other hand, the best window sizes for great apes’ chromosomes ranged from 500bp to 1kb with the proportion of the major topology (grouping human and chimpanzee) falling between 60% and 87%, consistent with previous findings. Additionally, we observed a notable impact of stochastic error and concatenation when using small and large windows, respectively. For instance, the proportion of the major topology for great apes was 50% when using 250bp windows, but reached almost 100% for 64kb windows. In conclusion, our study highlights the challenges associated with selecting a window size in non-overlapping window analyses and proposes the AIC as a more objective way to select the optimal window size for whole genome alignments.

Article activity feed