Polarizing SNPs without outgroup
Abstract
Determining which allele is ancestral and which is derived, known as polarization, is a prerequisite of many population and quantitative genetic methods. In particular, it allows inference of the unfolded site-frequency spectrum (uSFS). The most widely used approaches are based on outgroup data. However, for studies on species for which closely related outgroups are difficult to obtain, information on many sites of interest may be missed due to alignment problems. Here, we present PolarBEAR (Polarization By Estimation of the Ancestral recombination graph), a method that uses the local genealogies from the ancestral recombination graph (ARG) to infer ancestral states. Using simulations under several scenarios, we show that PolarBEAR can reach high accuracy in polarization and uSFS estimation. This accuracy, however, depends heavily on the ARG used as input. It is maximal when the true ARG is used, but can be very low depending on the ARG reconstruction method employed. We also applied our method to human population data and compared it with the outgroup-based method est-sfs. Although our method could not infer the ancestral state with high confidence at certain positions, it obtained results for positions that est-sfs could not polarize due to missing outgroup data. At positions inferred by both methods, the polarization results were highly consistent. The two methods inferred similar uSFS, with PolarBEAR estimating slightly fewer high-frequency derived alleles. Furthermore, we demonstrate that PolarBEAR is robust to the mutation model used, while est-sfs exhibits a bias in the presence of heterogeneous base composition. PolarBEAR can complement outgroup-based methods, or replace them when no appropriate outgroup sequence is available.
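To illustrate why polarization matters for the uSFS, the following minimal Python sketch contrasts the unfolded spectrum, which needs the ancestral allele at each site, with the folded spectrum, which only needs minor-allele counts. It is not PolarBEAR's implementation: the function names (`derived_count`, `unfolded_sfs`, `folded_sfs`) and the toy sites are hypothetical, and the ancestral states are simply assumed to be known.

```python
def derived_count(alleles, ancestral):
    """Number of sampled haplotypes carrying a non-ancestral allele at one site."""
    return sum(1 for a in alleles if a != ancestral)


def unfolded_sfs(derived_counts, n):
    """Unfolded SFS: entry k-1 counts sites with k derived copies, k = 1..n-1."""
    sfs = [0] * (n - 1)
    for k in derived_counts:
        if 0 < k < n:  # keep segregating sites only
            sfs[k - 1] += 1
    return sfs


def folded_sfs(allele_counts, n):
    """Folded SFS: entry j-1 counts sites whose minor allele has j copies."""
    sfs = [0] * (n // 2)
    for k in allele_counts:
        if 0 < k < n:
            sfs[min(k, n - k) - 1] += 1
    return sfs


# Toy data (hypothetical): 6 haplotypes, 4 biallelic sites,
# each given as (sampled alleles, assumed ancestral allele).
sites = [
    ("AAAAAG", "A"),
    ("CCTTTT", "C"),
    ("GGGGGA", "A"),
    ("TTTCCC", "T"),
]
n = 6
counts = [derived_count(alleles, anc) for alleles, anc in sites]

usfs = unfolded_sfs(counts, n)
fsfs = folded_sfs(counts, n)
print(usfs)  # [1, 0, 1, 1, 1]
print(fsfs)  # [2, 1, 1]
```

In this toy example the first and third sites fall into the same folded class (minor-allele count 1) but into opposite ends of the unfolded spectrum (1 versus 5 derived copies). Errors in the assumed ancestral state move sites between exactly these mirrored classes, which is why mis-polarization affects the high-frequency derived end of the uSFS.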