Robustness of Ancestral Recombination Graph Inference Tools to Phasing Errors
Abstract
Ancestral Recombination Graphs (ARGs) are fundamental population genetic structures that encode the genealogical history of a sample of haplotypes along the genome. They have recently received substantial attention because they can provide accurate estimates of population history, serve as the foundation for selection scans, yield time-labeled estimates of changes in mutation rates, and more. ARGs can be inferred with several methods, but all of them rely on computational haplotype phasing. Previous studies have shown that phasing accuracy can be low in small or medium-sized samples, which are typical for most non-model organisms. Conventional wisdom would therefore suggest that ARG inference is of limited utility in most non-model organisms. To test this assumption, we benchmark the robustness of four ARG inference methods (Relate, SINGER, Threads, and tsinfer+tsdate) to phasing errors. We tested their performance with imperfect phasing under simple and realistic demographic models, evaluating it in terms of estimated pairwise coalescence times, counts of inferred recombination breakpoints, and estimated branch-length-based diversity. We observe only a slight degradation in performance with computational phasing across all four methods and all tested demographic scenarios, demonstrating that ARG inference methods are surprisingly robust to phasing inaccuracies. These findings support broader application of ARG inference in settings where large reference panels are not available.
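The three evaluation summaries named above can be made concrete with a small sketch. It assumes a toy, hypothetical representation that is not the output format of Relate, SINGER, Threads, or tsinfer+tsdate: an ARG is given as a list of local trees, each a genomic span plus a child-to-parent map, with node times shared across trees and all samples at time 0. Under that assumption, the branch-length distance between two samples in one tree is twice their TMRCA, and the breakpoint count is simply the number of boundaries between adjacent local trees (a crude proxy for recombination events). The function names `tmrca` and `summarize` are illustrative only. In a real benchmark one would compute the same quantities on both the true and the inferred ARG and compare them, typically with a library such as tskit.

```python
from itertools import combinations


def tmrca(parent, time, a, b):
    """Time of the most recent common ancestor of nodes a and b in one local tree."""
    # Collect a and all of its ancestors up to the root (whose parent is missing).
    ancestors = set()
    node = a
    while node is not None:
        ancestors.add(node)
        node = parent.get(node)
    # Walk up from b until we hit the first node shared with a's lineage.
    node = b
    while node not in ancestors:
        node = parent[node]
    return time[node]


def summarize(trees, time, samples):
    """Span-weighted pairwise TMRCAs, branch-length diversity, and breakpoint count.

    trees: list of (span, parent_map) for consecutive genomic intervals (hypothetical format).
    time: dict mapping every node to its time; samples are assumed to be at time 0.
    """
    total_span = sum(span for span, _ in trees)
    pairs = list(combinations(samples, 2))
    mean_tmrca = {
        (a, b): sum(span * tmrca(parent, time, a, b) for span, parent in trees) / total_span
        for a, b in pairs
    }
    # With contemporaneous samples, the path length between a pair is 2 * TMRCA,
    # so branch-length diversity is the mean of 2 * TMRCA over all pairs.
    diversity = sum(2 * t for t in mean_tmrca.values()) / len(pairs)
    # Each boundary between adjacent local trees counts as one breakpoint.
    breakpoints = len(trees) - 1
    return mean_tmrca, diversity, breakpoints


# Toy example: four samples, two local trees covering spans of 60 and 40 units.
time = {0: 0, 1: 0, 2: 0, 3: 0, 4: 1, 5: 2, 6: 5, 7: 1.5, 8: 3, 9: 6}
trees = [
    (60, {0: 4, 1: 4, 2: 5, 3: 5, 4: 6, 5: 6}),  # ((0,1),(2,3))
    (40, {0: 7, 2: 7, 1: 8, 3: 8, 7: 9, 8: 9}),  # ((0,2),(1,3))
]
mean_tmrca, diversity, breakpoints = summarize(trees, time, [0, 1, 2, 3])
print(mean_tmrca[(0, 1)], round(diversity, 6), breakpoints)
```

Comparing the dictionary of pairwise TMRCAs from a true and an inferred ARG (for example, by correlation across pairs) is one simple way to quantify how much phasing errors distort the inferred genealogy.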