SAI: A Python Package for Statistics for Adaptive Introgression

Abstract

Adaptive introgression is an important evolutionary process, yet widely used summary statistics—such as the number of uniquely shared sites and the quantile of the derived allele frequencies in such sites—lack accessible implementations, limiting reproducibility and methodological clarity. Here, we present SAI, a Python package for computing these statistics, and apply it to three datasets. First, using the 1000 Genomes Project data, we replicated previously reported candidate regions and identified additional ones, including a region detected by studies using supervised deep learning. Second, reanalysis of a Lithuanian genome dataset revealed no candidates in the HLA region. Finally, we investigated bonobo introgression into central chimpanzees and identified a candidate region that overlaps a high-frequency Denisovan-introgressed haplotype block reported in modern Papuans—an intriguing co-occurrence across divergent lineages. Discrepancies with prior results highlight the importance of transparent and reproducible analysis workflows, especially as machine learning becomes increasingly prevalent in evolutionary genomics.
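To make the two statistics named above concrete, here is a minimal sketch in plain Python. It is not the SAI API. It assumes the usual threshold-based formulation: a site counts as uniquely shared when the derived allele frequency is below `w` in a reference population, above `x` in the target population, and equal to `y` in the source population, and the quantile statistic is taken over target frequencies at sites that meet the reference and source conditions. The function names, default thresholds, and toy frequencies are all hypothetical.

```python
from math import floor, isclose


def _quantile(values, q):
    """Linearly interpolated quantile of a non-empty list of numbers."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = floor(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])


def u_statistic(ref_freq, tgt_freq, src_freq, w=0.01, x=0.5, y=1.0):
    """Count sites with derived allele frequency < w in the reference,
    > x in the target, and == y in the source (assumed definition)."""
    return sum(
        1
        for r, t, s in zip(ref_freq, tgt_freq, src_freq)
        if r < w and t > x and isclose(s, y)
    )


def q_statistic(ref_freq, tgt_freq, src_freq, w=0.01, y=1.0, quantile=0.95):
    """Quantile of target derived allele frequencies over sites with
    frequency < w in the reference and == y in the source (assumed
    definition). Returns None when no site qualifies."""
    selected = [
        t
        for r, t, s in zip(ref_freq, tgt_freq, src_freq)
        if r < w and isclose(s, y)
    ]
    if not selected:
        return None
    return _quantile(selected, quantile)


# Toy per-site derived allele frequencies for one genomic window (made up).
ref = [0.0, 0.0, 0.2, 0.0, 0.005]
tgt = [0.6, 0.1, 0.9, 0.8, 0.3]
src = [1.0, 1.0, 1.0, 0.0, 1.0]

u_val = u_statistic(ref, tgt, src)
q_val = q_statistic(ref, tgt, src)
print(u_val, q_val)
```

In a genome scan, these functions would be evaluated per window, with windows in the upper tail of both statistics treated as candidates. The window size and outlier cutoffs are analysis choices that this sketch does not cover.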
