CLUES2 Companion: Computational pipelines to estimate, visualize, and date selection on multi-locus sites
Abstract
Summary
Statistical methods that quantify the selection coefficient associated with alleles provide critical insights into the evolutionary processes underlying organismal adaptation. Among these approaches, CLUES2 was recently developed to estimate selection coefficients for alleles using a statistical framework that captures the maximum amount of information present in genomic data, making it a state-of-the-art method for identifying variation under selection. However, before executing this approach, users first need to apply Relate, a genealogy-based approach, to their data to generate the input files for CLUES2. Moreover, completing this pre-processing step inherently assumes that users have sufficient expertise to successfully run the Relate software. Here, we present the CLUES2 Companion package, which contains user-friendly pipelines that seamlessly apply Relate to multiple sites within a target genomic region and then execute the CLUES2 software to estimate selection coefficients for these sites. CLUES2 Companion also has the capability to present the output of CLUES2 analyses in tabular and graphical formats. In addition, as a new feature, we adapted Relate and CLUES2 to estimate the age of onset of a selective sweep of derived variation, expanding the functionality of our package.
Results
To demonstrate the utility of our approach, we applied CLUES2 Companion to polymorphisms in the MCM6 gene on Chromosome 2 (including the known variants associated with lactase persistence) in the European Finnish, Middle Eastern Bedouin, and East African Maasai populations from the 1000 Genomes Project, the Human Genome Diversity Project (HGDP), and the haplotype map (HapMap) Project Phase 3, respectively. Our analyses uncovered significant selection coefficient estimates at the persistence-associated T-13910 allele (rs4988235; s = 0.09986, CI: 0.08678 – 0.11294) in the Finnish, the G-13915 allele (rs41380347; s = 0.09981, CI: 0.06515 – 0.13448) in the Bedouin, and the C-14010 allele (rs145946881; s = 0.09981, CI: 0.08799 – 0.11163) in the Maasai, indicative of a classic selective sweep. Furthermore, we inferred the age of onset of selection at these alleles to be 9,100 years ago (CI: 6,552 – 10,612 years ago) in the Finnish, 7,700 years ago (CI: 1,864 – 8,064 years ago) in the Bedouin, and 4,900 years ago (CI: 3,864 – 5,936 years ago) in the Maasai, which agree well with other estimates based on genetic and archaeological data. To further validate our dating method, we simulated several datasets containing SNPs with known ages of onset of selection, selection coefficients (s), and genomic positions using a selective sweep framework implemented in msprime, and then applied CLUES2 Companion to the simulated datasets. CLUES2 Companion produced estimates of selection onset similar to those specified in the simulations, corroborating the dependability of our method. Overall, CLUES2 Companion is a versatile package that enables users to efficiently explore, interpret, and report evidence of selection in genomic datasets, complementing the CLUES2 software.
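For intuition about what a selection coefficient near 0.1 implies for the pace of a sweep, the following minimal sketch iterates the standard deterministic recursion for a haploid (genic) selection model, in which the favored allele has relative fitness 1 + s. It is an illustration only and is not the inference procedure used by CLUES2 or CLUES2 Companion. The function names, the starting and target frequencies, and the 28-year generation time are all assumptions chosen for the example.

```python
# Hypothetical illustration: deterministic rise of a favored allele.
# Not part of CLUES2 or CLUES2 Companion; drift, dominance, and
# demography are ignored.

ASSUMED_YEARS_PER_GENERATION = 28  # assumption, not taken from the abstract


def generations_to_reach(s, p0, p_target, max_generations=100_000):
    """Count generations for the allele frequency to rise from p0 to p_target.

    Uses the haploid recursion p' = p * (1 + s) / (1 + s * p), where the
    denominator is the mean fitness of the population.
    """
    if s <= 0:
        raise ValueError("s must be positive for a favored allele")
    if not (0.0 < p0 < p_target < 1.0):
        raise ValueError("require 0 < p0 < p_target < 1")

    p = p0
    for generation in range(1, max_generations + 1):
        p = p * (1.0 + s) / (1.0 + s * p)
        if p >= p_target:
            return generation
    raise RuntimeError("target frequency not reached within max_generations")


def generations_to_years(generations, years_per_generation=ASSUMED_YEARS_PER_GENERATION):
    """Convert a generation count to years under an assumed generation time."""
    return generations * years_per_generation


if __name__ == "__main__":
    # Illustrative inputs: s close to the reported estimates, and an
    # arbitrary starting frequency of 0.1% rising to 50%.
    gens = generations_to_reach(s=0.1, p0=0.001, p_target=0.5)
    print(f"{gens} generations, about {generations_to_years(gens)} years")
```

Under this simplified model the log-odds of the allele frequency increase by ln(1 + s) each generation, so doubling s roughly halves the time needed to reach a given frequency. Real trajectories also depend on drift and on population size changes, which the genealogy-based inference described above accounts for.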
Availability and Implementation
CLUES2 Companion is free and open source on GitHub (https://github.com/alisi1989/CLUES2-Companion) and on Dropbox (https://www.dropbox.com/scl/fo/m5y6aek0twd1jz9grg4p3/ALxMgIljUJRIZZNQXaGU-OE?rlkey=mbbh36ondftnqg0x07eao57eg&st=9jzsl5um&dl=0).
Contact
alisi@usc.edu ; mc44680@usc.edu