Impact of natural selection on global patterns of genetic variation and association with clinical phenotypes at genes involved in SARS-CoV-2 infection

Abstract

Viruses have been strong sources of natural selection pressure during human evolutionary history. Investigating genetic diversity and detecting signatures of natural selection at host genes related to severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection help to identify functionally important variation. We conducted a large study of global genomic variation at host genes that play a role in SARS-CoV-2 infection with a focus on underrepresented African populations. We identified nonsynonymous and regulatory variants at ACE2 that appear to be targets of recent natural selection in some African populations. We detected evidence of ancient adaptive evolution at TMPRSS2 in the human lineage. Genetic variants that are targets of natural selection are associated with clinical phenotypes common in patients with coronavirus disease 2019.

Article activity feed

  1. SciScore for 10.1101/2021.06.28.21259529:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Software and Algorithms
    Sentences / Resources
    Genomic data: The genomic data used in this study were from three sources: the Africa 6K project (referred to as the “African Diversity” dataset), which is part of the TopMed consortium [25], the 1000 Genomes project (1KG) [26], and the Penn Medicine BioBank (PMBB).
    TopMed
    suggested: None
    Whole genome sequencing (WGS) was performed to a median depth of 30X using DNA isolated from blood, PCR-free library construction and Illumina HiSeq X technology, as described elsewhere [25].
    WGS
    suggested: None
    The 1000 Genomes datasets with super population ancestry labels (EUR, AFR, EAS, SAS, Other) were used as QDA training datasets to determine the genetic ancestry labels for the PMBB population.
    1000 Genomes
    suggested: (1000 Genomes Project and AWS, RRID:SCR_008801)
    Variant annotations: We used Ensembl Variant Effect Predictor (VEP) for variant annotations [27].
    Ensembl Variant Effect Predictor
    suggested: None
    Variant
    suggested: (VARIANT, RRID:SCR_005194)
    For pathogenicity predictions, we used CADD [28], SIFT [29], PolyPhen [30], Condel [31], and REVEL scores in Ensembl.
    Ensembl
    suggested: (Ensembl, RRID:SCR_002344)
    We visualized the location of these regulatory and eQTL variants using the UCSC genome browser and highlighted the variants using Adobe Illustrator.
    Adobe Illustrator
    suggested: (Adobe Illustrator, RRID:SCR_010279)
    Association Testing: We used the R SKAT package for conducting a gene-based dispersion test and Biobin [37, 38] for gene burden analysis.
    SKAT
    suggested: (SKAT, RRID:SCR_009396)
    The chimpanzee sequence (Clint_PTRv2/panTro6) used in the analysis was obtained from the UCSC genome browser.
    UCSC genome browser
    suggested: (UCSC Genome Browser, RRID:SCR_005780)
    The overlapped SNPs were uploaded to the UCSC browser for visualization.
    UCSC browser
    suggested: None
    The ChIP-seq density dataset was obtained from http://remap.univ-amu.fr/ [33]. DNase-seq and ChIP-seq clusters, layered H3K4Me3 (often found near Promoters), H3K4Me1 and H3K27Ac (often found near Regulatory Elements) data are from ENCODE [32].
    ChIP-seq
    suggested: (ChIP-seq, RRID:SCR_001237)

    Results from OddPub: Thank you for sharing your data.


    Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    Results from scite Reference Check: We found no unreliable references.


    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.
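
The "suggested:" entries in the Resources table above follow a regular shape, so they can be read mechanically. The following is a minimal sketch, not SciScore's own implementation: the function name `parse_suggestion` and the regular expression are assumptions made for illustration, and the pattern only covers the layout seen in this report, "suggested: (<name>, RRID:<identifier>)" or "suggested: None".

```python
import re

# Assumed layout of a suggestion line in the table above (illustrative only):
#   "suggested: (<resource name>, RRID:<identifier>)"  or  "suggested: None"
SUGGESTED = re.compile(
    r"suggested:\s*\((?P<name>.+),\s*RRID:(?P<rrid>[A-Za-z]+_\w+)\)"
)


def parse_suggestion(line):
    """Return (name, rrid) when the line carries an RRID, otherwise None."""
    match = SUGGESTED.search(line)
    if match is None:
        return None
    return match.group("name"), match.group("rrid")


if __name__ == "__main__":
    # Sample lines copied from the Resources table in this report.
    sample_lines = [
        "suggested: (1000 Genomes Project and AWS, RRID:SCR_008801)",
        "suggested: None",
        "suggested: (SKAT, RRID:SCR_009396)",
    ]
    for line in sample_lines:
        print(parse_suggestion(line))
```

Lines reading "suggested: None" yield `None`, which makes it easy to list the resources that still lack an identifier.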