AncestryDNA COVID-19 Host Genetic Study Identifies Three Novel Loci


Abstract

Human infection with SARS-CoV-2, the causative agent of COVID-19, leads to a remarkably diverse spectrum of outcomes, ranging from asymptomatic to fatal. Recent reports suggest that both clinical and genetic risk factors may contribute to COVID-19 susceptibility and severity. To investigate genetic risk factors, we collected over 500,000 COVID-19 survey responses between April and May 2020 with accompanying genetic data from the AncestryDNA database. We conducted sex-stratified and meta-analyzed genome-wide association studies (GWAS) for COVID-19 susceptibility (positive nasopharyngeal swab test, n cases = 2,407) and severity (hospitalization, n cases = 250). The severity GWAS replicated associations with severe COVID-19 near ABO and SLC6A20 (P < 0.05). Furthermore, we identified three novel loci with P < 5×10⁻⁸. The strongest association was near IVNS1ABP, a gene involved in influenza virus replication [1], and was associated only in males. The other two novel loci harbor genes with established roles in viral replication or immunity: SRRM1 and the immunoglobulin lambda locus. We thus present new evidence that host genetic variation likely contributes to COVID-19 outcomes and demonstrate the value of large-scale, self-reported data as a mechanism to rapidly address a health crisis.
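As a rough illustration of how sex-stratified results for a single variant can be combined, the sketch below applies a fixed-effects, inverse-variance-weighted meta-analysis. The abstract does not state which meta-analysis method or software was used, so the weighting scheme, the function name `ivw_meta`, and the example effect sizes are assumptions for illustration only; only the genome-wide threshold of 5×10⁻⁸ comes from the text.

```python
import math

GENOME_WIDE_P = 5e-8  # significance threshold quoted in the abstract


def ivw_meta(estimates):
    """Combine (beta, se) pairs with fixed-effects inverse-variance weights.

    Hypothetical helper: the source does not specify its meta-analysis method.
    Returns the pooled beta, its standard error, and a two-sided p-value.
    """
    weights = [1.0 / se ** 2 for _, se in estimates]
    total = sum(weights)
    beta = sum(w * b for w, (b, _) in zip(weights, estimates)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    # Two-sided p-value under a standard normal null.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, p


# Made-up log-odds estimates for one variant in the male and female GWAS.
male = (0.20, 0.10)
female = (0.40, 0.20)
beta, se, p = ivw_meta([male, female])
print(f"pooled beta={beta:.3f}, se={se:.3f}, p={p:.3g}")
print("genome-wide significant:", p < GENOME_WIDE_P)
```

The stratum with the smaller standard error dominates the pooled estimate, which is why a signal present in only one sex can still survive (or be diluted by) the combined analysis.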

Article activity feed

  1. SciScore for 10.1101/2020.10.06.20205864:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    Institutional Review Board Statement (Consent and IRB): Ethics statement: All data for this research project was from subjects who provided prior informed consent to participate in AncestryDNA’s Human Diversity Project, as reviewed and approved by our external institutional review board, Advarra (formerly Quorum).
    Randomization: For all close relative pairs, one individual was randomly selected for exclusion from our study.
    Blinding: not detected.
    Power Analysis: not detected.
    Sex as a biological variable: [35] A key recommendation in this plan is to analyze males and females separately when possible; therefore, we conducted four separate GWAS in total: susceptibility in males, susceptibility in females, hospitalization in males, and hospitalization in females.
    Cell Line Authentication: not detected.
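
The randomization step listed in Table 1 can be sketched as follows. This is a minimal illustration, assuming relatedness has already been summarized as a list of close-relative pairs; the function name `exclude_one_per_pair`, the sample IDs, the fixed seed, and the handling of individuals who appear in several pairs are all assumptions, since the source only states that one individual per pair was randomly excluded.

```python
import random


def exclude_one_per_pair(pairs, seed=0):
    """Return the set of IDs to exclude so no close-relative pair stays intact.

    Hypothetical helper. If one member of a pair is already excluded
    (because of an earlier pair), the pair is skipped; this tie-handling
    rule is an assumption, not something described in the source.
    """
    rng = random.Random(seed)  # seeded for reproducibility
    excluded = set()
    for a, b in pairs:
        if a in excluded or b in excluded:
            continue  # pair already broken by an earlier exclusion
        excluded.add(rng.choice((a, b)))
    return excluded


# Made-up sample IDs; "s2" appears in two pairs.
pairs = [("s1", "s2"), ("s2", "s3"), ("s4", "s5")]
excluded = exclude_one_per_pair(pairs)
kept = {s for pair in pairs for s in pair} - excluded
print("excluded:", sorted(excluded))
print("kept:", sorted(kept))
```

Whatever the draw, every listed pair ends up with at least one member excluded, so the retained set contains no intact close-relative pair.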

    Table 2: Resources

    Experimental Models: Cell Lines

    Sentences: --psam [file that provides sex information] --covar [covariates file] --covar-name PC1, PC2, PC3, PC4, PC5, PC6, PC7, PC8, PC9, PC10, PC11, PC12, orthogonal_age, orthogonal_age2, platform --covar-variance-standardize --extract-if-info R2 >= 0.3 --freq --glm hide-covar --keep [list of unrelated Europeans] --keep-females OR --keep-males --maf 0.01 --pheno [phenotype file] --pheno-name [phenotype column name]
    Resources: PC2 (suggested: RRID:CVCL_0483)

    Software and Algorithms

    Sentences: Calculation of principal components to control residual population structure: After selecting unrelated individuals with European ancestry as described above, genetic PCs were calculated to include in the association studies to control residual population structure and were computed using FlashPCA 2.0 [31]. Input genotypes were linkage disequilibrium (LD)-pruned using the PLINK 1.9 command --indep-pairwise 100 5 0.2 --maf 0.05 --geno 0.001. Imputation: Samples were imputed to the Haplotype Reference Consortium (HRC) reference panel [32] version 1.1, which consists of 27,165 total individuals and 36 million variants.
    Resources: PLINK (suggested: PLINK, RRID:SCR_001757)

    Sentences: We determined best-guess haplotypes with Eagle version 2.4.1 [33] and performed imputation with Minimac4 version 1.0.1 [34]. We used 1,117,080 unique variants as input, and 8,049,082 imputed variants were retained in the final data set.
    Resources: Eagle (suggested: Eagle, RRID:SCR_017262); Minimac4 (suggested: None)
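
To make the association-test flags quoted in Table 2 easier to read, the sketch below assembles them into one argument list per sex. It is a hedged illustration, not the authors' pipeline: the executable name `plink2`, the function `build_glm_args`, and all file paths are placeholders, and the genotype-input flag that a real run would need is deliberately omitted because the source does not give it. Covariate names are passed as separate arguments here, which is an assumption about how the comma-separated list in the source would be supplied.

```python
import shlex

# Covariates named in the source's --covar-name list.
COVARIATES = [f"PC{i}" for i in range(1, 13)] + [
    "orthogonal_age", "orthogonal_age2", "platform",
]


def build_glm_args(sex, pheno_name, psam, covar, keep, pheno):
    """Assemble the flags quoted in Table 2 for one sex-stratified GWAS.

    Hypothetical helper: file paths are placeholders, and the genotype
    input flag is omitted because the source does not show it.
    """
    if sex not in ("females", "males"):
        raise ValueError("sex must be 'females' or 'males'")
    return [
        "plink2",  # assumed executable name
        "--psam", psam,
        "--covar", covar,
        "--covar-name", *COVARIATES,
        "--covar-variance-standardize",
        "--extract-if-info", "R2", ">=", "0.3",  # imputation quality filter
        "--freq",
        "--glm", "hide-covar",
        "--keep", keep,
        f"--keep-{sex}",
        "--maf", "0.01",
        "--pheno", pheno,
        "--pheno-name", pheno_name,
    ]


# Placeholder paths and phenotype name, for illustration only.
for sex in ("females", "males"):
    args = build_glm_args(
        sex, "susceptibility", "samples.psam", "covariates.txt",
        "unrelated_europeans.txt", "phenotypes.txt",
    )
    print(shlex.join(args))
```

Running the loop once per phenotype (susceptibility and hospitalization) would yield the four separate GWAS described in the rigor table; `shlex.join` quotes the `>=` operator so the printed line can be pasted into a shell.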

    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).


    Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
    A key limitation of our data is that COVID-19 cases who suffered very severe or fatal infections are less likely to participate in our survey, which consequently results in undersampling cases with severe outcomes. We also restricted to individuals of European ancestry due to small sample sizes in other genetic ancestry cohorts for the susceptibility and severity outcomes in this early phase of COVID-19 survey data collection. As the COVID-19 survey cohort grows, future analyses will focus on increased ancestral diversity to increase generalizability. Finally, we lack an independent replication cohort for our novel findings and will rely on future ascertainment of additional survey respondents and COVID-19 GWAS consortia [27] efforts to determine whether our findings are reproducible. In summary, we collected over 500,000 self-reported COVID-19 outcomes in under two months and conducted one of the largest genetic studies of infection susceptibility and severity to date, thus demonstrating the value of large-scale self-reported data as a mechanism to rapidly address a serious health crisis. We identified three novel loci, all of which harbor genes with established roles in viral replication or immunity, and one of which may provide insight into why men appear to be differently affected by COVID-19 than women. We thus add to growing evidence that host genetic variation contributes to COVID-19 susceptibility and severity and suggest identification of such genetic risk factors may p...

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.