Exome-wide association study to identify rare variants influencing COVID-19 outcomes: Results from the Host Genetics Initiative

Abstract

Host genetics is a key determinant of COVID-19 outcomes. Previously, the COVID-19 Host Genetics Initiative genome-wide association study used common variants to identify multiple loci associated with COVID-19 outcomes. However, variants with the largest impact on COVID-19 outcomes are expected to be rare in the population. Hence, studying rare variants may provide additional insights into disease susceptibility and pathogenesis, thereby informing therapeutics development. Here, we combined whole-exome and whole-genome sequencing from 21 cohorts across 12 countries and performed rare variant exome-wide burden analyses for COVID-19 outcomes. In an analysis of 5,085 severe disease cases and 571,737 controls, we observed that carrying a rare deleterious variant in the SARS-CoV-2 sensor toll-like receptor TLR7 (on chromosome X) was associated with a 5.3-fold increase in severe disease (95% CI: 2.75–10.05, p = 5.41 × 10⁻⁷). This association was consistent across sexes. These results further support TLR7 as a genetic determinant of severe disease and suggest that larger studies on rare variants influencing COVID-19 outcomes could provide additional insights.
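As a rough consistency check on the figures quoted in the abstract, the sketch below back-calculates a standard error and two-sided p-value from the reported odds ratio and 95% confidence interval. It assumes a simple Wald approximation on the log-odds scale; the study's actual test statistic may have been computed differently, so only order-of-magnitude agreement with the reported p-value should be expected. The function name `wald_check` is hypothetical.

```python
import math


def wald_check(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Back-calculate SE, z and a two-sided p-value from an OR and its 95% CI.

    Assumes the interval is symmetric on the log scale (Wald approximation).
    """
    log_or = math.log(odds_ratio)
    # Width of the CI on the log scale equals 2 * z_crit * SE.
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    z = log_or / se
    # Two-sided normal tail probability: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return se, z, p


# Values quoted in the abstract for TLR7 and severe disease.
se, z, p = wald_check(5.3, 2.75, 10.05)
print(f"SE(log OR) = {se:.3f}, z = {z:.2f}, approximate p = {p:.2e}")
```

Under this approximation the p-value lands in the same order of magnitude as the one reported, which suggests the interval and p-value are mutually consistent.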

Article activity feed

  1. SciScore for 10.1101/2022.03.28.22273040:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Recombinant DNA
    Sentences | Resources
    The burden tests were performed by pooling variants in three different variant sets (called masks): “pLoF” which included loss of functions as defined by high impact variants in the Ensembl database [20] (i.e. transcript ablation, splice acceptor variant, splice donor variant, stop gained, frameshift variant, stop lost, start lost, transcript amplification), “coding5” which included all variants in pLoF as well as moderate impact indels and any missense variants that were predicted to be deleterious based on all of the in-silico pathogenicity prediction scores used, and “coding1” which included all variants in coding5 as well as all missense variants that were predicted to be deleterious in at least one of the in-silico pathogenicity prediction scores used.
    pLoF
    suggested: None
    Software and Algorithms
    Sentences | Resources
    The burden tests were performed by pooling variants in three different variant sets (called masks): “pLoF” which included loss of functions as defined by high impact variants in the Ensembl database [20] (i.e. transcript ablation, splice acceptor variant, splice donor variant, stop gained, frameshift variant, stop lost, start lost, transcript amplification), “coding5” which included all variants in pLoF as well as moderate impact indels and any missense variants that were predicted to be deleterious based on all of the in-silico pathogenicity prediction scores used, and “coding1” which included all variants in coding5 as well as all missense variants that were predicted to be deleterious in at least one of the in-silico pathogenicity prediction scores used.
    Ensembl
    suggested: (Ensembl, RRID:SCR_002344)
    For in-silico prediction, we used the following five tools: SIFT [47], LRT [48], MutationTaster [49], PolyPhen2 [50] with the HDIV database, and PolyPhen2 with the HVAR database.
    PolyPhen2
    suggested: None
    Summary statistics were then meta-analyzed using a fixed effect model within each ancestry and using a DerSimonian-Laird random effect model across ancestries with the Metal package [51] and its random effect extension [52].
    Metal
    suggested: (METAL, RRID:SCR_002013)
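The two-stage meta-analysis described in the last quoted sentence (fixed effect within each ancestry, DerSimonian-Laird random effect across ancestries) can be illustrated with a minimal sketch. This is not the Metal implementation: it is a plain inverse-variance calculation written from the textbook formulas, and the function names and all input numbers are hypothetical.

```python
import math


def fixed_effect(betas, ses):
    """Inverse-variance fixed-effect estimate; returns (beta, se)."""
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    return beta, math.sqrt(1.0 / total)


def dersimonian_laird(betas, ses):
    """DerSimonian-Laird random-effect estimate; returns (beta, se, tau2)."""
    k = len(betas)
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta_fe, _ = fixed_effect(betas, ses)
    # Cochran's Q measures heterogeneity around the fixed-effect estimate.
    q = sum(w * (b - beta_fe) ** 2 for w, b in zip(weights, betas))
    c = total - sum(w ** 2 for w in weights) / total
    # Method-of-moments between-group variance, truncated at zero.
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    re_weights = [1.0 / (se ** 2 + tau2) for se in ses]
    re_total = sum(re_weights)
    beta = sum(w * b for w, b in zip(re_weights, betas)) / re_total
    return beta, math.sqrt(1.0 / re_total), tau2


# Hypothetical per-cohort log odds ratios and standard errors, by ancestry.
cohorts = {
    "ancestry_A": ([0.9, 1.2, 1.0], [0.40, 0.50, 0.35]),
    "ancestry_B": ([1.6, 1.4], [0.60, 0.55]),
}

# Stage 1: fixed effect within each ancestry.
per_ancestry = [fixed_effect(b, s) for b, s in cohorts.values()]

# Stage 2: random effect across the ancestry-level estimates.
beta, se, tau2 = dersimonian_laird(
    [b for b, _ in per_ancestry], [s for _, s in per_ancestry]
)
print(f"pooled log OR = {beta:.3f}, SE = {se:.3f}, tau^2 = {tau2:.3f}")
```

When the ancestry-level estimates agree closely, the estimated between-group variance is truncated to zero and the random-effect result coincides with the fixed-effect one.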

    Results from OddPub: Thank you for sharing your code.


    Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
    Our study had limitations. First, even if this is one of the world’s largest consortia using sequencing technologies for the study of rare variants, we remain limited by a relatively small sample size. For example, in a recent analysis of UK Biobank exomes, many of the phenotypes for which multiple genes were found using burden tests had a much higher number of cases than in our analyses (e.g. blonde hair colour, with 48,595 cases) [19]. Further, rare variant signals were commonly found in regions enriched in common variants found in GWASs. The fact that ABO and NSF were the only genes from the COVID-19 HGI GWAS that were also identified in our burden test (albeit using a more liberal significance threshold), also suggests a lack of statistical power. Similarly, GenOMICC, a cohort of similar size, was also unable to find rare variant associations using burden tests [11]. However, their analysis methods were different from ours, making further comparisons difficult. Nevertheless, this provides clear guidance that smaller studies looking at the effect of rare variants across the genome are at considerable risk of finding both false positive and false negative associations. Second, many cohorts used population controls, which may have decreased statistical power given that some controls may have been misclassified. However, given that COVID-19 critical illness remains a rare phenomenon [43], our severe disease phenotype results are unlikely to be strongly affected by this. Further, the u...

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    Results from scite Reference Check: We found no unreliable references.


    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.