Whole genome sequencing identifies multiple loci for critical illness caused by COVID-19

Abstract

Critical illness in COVID-19 is caused by inflammatory lung injury, mediated by the host immune system. We and others have shown that host genetic variation influences the development of illness requiring critical care [1] or hospitalisation [2–4] following SARS-CoV-2 infection. The GenOMICC (Genetics of Mortality in Critical Care) study recruits critically ill cases and compares their genomes with population controls in order to find underlying disease mechanisms.

Here, we use whole genome sequencing and statistical fine mapping in 7,491 critically ill cases compared with 48,400 population controls to discover and replicate 22 independent variants that significantly predispose to life-threatening COVID-19. We identify 15 new independent associations with critical COVID-19, including variants within genes involved in interferon signalling (IL10RB, PLSCR1), leucocyte differentiation (BCL11A), and blood type antigen secretor status (FUT2). Using transcriptome-wide association and colocalisation to infer the effect of gene expression on disease severity, we find evidence implicating expression of multiple genes, including reduced expression of a membrane flippase (ATP11A) and increased mucin expression (MUC1), in critical disease.

We show that comparison between critically ill cases and population controls is highly efficient for genetic association analysis and enables detection of therapeutically relevant mechanisms of disease. Therapeutic predictions arising from these findings require testing in clinical trials.

Article activity feed

  1. SciScore for 10.1101/2021.09.02.21262965:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Experimental Models: Organisms/Strains
    Sentences | Resources
    To compare the effect of variants between groups in the age- and sex-stratified analysis, we first adjusted the effect and standard error of each variant for the standard deviation of the trait in each stratified group and then compared the groups with a t-statistic, as in previous studies [51,52], where b1 is the adjusted effect for group 1, b2 is the adjusted effect for group 2, se1 and se2 are the adjusted standard errors for groups 1 and 2 respectively, and r is the Spearman rank correlation between groups across all genetic variants.
    b1
    suggested: None
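A minimal Python sketch of the stratified comparison. The excerpt does not reproduce the formula, so this assumes the form commonly used for sex-stratified GWAS comparisons, t = (b1 − b2) / √(se1² + se2² − 2·r·se1·se2), with each effect and standard error first divided by the trait SD of its stratum. The function names and example numbers are hypothetical.

```python
import math


def scale_by_trait_sd(beta, se, trait_sd):
    """Express an effect and its SE in trait-SD units for one stratum."""
    return beta / trait_sd, se / trait_sd


def stratum_difference_t(b1, se1, b2, se2, r):
    """t-statistic for the difference between two stratum-specific effects.

    r is the Spearman rank correlation of effects between the two strata,
    computed across all variants; subtracting the covariance term accounts
    for correlation between the strata (assumed form, see lead-in).
    """
    var_diff = se1 ** 2 + se2 ** 2 - 2.0 * r * se1 * se2
    return (b1 - b2) / math.sqrt(var_diff)


# Hypothetical values for illustration only
b1, se1 = scale_by_trait_sd(0.30, 0.05, trait_sd=1.0)
b2, se2 = scale_by_trait_sd(0.10, 0.05, trait_sd=1.0)
print(round(stratum_difference_t(b1, se1, b2, se2, r=0.0), 3))  # 2.828
```

A positive r shrinks the variance of the difference, so the same gap between b1 and b2 yields a larger |t|.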
    Recombinant DNA
    Sentences | Resources
    This resulted in a total of 63,523 high-quality sites from aggV2.
    aggV2
    suggested: None
    We set a gene-wide significance threshold based on the number of genes tested per ancestry, correcting for the two masks (pLoF and missense, Supplementary Table 4).
    pLoF
    suggested: None
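The excerpt does not name the correction used. Assuming a Bonferroni correction at α = 0.05 over every gene-by-mask test within one ancestry group, the threshold could be computed as in this hypothetical Python sketch (the gene counts are made up):

```python
def gene_wide_threshold(n_genes_tested, n_masks=2, alpha=0.05):
    """Bonferroni threshold over genes x masks (pLoF and missense).

    Assumption: the study's exact correction is not stated in this excerpt.
    """
    if n_genes_tested <= 0 or n_masks <= 0:
        raise ValueError("need at least one gene and one mask")
    return alpha / (n_genes_tested * n_masks)


# Hypothetical per-ancestry gene counts
genes_tested = {"EUR": 18_000, "AFR": 16_500}
thresholds = {anc: gene_wide_threshold(n) for anc, n in genes_tested.items()}
for anc, p in thresholds.items():
    print(f"{anc}: P < {p:.2e}")
```

Because the number of testable genes differs by ancestry, each group gets its own threshold.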
    Software and Algorithms
    Sentences | Resources
    The yield of the DNA was measured using Qubit and normalised to 50 ng/µl before WGS or genotyping.
    WGS
    suggested: None
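Normalising to 50 ng/µl is a C1·V1 = C2·V2 dilution. A small Python sketch of the arithmetic, with a hypothetical stock concentration and final volume (the protocol's actual volumes are not given here):

```python
def dilution_volumes(stock_ng_per_ul, final_volume_ul, target_ng_per_ul=50.0):
    """Volumes of DNA stock and diluent needed to reach the target concentration.

    Uses C1 * V1 = C2 * V2, solved for V1 (the stock volume).
    """
    if stock_ng_per_ul < target_ng_per_ul:
        raise ValueError("stock is below the target; it cannot be diluted up to it")
    stock_ul = target_ng_per_ul * final_volume_ul / stock_ng_per_ul
    return stock_ul, final_volume_ul - stock_ul


# Hypothetical Qubit reading of 200 ng/µl, 100 µl final volume
print(dilution_volumes(200.0, 100.0))  # (25.0, 75.0)
```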
    Alignment was performed against the GRCh38 reference genome, including decoy contigs and alternate haplotypes (ALT contigs), with ALT-aware mapping and variant calling to improve specificity. 100,000 Genomes Project cohort (100K-genomes): All genomes from the 100,000 Genomes Project cohort were analysed with the Illumina North Star Version 4 Whole Genome Sequencing Workflow (
    100,000 Genomes Project
    suggested: (100,000 Genomes Project, RRID:SCR_010502)
    Aggregation for the 100K-Genomes cohort was performed using Illumina’s gvcfgenotyper v2019.02.26, merged with bcftools v1.10.2 and normalised with vt v0.57721.
    Illumina’s
    suggested: None
    bcftools
    suggested: (SAMtools/BCFtools, RRID:SCR_005227)
    Samples were filtered out based on the residuals of eleven QC metrics (calculated using bcftools) after regressing out the effects of sequencing platform and the first three ancestry assignment principal components (including all linear, quadratic, and interaction terms) taken from the sample projections onto the SNP loadings from the individuals of 1000 Genomes Project phase 3 (1KGP3).
    1000 Genomes Project
    suggested: (1000 Genomes Project and AWS, RRID:SCR_008801)
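A sketch of residual-based sample QC in Python with NumPy. It assumes (i) one-hot platform indicators, (ii) PC1–PC3 entered as linear, squared and pairwise-product terms, and (iii) an outlier rule of more than 4 robust SDs (median absolute deviation) on any metric. The cut-off, function names and simulated data are hypothetical; the excerpt does not give the study's threshold.

```python
import itertools

import numpy as np


def design_matrix(platform, pcs):
    """Intercept + platform indicators + PC linear, squared and pairwise-interaction terms."""
    n = pcs.shape[0]
    cols = [np.ones(n)]
    levels = sorted(set(platform))
    for level in levels[1:]:  # first level is the reference category
        cols.append((np.asarray(platform) == level).astype(float))
    for j in range(pcs.shape[1]):
        cols.append(pcs[:, j])
        cols.append(pcs[:, j] ** 2)
    for j, k in itertools.combinations(range(pcs.shape[1]), 2):
        cols.append(pcs[:, j] * pcs[:, k])
    return np.column_stack(cols)


def qc_outliers(metrics, platform, pcs, n_mad=4.0):
    """Flag samples whose residual on any QC metric lies > n_mad robust SDs from the median."""
    X = design_matrix(platform, pcs)
    coef, *_ = np.linalg.lstsq(X, metrics, rcond=None)  # one OLS fit per metric column
    resid = metrics - X @ coef
    med = np.median(resid, axis=0)
    mad = 1.4826 * np.median(np.abs(resid - med), axis=0)  # MAD scaled to SD units
    return (np.abs(resid - med) / mad > n_mad).any(axis=1)


# Simulated data: 500 samples, 11 QC metrics partly driven by ancestry PC1
rng = np.random.default_rng(0)
n = 500
pcs = rng.normal(size=(n, 3))
platform = rng.choice(["HiSeq", "NovaSeq"], size=n)
metrics = rng.normal(size=(n, 11)) + pcs[:, :1]
metrics[7, 2] += 20.0  # one hypothetical bad sample
flags = qc_outliers(metrics, platform, pcs)
print(np.flatnonzero(flags))
```

Regressing out platform and ancestry first means a sample is flagged for being unusual relative to comparable samples, not for its ancestry or sequencing batch.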
    To create this set, we applied the same variant QC procedure as for the common variants: we selected variants with missingness <1%, median QC >30, median depth ≥30 and ≥90% of heterozygous genotypes passing an allele-balance (ABratio) binomial test with P > 10⁻², per batch of sequencing and genotyping platform (i.e., HiSeq+NSV4, HiSeq+Pipeline 2.0, NovaSeq+Pipeline 2.0).
    NovaSeq+Pipeline
    suggested: None
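The per-batch variant filters can be sketched in Python as below. Assumptions: the allele-balance test is an exact two-sided binomial test of the alt-read count against p = 0.5, a variant with no heterozygous calls is not failed by that test, and a variant must pass in every batch. The `VariantBatchStats` record and all field names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from math import comb


def binom_two_sided_p(k: int, n: int) -> float:
    """Exact two-sided binomial P-value for k alt reads out of n at p = 0.5.

    The null distribution is symmetric, so the two-sided P-value is twice the
    smaller tail, capped at 1.
    """
    tail = sum(comb(n, i) for i in range(min(k, n - k) + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


@dataclass
class VariantBatchStats:
    missingness: float                # fraction of samples without a call
    median_qual: float                # the "median QC" value in the text
    median_depth: float
    het_reads: list[tuple[int, int]]  # (alt reads, total reads) per heterozygous call


def passes_batch(v: VariantBatchStats, ab_p: float = 1e-2, min_het_frac: float = 0.90) -> bool:
    if not (v.missingness < 0.01 and v.median_qual > 30 and v.median_depth >= 30):
        return False
    if not v.het_reads:
        return True  # assumption: no heterozygous calls means nothing to test
    ok = sum(binom_two_sided_p(alt, total) > ab_p for alt, total in v.het_reads)
    return ok / len(v.het_reads) >= min_het_frac


def passes_all_batches(batches: list[VariantBatchStats]) -> bool:
    """Assumption: a variant is kept only if it passes in every sequencing/genotyping batch."""
    return all(passes_batch(b) for b in batches)


balanced = VariantBatchStats(0.002, 45.0, 35.0, [(15, 30), (17, 32), (14, 29)])
skewed = VariantBatchStats(0.002, 45.0, 35.0, [(2, 30)] * 10)
print(passes_all_batches([balanced]), passes_all_batches([balanced, skewed]))  # True False
```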
    Control-control QC filter: 100K aggV2 samples that were aligned and genotype-called with the Illumina North Star Version 4 pipeline represented the majority of control samples in our GWAS analyses, whereas all of the cases were aligned and called with Genomics England pipeline 2.0.
    Genomics
    suggested: (UTHSCSA Genomics Core, RRID:SCR_012239)
    For each genome-wide significant variant locus, we selected the variants 1.5 Mbp on each side and computed the correlation matrix among them with plink v1.9.
    plink
    suggested: (PLINK, RRID:SCR_001757)
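The windowed correlation step can be sketched with NumPy: pick variants within ±1.5 Mbp of a lead variant and take the Pearson correlation between their genotype dosage columns, the r statistic that LD tools report. In the study this was done with plink v1.9; the function name, inputs and simulated genotypes here are hypothetical.

```python
import numpy as np

WINDOW_BP = 1_500_000  # 1.5 Mbp on each side of the lead variant


def ld_matrix_around(lead_pos, positions, dosages, window=WINDOW_BP):
    """Return (indices, correlation matrix) for variants within +/- window of lead_pos.

    positions: (m,) base-pair positions on one chromosome
    dosages:   (n_samples, m) genotype dosages coded 0/1/2
    Monomorphic variants give undefined (NaN) correlations and should be removed first.
    """
    positions = np.asarray(positions)
    idx = np.flatnonzero(np.abs(positions - lead_pos) <= window)
    r = np.atleast_2d(np.corrcoef(dosages[:, idx], rowvar=False))
    return idx, r


rng = np.random.default_rng(1)
positions = [1_000_000, 2_000_000, 2_400_000, 5_000_000]
dosages = rng.integers(0, 3, size=(200, len(positions))).astype(float)
idx, r = ld_matrix_around(2_000_000, positions, dosages)
print(idx, r.shape)  # [0 1 2] (3, 3)
```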
    We also ranked each variant within each credible set according to its predicted consequence, using the ranking in the table provided by Ensembl: https://www.ensembl.org/info/genome/variation/prediction/predicted_data.html. Trans-ancestry meta-analysis: We performed a meta-analysis across all ancestries using an inverse-variance weighted method, with control for population stratification in each separate analysis, in the METAL software [13].
    METAL
    suggested: (METAL, RRID:SCR_002013)
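Fixed-effect inverse-variance weighting, the basis of METAL's standard-error-based scheme, combines per-ancestry estimates as β = Σ wᵢβᵢ / Σ wᵢ with wᵢ = 1/seᵢ² and se = 1/√(Σ wᵢ). A minimal Python sketch with hypothetical per-ancestry estimates; it omits the genomic-control and allele-alignment steps a real run needs.

```python
import math


def ivw_fixed_effect(betas, ses):
    """Fixed-effect inverse-variance weighted meta-analysis of one variant.

    Returns the pooled effect, its standard error, the z-score and a two-sided P-value.
    """
    weights = [1.0 / se ** 2 for se in ses]
    total_w = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_w
    se = math.sqrt(1.0 / total_w)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail probability
    return beta, se, z, p


# Hypothetical log-odds ratios from two ancestry groups
beta, se, z, p = ivw_fixed_effect([0.20, 0.40], [0.10, 0.10])
print(f"beta={beta:.3f} se={se:.4f} z={z:.2f} P={p:.1e}")
```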
    The subtraction was performed using the MetaSubtract package (version 1.60) for R (version 4.0.2), after removing variants with the same genomic position, with lambda.cohorts set to the genomic inflation calculated on the GenOMICC summary statistics.
    MetaSubtract
    suggested: None
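Under a fixed-effect inverse-variance model, removing one cohort from a meta-analysis amounts to subtracting that cohort's weight and weighted effect. The Python sketch below shows only that arithmetic and is not MetaSubtract's code; inflating the cohort's SE by √λ to apply its genomic-control correction is an assumption about how λ enters, and the example numbers are hypothetical.

```python
import math


def subtract_cohort(beta_meta, se_meta, beta_c, se_c, lambda_c=1.0):
    """Remove one cohort from a fixed-effect IVW meta-analysis estimate.

    lambda_c: genomic inflation of the removed cohort; its SE is scaled by
    sqrt(lambda_c) before subtraction (assumption, see lead-in).
    """
    se_c = se_c * math.sqrt(lambda_c)
    w_meta = 1.0 / se_meta ** 2
    w_c = 1.0 / se_c ** 2
    w_rest = w_meta - w_c
    if w_rest <= 0:
        raise ValueError("the cohort carries all of the meta-analysis weight")
    beta_rest = (w_meta * beta_meta - w_c * beta_c) / w_rest
    return beta_rest, math.sqrt(1.0 / w_rest)


# Two cohorts (0.20 ± 0.10 and 0.40 ± 0.10) pool to 0.30 ± sqrt(0.005);
# removing the second cohort should recover the first.
print(subtract_cohort(0.30, math.sqrt(0.005), 0.40, 0.10))
```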
    HLA association analysis was run under an additive model using SAIGE, in an identical fashion to the SNV GWAS.
    SAIGE
    suggested: None
    Aggregate variant testing (AVT): Aggregate variant testing on aggCOVID_v4.2 was performed using SKAT-O as implemented in SAIGE-GENE v0.44.5 [16] on all protein-coding genes.
    SAIGE-GENE
    suggested: None
    Post-GWAS analysis: Transcriptome-wide Association Studies (TWAS): We performed TWAS in the MetaXcan framework using the GTEx v8 eQTL and sQTL MASHR-M models available for download from http://predictdb.org/.
    MetaXcan
    suggested: None
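S-PrediXcan, the summary-statistic method in the MetaXcan framework, approximates a gene's association z-score as Z_g ≈ Σₗ wₗ (σₗ / σ_g)(β̂ₗ / se(β̂ₗ)), where wₗ are the prediction-model weights (for example from the GTEx v8 MASHR-M models), σₗ² is the variance of SNP l, and σ_g² = Σₗ Σₖ wₗ wₖ Γₗₖ is the predicted expression variance from the SNP covariance matrix Γ. A stdlib-only Python sketch with hypothetical weights and covariances, ignoring allele harmonisation:

```python
import math


def s_predixcan_z(weights, snp_z, snp_cov):
    """Approximate gene-level z-score from GWAS summary statistics.

    weights: prediction-model weight for each SNP in the gene's model
    snp_z:   GWAS z-scores (beta / se) for the same SNPs, aligned to the same alleles
    snp_cov: genotype covariance matrix among those SNPs from a reference panel
    """
    m = len(weights)
    var_g = sum(weights[i] * weights[j] * snp_cov[i][j] for i in range(m) for j in range(m))
    sigma_g = math.sqrt(var_g)
    return sum(
        weights[i] * math.sqrt(snp_cov[i][i]) / sigma_g * snp_z[i] for i in range(m)
    )


# Hypothetical two-SNP model with uncorrelated, unit-variance SNPs
print(round(s_predixcan_z([1.0, 1.0], [2.0, 2.0], [[1.0, 0.0], [0.0, 1.0]]), 3))  # 2.828
```

For a single-SNP model the formula reduces to that SNP's z-score, signed by its weight, which is a quick sanity check on any implementation.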

    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).


    Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    Results from scite Reference Check: We found no unreliable references.


    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding.