Initial whole-genome sequencing and analysis of the host genetic contribution to COVID-19 severity and susceptibility

Abstract

The COVID-19 pandemic has accounted for millions of infections and hundreds of thousands of deaths worldwide in a short time period. Patients show great diversity in clinical and laboratory manifestations and in disease severity. Nonetheless, little is known about the host genetic contribution to the observed interindividual phenotypic variability. Here, we report the first host genetic study in the Chinese population, deeply sequencing and analyzing 332 COVID-19 patients from the Shenzhen Third People’s Hospital, categorized by varying levels of severity. Using a total of 22.2 million genetic variants, we conducted both single-variant and gene-based association tests among five severity groups (asymptomatic, mild, moderate, severe, and critically ill patients) after correcting for potential confounding factors. Pedigree analysis suggested a potential monogenic effect of loss-of-function variants in GOLGA3 and DPP7 for critically ill and asymptomatic disease presentation. The genome-wide association study suggests that the most significant gene locus associated with severity is located in TMEM189–UBE2V1, which is involved in the IL-1 signaling pathway. The p.Val197Met missense variant, which affects the stability of the TMPRSS2 protein, displays a decreasing allele frequency among the severe patients compared to the mild patients and the general population. We identified that the HLA-A*11:01, B*51:01, and C*14:02 alleles significantly predispose patients to the worst outcome. This initial genomic study of Chinese patients provides genetic insights into the phenotypic differences among the COVID-19 patient groups and highlights genes and variants that may help guide targeted efforts in containing the outbreak. Limitations and advantages of the study are also reviewed to guide future international efforts on elucidating the genetic architecture of host–pathogen interaction for COVID-19 and other infectious and complex diseases.

Article activity feed

  1. SciScore for 10.1101/2020.06.09.20126607:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Software and Algorithms
    Sentences | Resources
    At least 0.5 μg was obtained for each individual and used to create the WGS library, with insert sizes of 300-500 bp for paired-end libraries, according to the BGI library preparation pipeline.
    WGS
    suggested: None
    Sequencing reads were mapped to the hg38 reference genome using the BWA algorithm.
    BWA
    suggested: (BWA, RRID:SCR_010910)
    Variant Quality Score Recalibration and Filtration: Variant Quality Score Recalibration was performed using the Genome Analysis Toolkit (GATK version 4.1.2).
    Genome Analysis Toolkit
    suggested: None
    Known variant files were downloaded from the GATK bundle.
    GATK
    suggested: (GATK, RRID:SCR_001876)
    Familial relationship and population structure analysis: PLINK (v1.9) [68] and KING (v2.1.5) [69] were applied to detect the kinship relatedness between each pair of individuals.
    PLINK
    suggested: (PLINK, RRID:SCR_001757)
    KING
    suggested: (KING, RRID:SCR_009251)
    The PC-AiR module (principal components analysis in related samples) in the Genesis R package was used to conduct PCA for the 332 patients, including the related family members.
    Genesis
    suggested: (Genesis, RRID:SCR_015775)
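
The kinship step quoted above (PLINK and KING applied to every pair of individuals) can be illustrated with a minimal sketch. It assumes the pairwise kinship coefficients have already been computed and exported. The cutoffs are the relationship-inference ranges commonly cited from the KING documentation; the report does not say which thresholds the authors used. The function names and the sample IDs are hypothetical.

```python
from typing import Dict, List, Tuple

# Assumed cutoffs: the kinship-coefficient ranges commonly cited from the
# KING documentation. The quoted report does not state the authors' thresholds.
KINSHIP_CUTOFFS = [
    (0.354, "duplicate or monozygotic twin"),
    (0.177, "first-degree"),
    (0.0884, "second-degree"),
    (0.0442, "third-degree"),
]


def classify_kinship(phi: float) -> str:
    """Map a pairwise kinship coefficient to a coarse relationship class."""
    for cutoff, label in KINSHIP_CUTOFFS:
        if phi > cutoff:
            return label
    return "unrelated"


def related_pairs(
    kinship: Dict[Tuple[str, str], float]
) -> List[Tuple[str, str, str]]:
    """Return (id1, id2, class) for every pair not classified as unrelated."""
    result = []
    for (id1, id2), phi in sorted(kinship.items()):
        label = classify_kinship(phi)
        if label != "unrelated":
            result.append((id1, id2, label))
    return result


if __name__ == "__main__":
    # Hypothetical sample IDs and coefficients, for illustration only.
    demo = {("P001", "P002"): 0.25, ("P001", "P003"): 0.01}
    for row in related_pairs(demo):
        print(row)
```

Pairs flagged this way are the related samples that a method such as PC-AiR is designed to handle when computing principal components.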

    Results from OddPub: Thank you for sharing your data.


    Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
    Some limitations of the study should be noted. Power analysis indicates that a sample size of around 300 is barely sufficient to identify genome-wide significant genetic variants with minor allele frequency greater than 0.2 and odds ratio greater than 1.8, given a type I error rate of 0.05. We do not have the power to detect causal variants beyond this risk and allele frequency scenario. In addition, although the study of hospitalized patients in a designated hospital includes all severe patients, the design has a limited representation of asymptomatic patients (7.5%), whose proportion has been estimated to be 30.8% (95% confidence interval 7.7-53.8%) [62]. Given that the RT-PCR test and the seroprevalence immunoglobulin M and G antibody tests targeting SARS-CoV-2 have been widely adopted in China and around the globe, it will be important to identify and study the extreme asymptomatic patients to understand the host factors contributing to capable control of the viral infection. As we and others continue to recruit patients and collect data in China and around the world to understand the host genetic background underlying the varying clinical outcomes of patients, this work represents the first genetic study of Chinese hospitalized patients in which high-quality sequencing data were generated and systematic analyses of the genomic and clinical data were conducted. Our results highlight several genetic factors involved in the immune responses including genes involved in the viral entry in...
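
The power statement in the limitations above can be explored with a rough sketch. This is not the authors' power analysis: it assumes a simple two-group allelic comparison with a two-sided two-proportion z-test under a normal approximation, whereas the study compares five severity groups. The function names, the group sizes, and the choice of significance threshold are assumptions for illustration.

```python
from math import sqrt
from statistics import NormalDist


def case_allele_freq(control_freq: float, odds_ratio: float) -> float:
    """Allele frequency in cases implied by a control frequency and an allelic odds ratio."""
    odds = odds_ratio * control_freq / (1.0 - control_freq)
    return odds / (1.0 + odds)


def allelic_test_power(
    n_cases: int,
    n_controls: int,
    control_freq: float,
    odds_ratio: float,
    alpha: float,
) -> float:
    """Approximate power of a two-sided two-proportion z-test on allele counts.

    Assumption: each individual contributes two independent alleles, and the
    normal approximation to the difference in allele frequencies holds.
    """
    p0 = control_freq
    p1 = case_allele_freq(p0, odds_ratio)
    alleles_cases = 2 * n_cases
    alleles_controls = 2 * n_controls
    se = sqrt(p1 * (1 - p1) / alleles_cases + p0 * (1 - p0) / alleles_controls)
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    shift = abs(p1 - p0) / se
    # Probability that the test statistic falls in either rejection region.
    return nd.cdf(shift - z_crit) + nd.cdf(-shift - z_crit)


if __name__ == "__main__":
    # Hypothetical split of about 300 patients into two equal groups.
    for alpha in (0.05, 5e-8):
        power = allelic_test_power(150, 150, 0.2, 1.8, alpha)
        print(f"alpha={alpha:g}: power={power:.3f}")
```

Running the sketch with both a nominal threshold and a genome-wide one shows how strongly the result depends on which type I error rate is meant, which the quoted sentence leaves open.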

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.