Genetic and non-genetic factors affecting the expression of COVID-19-relevant genes in the large airway epithelium

Abstract

Background

The large airway epithelial barrier provides one of the first lines of defense against respiratory viruses, including SARS-CoV-2, which causes COVID-19. Substantial inter-individual variability in disease course is hypothesized to be partially mediated by differential regulation of the genes that interact with the SARS-CoV-2 virus or are involved in the subsequent host response. Here, we comprehensively investigated non-genetic and genetic factors influencing COVID-19-relevant bronchial epithelial gene expression.

Methods

We analyzed RNA-sequencing data from bronchial epithelial brushings obtained from uninfected individuals. We related ACE2 gene expression to host and environmental factors in the SPIROMICS cohort of smokers with and without chronic obstructive pulmonary disease (COPD) and replicated these associations in two asthma cohorts, SARP and MAST. To identify airway biology beyond ACE2 binding that may contribute to increased susceptibility, we used gene set enrichment analyses to determine whether the gene expression changes indicative of a suppressed airway immune response that are observed early in SARS-CoV-2 infection are also observed in association with host factors. To identify host genetic variants affecting COVID-19 susceptibility in SPIROMICS, we performed expression quantitative trait locus (eQTL) mapping and investigated the phenotypic associations of the eQTL variants.

Results

We found that ACE2 expression was higher in relation to active smoking, obesity, and hypertension, which are known risk factors for COVID-19 severity, while an association with interferon-related inflammation was driven by the truncated, non-binding ACE2 isoform. We discovered that expression patterns of a suppressed airway immune response to early SARS-CoV-2 infection, compared to other viruses, are similar to patterns associated with obesity, hypertension, and cardiovascular disease, which may thus contribute to a COVID-19-susceptible airway environment. eQTL mapping identified regulatory variants for genes implicated in COVID-19, some of which had phenome-wide association study (pheWAS) evidence for a potential role in respiratory infections.

Conclusions

These data provide evidence that clinically relevant variation in the expression of COVID-19-related genes is associated with host factors, environmental exposures, and likely host genetic variation.

Article activity feed

  1. SciScore for 10.1101/2020.10.01.20202820:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Experimental Models: Cell Lines
    Sentences | Resources
    To include in our candidate list, we chose genes that 1) have adjusted P-value < 0.05 in the differential expression analysis from primary cells and either cell lines (Calu-3 or ACE2-expressing A549 cells, low-MOI infection; excluded genes with adjusted P = 0) or samples derived from COVID-19 patients, and 2) log2 fold change > 0.5 in absolute scale in primary cells and log2 fold change > 1 in absolute scale in the other experiment.
    A549
    suggested: None
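The candidate-gene criteria quoted in the sentence above can be sketched as a small filter. This is a minimal illustration, not the authors' code: the `DEResult` container, the `is_candidate` function, and the `other_is_cell_line` flag are hypothetical names, and applying the "adjusted P = 0" exclusion only to the cell-line experiment is an assumed reading of the sentence.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DEResult:
    """Differential expression result for one gene in one experiment (hypothetical container)."""
    padj: float    # multiple-testing-adjusted P-value
    log2fc: float  # log2 fold change, infected vs. control


def is_candidate(primary: DEResult, other: DEResult, other_is_cell_line: bool = True) -> bool:
    """Apply the two quoted criteria to one gene.

    `primary` is the primary-cell experiment; `other` is either a cell-line
    experiment or the COVID-19 patient samples.
    """
    # Assumption: genes with adjusted P exactly 0 are excluded only when the
    # second experiment is a cell line, as the parenthetical seems to indicate.
    if other_is_cell_line and other.padj == 0:
        return False
    # Criterion 1: adjusted P < 0.05 in both experiments.
    significant = primary.padj < 0.05 and other.padj < 0.05
    # Criterion 2: |log2FC| > 0.5 in primary cells and > 1 in the other experiment.
    large_effect = abs(primary.log2fc) > 0.5 and abs(other.log2fc) > 1.0
    return significant and large_effect


# Made-up values for illustration only; these are not results from the paper.
print(is_candidate(DEResult(padj=0.01, log2fc=-0.8), DEResult(padj=0.001, log2fc=1.5)))
```

Taking the absolute value of the fold change keeps both up- and downregulated genes, matching the "in absolute scale" wording.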
    Experimental Models: Organisms/Strains
    Sentences | Resources
    pheWAS of lead COVID-19 cis-eQTLs in SPIROMICS: We performed phenome-wide association studies in 1,980 non-Hispanic White (NHW) and 696 individuals from other ethnic and racial groups from SPIROMICS for the 108 lead cis-eQTLs to evaluate for phenotypic associations with spirometric measures, cell count differentials, immunoglobulin concentrations, longitudinal exacerbation risk, self-reported asthma history, cardiovascular diseases, CT scan measures of emphysema (bilateral percentage lung density <-950HFU at total lung capacity), CT scan functional small airways disease (PRM-fSAD), and alpha1-antitrypsin concentrations (subgroup of 1,191 NHW and 396 from other racial/ethnic groups).
    non-Hispanic White
    suggested: None
    Software and Algorithms
    Sentences | Resources
    FASTQ files were quality filtered and aligned to the Ensembl GRCh38 genome build using STAR [50].
    Ensembl
    suggested: (Ensembl, RRID:SCR_002344)
    Differential expression analysis of ACE2 in relation to host/environmental factors: Visualization and analyses of single gene and gene signature analyses were done using RLE normalized and COMBAT batch corrected gene expression from the DESeq2 and SVA packages in R.
    DESeq2
    suggested: (DESeq, RRID:SCR_000154)
    As per the ASpli and EdgeR package recommendations, raw exon counts were adjusted for gene counts to remove the signal from differential gene expression using the formula: (Exon Count in each sample*mean raw ACE2 count)/raw ACE2 gene count in that sample.
    EdgeR
    suggested: (edgeR, RRID:SCR_012802)
    Biological pathway gene sets were built by inputting the genes differentially downregulated between SARS-CoV-2 infection and other viral illness (P < 0.05) into the Ingenuity Pathway Analysis canonical pathway function.
    Ingenuity Pathway Analysis
    suggested: (Ingenuity Pathway Analysis, RRID:SCR_008653)
    Expression quantitative trait mapping: Expression quantitative trait (eQTL) mapping was performed in 144 unrelated individuals from the SPIROMICS bronchoscopy sub-study with WGS genotype data from TOPMed and gene expression from bronchial epithelium profiled with RNA-seq following the analysis pipeline from the Genotype-Tissue Expression (GTEx) Consortium [15].
    WGS
    suggested: None
    LD pruning was performed using Plink 1.9 [61] based on pairwise genotypic correlation of 200 SNPs at a time, with a step of 100 SNPs, and using LD threshold of > 0.1 to remove one of a pair of SNPs (option --indep-pairwise 200 100 0.1).
    Plink
    suggested: (PLINK, RRID:SCR_001757)
    The top 4 PCs explained > 0.1% of the variance, and were associated with subpopulations inferred from 1000 Genomes Project using k-nearest neighbors clustering (F-test P < 2×10^−10)
    1000 Genomes Project
    suggested: (1000 Genomes Project and AWS, RRID:SCR_008801)
    Window-size was set to 1 Mb from the transcription start site (TSS) of the gene according to the GENCODE version 33, 10,000 permutations were used to correct for multiple testing, and false discovery rate (FDR) < 0.05 was used to identify genes with statistically significant eQTLs (eGenes).
    GENCODE
    suggested: (GENCODE, RRID:SCR_014966)
    We queried PhenoScanner database based on the rs IDs of the lead cis-eQTLs obtained from dbSNP version 151 (GRCh38p7, including also former rs ID to query).
    dbSNP
    suggested: (dbSNP, RRID:SCR_002338)
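The exon-count adjustment quoted in the table above (in the ASpli/EdgeR sentence) is simple arithmetic and can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the function name `adjust_exon_counts` and the plain-list inputs are hypothetical, and the input values are made up.

```python
from statistics import mean
from typing import List, Sequence


def adjust_exon_counts(exon_counts: Sequence[float], gene_counts: Sequence[float]) -> List[float]:
    """Rescale per-sample exon counts to remove gene-level expression differences.

    Implements the quoted formula for each sample i:
        adjusted_i = exon_count_i * mean(raw gene counts) / raw gene count_i
    """
    if len(exon_counts) != len(gene_counts):
        raise ValueError("exon_counts and gene_counts must have one entry per sample")
    if any(g <= 0 for g in gene_counts):
        # Assumption: samples with zero gene counts are filtered out beforehand.
        raise ValueError("gene counts must be positive to avoid division by zero")
    mean_gene_count = mean(gene_counts)
    return [e * mean_gene_count / g for e, g in zip(exon_counts, gene_counts)]


# Made-up counts: the exon is a constant 10% of the gene count in every sample,
# so the adjusted values are identical across samples.
print(adjust_exon_counts([10, 20, 30], [100, 200, 300]))  # [20.0, 20.0, 20.0]
```

Because each exon count is divided by its own sample's gene count, any remaining differences between samples reflect exon usage rather than overall ACE2 expression.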

    Results from OddPub: Thank you for sharing your code and data.


    Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • No funding statement was detected.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.