Regulatory dissection of the severe COVID-19 risk locus introgressed by Neanderthals

Curation statements for this article:
  • Curated by eLife

    eLife assessment

    Scientists had previously discovered that humans and Neanderthals interbred, so that parts of Neanderthal DNA are now part of the present-day human genome. More recently, it was found that a genetic region whose carriage is associated with severe COVID-19 symptoms was "introgressed" into humans from Neanderthals. This region contains many genetic variants, and this study set out to identify which of them may be causally involved in producing severe symptoms in response to COVID-19 infection. The main critiques of the study concern the details of the functional assays used to establish the regulatory role of the four candidate variants in severe COVID-19. In particular, the two genes that the authors identify as down-regulated by these variants (the critical chemokine receptor genes CCR1 and CCR5) are actually up-regulated in severe COVID-19 patients, which casts doubt on whether these variants alter the response to COVID-19 through the regulation of these genes. Follow-up experimental and computational analyses therefore seem necessary to establish the role of these variants in altering CCR1 and CCR5 gene expression.

This article has been reviewed by the following groups

Abstract

Individuals infected with the SARS-CoV-2 virus present with a wide variety of symptoms ranging from asymptomatic to severe and even lethal outcomes. Past research has revealed a genetic haplotype on chromosome 3 that entered the human population via introgression from Neanderthals as the strongest genetic risk factor for the severe response to COVID-19. However, the specific variants along this introgressed haplotype that contribute to this risk and the biological mechanisms that are involved remain unclear. Here, we assess the variants present on the risk haplotype for their likelihood of driving the genetic predisposition to severe COVID-19 outcomes. We do this by first exploring their impact on the regulation of genes involved in COVID-19 infection using a variety of population genetics and functional genomics tools. We then perform a locus-specific massively parallel reporter assay to individually assess the regulatory potential of each allele on the haplotype in a multipotent immune-related cell line. We ultimately reduce the set of over 600 linked genetic variants to identify four introgressed alleles that are strong functional candidates for driving the association between this locus and severe COVID-19. Using reporter assays in the presence/absence of SARS-CoV-2, we find evidence that these variants respond to viral infection. These variants likely drive the locus’ impact on severity by modulating the regulation of two critical chemokine receptor genes: CCR1 and CCR5. These alleles are ideal targets for future functional investigations into the interaction between host genomics and COVID-19 outcomes.

Article activity feed

  1. Author Response

    Reviewer #1 (Public Review):

    COVID-19 severity has been previously linked to a genetic region on chromosome 3 introgressed from Neandertals. The authors use several computational methods to identify specific sub-regions within it that putatively regulate gene expression, and to identify genes within these sub-regions associated with COVID-19 severity. The use of several complementary computational approaches is a major strength of the paper, as it bolsters confidence in the findings and narrows the search for significant genomic regions down to the most likely candidates. They find 14 genes whose expression is regulated by the identified introgressed genomic regions. Among these are several chemokine receptors, including two, CCR1 and CCR5, whose upregulation is associated with severe COVID-19. The authors then use functional genomics to determine whether the identified regions do regulate gene expression.

    We thank this Reviewer for highlighting these strengths.

    In contrast to the robustness of the computational findings, the authors' MPRA results are less robust with respect to the paper's significance for the clinical severity of COVID-19. The MPRA shows that the computational methods were reasonably effective at identifying regulatory elements within the introgressed region (53%). The authors then focus on emVars at which the H.n. allele differentially regulates expression and identify four putative emVars that may regulate the expression of CCR1 and CCR5. However, the authors found in their MPRA that these emVars downregulate reporter gene expression, whereas the genes of interest, CCR1 and CCR5, are upregulated during severe COVID-19.

    This result highlights the principal weakness of using the MPRA in this context: it assumes that reporter gene expression driven by a minimal promoter has the same regulatory determinants as expression of the gene of interest. The assay's strength is its high throughput, but its weakness is its lack of specificity with respect to the question at hand. This lack of specificity diminishes the impact of the functional aspect of the work. The authors' computational findings certainly bolster previous work showing that H.n. introgressed alleles are associated with COVID-19 severity and that this association may depend at least partly on gene expression differences between the archaic and modern alleles. However, the specific question at hand, whether chemokine receptor expression is linked to the clinical phenotype, remains unaddressed.

    Ultimately, the authors' results support the conclusion that the four emVars identified do regulate gene expression. However, the hypothesis that these specific regions are linked to COVID-19 severity is not supported. The authors' speculation as to why their results may differ from the upregulation observed during disease is intriguing but lacks support.

    We thank the Reviewer for raising these important points, and we hope that our new experimental approach has helped to strengthen our findings. However, we have also modified the manuscript to be more critical of our findings in light of the issues the Reviewer raised. This is reflected in our updated Discussion, parts of which are provided above in the section addressed to the Editor, as well as in the newly revised manuscript.

    Reviewer #2 (Public Review):

    Previous research using GWAS and population genetics approaches identified a genetic haplotype on chromosome 3 derived from Neanderthals as the major risk factor for severe COVID-19. However, the specific variants that cause the severe COVID-19 phenotype remain unknown. Here, Jagoda et al. aim to identify the causative variants for severe COVID-19 by leveraging eQTL analysis followed by massively parallel reporter assays (MPRA). Their datasets and results are unique and novel. Their research is well designed and will serve as a model strategy for future studies of the functional annotation of disease-associated variants.

    We thank Reviewer #2 for these compliments.

    However, the following critical weaknesses reduce the impact of this work: (1) the quantitative reliability of the MPRA output is questionable because MPRA activity is incompletely defined, being based on absolute barcode counts without comparison to negative controls; (2) the molecular mechanisms (e.g., transcription factor binding) by which the causative variants regulate CCR1/5 expression and COVID-19 severity are not analyzed or validated.

    We hope that below we have addressed these comments through our analyses and new experiments.

    Reviewer #3 (Public Review):

    This manuscript by Jagoda et al. addresses the genetic mechanism underlying the chromosome 3 haplotype, introgressed from Neanderthals, that shows a strong association with COVID-19 severity in Europeans. They re-evaluate the adaptively introgressed segment using Sprime and the U and Q95 statistics, and analyze cis- and trans-eQTLs in a whole-blood dataset. All 361 Sprime-identified introgressed variants act as eQTLs in whole blood and alter the expression of 14 genes, including seven chemokine receptor genes. They then tested 613 variants using a massively parallel reporter assay (MPRA) in K562 cells and narrowed these down to 20 emVars. Finally, they selected four variants based on four criteria: association with COVID-19 severity, eQTL data, chromosomal interactions, and epigenetic marks in immune cells. They highlighted variants rs35454877 (CCR5 regulation) and rs71327024, rs71327057, and rs34041956 (CCR1 regulation).

    Narrowing the roughly 800 kb introgressed region down to four critical variants is impressive work. However, the MPRA and eQTL data are not consistent, and these data do not agree with clinical gene expression data (increased expression of CCR1 in severe COVID-19 patients).

    We thank this Reviewer for describing our work as impressive; we have now addressed these concerns.

  2. eLife assessment

    Scientists had previously discovered that humans and Neanderthals interbred, so that parts of Neanderthal DNA are now part of the present-day human genome. More recently, it was found that a genetic region whose carriage is associated with severe COVID-19 symptoms was "introgressed" into humans from Neanderthals. This region contains many genetic variants, and this study set out to identify which of them may be causally involved in producing severe symptoms in response to COVID-19 infection. The main critiques of the study concern the details of the functional assays used to establish the regulatory role of the four candidate variants in severe COVID-19. In particular, the two genes that the authors identify as down-regulated by these variants (the critical chemokine receptor genes CCR1 and CCR5) are actually up-regulated in severe COVID-19 patients, which casts doubt on whether these variants alter the response to COVID-19 through the regulation of these genes. Follow-up experimental and computational analyses therefore seem necessary to establish the role of these variants in altering CCR1 and CCR5 gene expression.

  3. Reviewer #1 (Public Review):

    COVID-19 severity has been previously linked to a genetic region on chromosome 3 introgressed from Neandertals. The authors use several computational methods to identify specific sub-regions within it that putatively regulate gene expression, and to identify genes within these sub-regions associated with COVID-19 severity. The use of several complementary computational approaches is a major strength of the paper, as it bolsters confidence in the findings and narrows the search for significant genomic regions down to the most likely candidates. They find 14 genes whose expression is regulated by the identified introgressed genomic regions. Among these are several chemokine receptors, including two, CCR1 and CCR5, whose upregulation is associated with severe COVID-19. The authors then use functional genomics to determine whether the identified regions do regulate gene expression.

    In contrast to the robustness of the computational findings, the authors' MPRA results are less robust with respect to the paper's significance for the clinical severity of COVID-19. The MPRA shows that the computational methods were reasonably effective at identifying regulatory elements within the introgressed region (53%). The authors then focus on emVars at which the H.n. allele differentially regulates expression and identify four putative emVars that may regulate the expression of CCR1 and CCR5. However, the authors found in their MPRA that these emVars downregulate reporter gene expression, whereas the genes of interest, CCR1 and CCR5, are upregulated during severe COVID-19.
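
    The direction-of-effect problem raised here can be made concrete by comparing the sign of an allele's MPRA effect with the sign of its eQTL effect. The sketch below is a minimal illustration under stated assumptions, not the authors' pipeline: activities are assumed to be per-replicate log2(RNA/DNA) values for the archaic (H.n.) and modern alleles of one oligo, and the function names are hypothetical.

```python
from statistics import mean


def allelic_skew(archaic: list[float], modern: list[float]) -> float:
    """Mean log2 reporter activity of the archaic allele minus that of the modern allele.

    Each list holds per-replicate log2(RNA/DNA) activity for the same oligo
    carrying either allele (an assumption of this sketch); a negative value
    means the archaic allele lowers reporter expression.
    """
    return mean(archaic) - mean(modern)


def direction_concordant(skew: float, eqtl_beta: float) -> bool:
    """True when the MPRA skew and the eQTL effect of the archaic allele share a sign."""
    return (skew > 0) == (eqtl_beta > 0)


skew = allelic_skew([1.0, 1.2], [1.6, 1.8])
print(round(skew, 2))                   # -0.6
print(direction_concordant(skew, 0.3))  # False
```

    In this toy case the archaic allele lowers reporter activity while the hypothetical eQTL effect is positive, which is the kind of discordance the reviewers describe.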

    This result highlights the principal weakness of using the MPRA in this context: it assumes that reporter gene expression driven by a minimal promoter has the same regulatory determinants as expression of the gene of interest. The assay's strength is its high throughput, but its weakness is its lack of specificity with respect to the question at hand. This lack of specificity diminishes the impact of the functional aspect of the work. The authors' computational findings certainly bolster previous work showing that H.n. introgressed alleles are associated with COVID-19 severity and that this association may depend at least partly on gene expression differences between the archaic and modern alleles. However, the specific question at hand, whether chemokine receptor expression is linked to the clinical phenotype, remains unaddressed.

    Ultimately, the authors' results support the conclusion that the four emVars identified do regulate gene expression. However, the hypothesis that these specific regions are linked to COVID-19 severity is not supported. The authors' speculation as to why their results may differ from the upregulation observed during disease is intriguing but lacks support.

  4. Reviewer #2 (Public Review):

    Previous research using GWAS and population genetics approaches identified a genetic haplotype on chromosome 3 derived from Neanderthals as the major risk factor for severe COVID-19. However, the specific variants that cause the severe COVID-19 phenotype remain unknown. Here, Jagoda et al. aim to identify the causative variants for severe COVID-19 by leveraging eQTL analysis followed by massively parallel reporter assays (MPRA). Their datasets and results are unique and novel. Their research is well designed and will serve as a model strategy for future studies of the functional annotation of disease-associated variants. However, the following critical weaknesses reduce the impact of this work: (1) the quantitative reliability of the MPRA output is questionable because MPRA activity is incompletely defined, being based on absolute barcode counts without comparison to negative controls; (2) the molecular mechanisms (e.g., transcription factor binding) by which the causative variants regulate CCR1/5 expression and COVID-19 severity are not analyzed or validated.
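
    Point (1) can be illustrated with one common way of defining MPRA activity relative to negative controls rather than from absolute counts. This is a sketch under assumptions, not necessarily the authors' method: RNA (cDNA) and plasmid DNA counts are assumed to be already summed over each oligo's barcodes and keyed by the same oligo names, and the pseudocount and median-centering are illustrative choices.

```python
import math
from statistics import median


def log2_activity(rna_counts: dict[str, float], dna_counts: dict[str, float],
                  pseudocount: float = 1.0) -> dict[str, float]:
    """Per-oligo log2 ratio of library-size-normalized RNA to plasmid DNA counts.

    Assumes counts are already summed over each oligo's barcodes and that
    both dictionaries share the same oligo keys.
    """
    rna_total = sum(rna_counts.values())
    dna_total = sum(dna_counts.values())
    activity = {}
    for oligo, rna in rna_counts.items():
        rna_frac = (rna + pseudocount) / rna_total
        dna_frac = (dna_counts[oligo] + pseudocount) / dna_total
        activity[oligo] = math.log2(rna_frac / dna_frac)
    return activity


def center_on_negative_controls(activity: dict[str, float],
                                negative_controls: list[str]) -> dict[str, float]:
    """Subtract the median negative-control activity so 0 means no effect beyond baseline."""
    baseline = median(activity[c] for c in negative_controls)
    return {oligo: value - baseline for oligo, value in activity.items()}


rna = {"neg1": 100, "neg2": 100, "test": 400}
dna = {"neg1": 100, "neg2": 100, "test": 100}
centered = center_on_negative_controls(log2_activity(rna, dna), ["neg1", "neg2"])
print({k: round(v, 2) for k, v in centered.items()})
```

    Centering on negative controls means an oligo is called active only relative to sequences expected to have no effect, which addresses the concern that raw count ratios alone do not define a baseline.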

  5. Reviewer #3 (Public Review):

    This manuscript by Jagoda et al. addresses the genetic mechanism underlying the chromosome 3 haplotype, introgressed from Neanderthals, that shows a strong association with COVID-19 severity in Europeans. They re-evaluate the adaptively introgressed segment using Sprime and the U and Q95 statistics, and analyze cis- and trans-eQTLs in a whole-blood dataset. All 361 Sprime-identified introgressed variants act as eQTLs in whole blood and alter the expression of 14 genes, including seven chemokine receptor genes.
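
    The U and Q95 statistics mentioned here can be sketched as follows, assuming each site is a (freq_A, freq_B, freq_C) tuple of derived-allele frequencies in an unadmixed outgroup (A), the test population (B), and the archaic genome (C). The default thresholds (w = 1%, x = 20%, y = 100%), the strict inequalities, and the nearest-rank percentile are illustrative assumptions, not necessarily the exact settings used in the manuscript.

```python
import math

# Each site: (freq_A, freq_B, freq_C) = derived-allele frequency in the outgroup (A),
# the test population (B), and the archaic genome (C).
Site = tuple[float, float, float]


def u_statistic(sites: list[Site], w: float = 0.01, x: float = 0.2, y: float = 1.0) -> int:
    """Count sites rare in A (< w), common in B (> x), and at frequency y in C."""
    return sum(1 for a, b, c in sites if a < w and b > x and c == y)


def q95_statistic(sites: list[Site], w: float = 0.01, y: float = 1.0) -> float:
    """95th percentile (nearest rank) of B frequencies at sites rare in A and at frequency y in C."""
    freqs = sorted(b for a, b, c in sites if a < w and c == y)
    if not freqs:
        return 0.0  # design choice of this sketch: no qualifying sites gives 0
    rank = math.ceil(0.95 * len(freqs))
    return freqs[rank - 1]


window = [(0.0, 0.3, 1.0), (0.0, 0.1, 1.0), (0.5, 0.9, 1.0), (0.0, 0.6, 0.0)]
print(u_statistic(window), q95_statistic(window))  # 1 0.3
```

    In practice these statistics are computed in sliding windows along the genome, and windows with unusually high U or Q95 values are flagged as candidates for adaptive introgression.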

    They then tested 613 variants using a massively parallel reporter assay (MPRA) in K562 cells and narrowed these down to 20 emVars. Finally, they selected four variants based on four criteria: association with COVID-19 severity, eQTL data, chromosomal interactions, and epigenetic marks in immune cells. They highlighted variants rs35454877 (CCR5 regulation) and rs71327024, rs71327057, and rs34041956 (CCR1 regulation).
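
    The final prioritization step amounts to keeping only the emVars that satisfy every criterion. A minimal sketch, in which the EmVar fields and the variant names are hypothetical placeholders rather than the authors' data:

```python
from dataclasses import dataclass


@dataclass
class EmVar:
    """Hypothetical per-variant annotations; field names are illustrative only."""
    name: str
    severity_associated: bool  # associated with severe COVID-19
    is_eqtl: bool              # eQTL for a nearby gene
    chromatin_contact: bool    # physically contacts a target gene promoter
    immune_active_mark: bool   # overlaps active epigenetic marks in immune cells


def prioritize(emvars: list[EmVar]) -> list[str]:
    """Keep only emVars that satisfy all four criteria."""
    return [
        v.name
        for v in emvars
        if v.severity_associated and v.is_eqtl and v.chromatin_contact and v.immune_active_mark
    ]


candidates = [
    EmVar("variant_1", True, True, True, True),
    EmVar("variant_2", True, False, True, True),
]
print(prioritize(candidates))  # ['variant_1']
```

    Requiring all criteria at once is a strict intersection; a softer alternative would rank variants by how many criteria they meet.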

    Narrowing the roughly 800 kb introgressed region down to four critical variants is impressive work. However, the MPRA and eQTL data are not consistent, and these data do not agree with clinical gene expression data (increased expression of CCR1 in severe COVID-19 patients).

  6. SciScore for 10.1101/2021.06.12.448149:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Experimental Models: Cell Lines
      Sentence: "Additionally, we included 44 control sequences from a past MPRA experiment performed in K562 cells (Jagoda et al. 2021), with the 22 strongest up-regulating sequences from this MPRA serving as positive controls and the 22 sequences with the smallest magnitude of effect on expression serving as negative controls."
      Resource: K562 (suggested: None)
    Recombinant DNA
      Sentence: "Barcoded sequences were initially cloned into pGL4:23:ΔxbaΔluc vectors and 4 sequencing libraries were prepared to sequence across the oligos and barcodes to determine oligo-barcode combinations within this mpraΔorf pool."
      Resource: pGL4:23:ΔxbaΔluc (suggested: None)
    Software and Algorithms
      Sentence: "Sequence read quality was checked with FastQC (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/), with reads subsequently aligned to the human reference transcriptome (GRCh37.67) obtained from the ENSEMBL database (Hunt et al. 2018), which was indexed using the ‘index’ function of Salmon (version 0.14.0) (Patro et al. 2017) with a k-mer size of 31."
      Resources: FastQC (suggested: FastQC, RRID:SCR_014583); ENSEMBL (suggested: Ensembl, RRID:SCR_002344); Salmon (suggested: Salmon, RRID:SCR_017036)
      Sentence: "We then used DESeq2 to estimate whether an oligo sequence had an effect on transcription by calculating the log fold change (LFC) between the oligo count in the cDNA replicates compared with its count in the plasmid pool."
      Resource: DESeq2 (suggested: DESeq2, RRID:SCR_015687)
      Sentence: "To intersect these human cCREs with our emVar data, we first used LiftOver (Hinrichs et al. 2006) to convert our emVar coordinates from GRCh37 to GRCh38 and then used Bedtools intersect to search for emVars falling within cCREs."
      Resource: Bedtools (suggested: BEDTools, RRID:SCR_006646)
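
    The intersect step in the last sentence can be approximated in a few lines once both coordinate sets are on the same assembly. This sketch assumes the emVars have already been lifted to GRCh38 and that both sets are BED-style (chrom, start, end, name) tuples; it is a naive pairwise scan for illustration and ignores the many options that bedtools intersect provides.

```python
BedRecord = tuple[str, int, int, str]  # (chrom, start, end, name), 0-based half-open


def overlaps(a: BedRecord, b: BedRecord) -> bool:
    """True if two BED intervals on the same chromosome share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def emvars_in_ccres(emvars: list[BedRecord], ccres: list[BedRecord]) -> list[tuple[str, str]]:
    """Return (emVar name, cCRE name) pairs for every overlapping pair (O(n*m) scan)."""
    return [(v[3], c[3]) for v in emvars for c in ccres if overlaps(v, c)]


emvars = [("chr3", 100, 101, "v1"), ("chr3", 500, 501, "v2")]
ccres = [("chr3", 50, 150, "cCRE_a"), ("chr3", 501, 600, "cCRE_b")]
print(emvars_in_ccres(emvars, ccres))  # [('v1', 'cCRE_a')]
```

    Because BED intervals are half-open, the emVar occupying [500, 501) does not overlap a cCRE that starts at 501, so only v1 is reported.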

    Results from OddPub: Thank you for sharing your data.


    Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We found bar graphs of continuous data. We recommend replacing bar graphs with more informative graphics, as many different datasets can lead to the same bar graph. The actual data may suggest different conclusions from the summary statistics. For more information, please see Weissgerber et al (2015).


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    Results from scite Reference Check: We found no unreliable references.


    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding.