Investigating the likely association between genetic ancestry and COVID-19 manifestations

This article has been reviewed by the following groups


Abstract

Background

The novel coronavirus, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has spread rapidly throughout the world, leading to catastrophic consequences. However, the outcomes of SARS-CoV-2 infection have varied discernibly across the globe: while in some countries people recover relatively quickly, in others recovery times have been comparatively longer and the number of individuals succumbing to the disease is high. This variability in coronavirus disease 2019 (COVID-19) susceptibility suggests a likely association between the severity of COVID-19 manifestations and the genetic make-up of affected individuals, as shaped by their ancestry.

Objective

In this study, we aimed to evaluate the potential association between an individual's genetic ancestry and the extent of COVID-19 disease presentation, using Europeans as a case study. In addition, using a genome-wide association study (GWAS) approach, we sought to identify putative single nucleotide polymorphism (SNP) markers and genes that may be associated with differential COVID-19 manifestations by comparing European and East Asian genomes.

Method

To this end, we used 10,215 ancient and modern genomes from across the globe, covering 597,573 SNPs and obtained from the databank of Dr. David Reich (Harvard Medical School, USA), to evaluate the likely correlation between European ancestry and COVID-19 manifestations. Ancestry proportions were determined using the qpAdm program implemented in AdmixTools v5.1. Pearson's correlation coefficient (r) between the various ancestry proportions of European genomes and the COVID-19 death/recovery ratio was calculated, and its significance was evaluated statistically. A genome-wide association study (GWAS) was performed in PLINK v1.9 to identify SNPs with significant allele frequency differences between European and East Asian genomes that likely correlate with differential COVID-19 infectivity.
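The correlation step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline (they report using GraphPad Prism): it assumes one West European hunter-gatherer (WHG) ancestry fraction per country (for example, averaged over that country's genomes), takes the death/recovery ratio to be cumulative deaths divided by cumulative recoveries, and uses made-up numbers. The function names are hypothetical.

```python
import math

def death_recovery_ratio(deaths, recovered):
    """Cumulative deaths divided by cumulative recoveries (assumed definition)."""
    if recovered <= 0:
        raise ValueError("ratio is undefined without reported recoveries")
    return deaths / recovered

def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two sequences of equal length with at least 3 points")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """Statistic for H0: rho = 0; follows Student's t with n - 2 df under H0."""
    if abs(r) >= 1:
        raise ValueError("t is unbounded when |r| = 1")
    return r * math.sqrt((n - 2) / (1 - r * r))

# Hypothetical per-country values, NOT data from the study.
whg_fraction = [0.10, 0.15, 0.20, 0.25, 0.30]
counts = [(10, 100), (30, 120), (20, 50), (60, 100), (45, 60)]  # (deaths, recovered)
ratios = [death_recovery_ratio(d, rec) for d, rec in counts]

r = pearson_r(whg_fraction, ratios)
t = t_statistic(r, len(ratios))
print(f"r = {r:.3f}, t = {t:.2f} on {len(ratios) - 2} df")
```

In practice a library routine such as SciPy's `scipy.stats.pearsonr(x, y)` returns r together with the two-sided P value from the same null distribution, which avoids hand-coding the t distribution's CDF.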

Results

We found a significant positive correlation (r = 0.58, P = 0.03) between West European hunter-gatherer (WHG) ancestral fractions and the COVID-19 death/recovery ratio for data as of 5th April 2020. This association strengthened (r = 0.77, P = 0.009) upon reanalysis of data as of 30th June 2020, after removing countries with small sample sizes and adding countries that bridge Europe and Asia. Using GWAS, we further identified 404 immune-response-related SNPs by comparing 753 publicly available genomes from various European countries against 838 genomes from various East Asian countries. Notably, SNPs associated with immune-system-related pathways, such as the interferon-stimulated antiviral response, the adaptive and innate immune systems and IL-6-dependent immune responses, showed significant differences in allele frequencies between Europeans and East Asians (chi-square values ≥ 1500; P ≈ 0).
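The allele-frequency comparison behind these chi-square values can be illustrated with a 2×2 allelic test, the kind of 1-df statistic reported by PLINK's basic `--assoc` test. The sketch below is hypothetical: it assumes diploid genotypes with no missing calls (so the 753 European and 838 East Asian genomes contribute 1,506 and 1,676 alleles), and the allele counts and function names are invented for illustration.

```python
import math

def allelic_chi_square(a_grp1, b_grp1, a_grp2, b_grp2):
    """Pearson chi-square (1 df, no continuity correction) on a 2x2 allele-count table."""
    table = [[a_grp1, b_grp1], [a_grp2, b_grp2]]
    total = a_grp1 + b_grp1 + a_grp2 + b_grp2
    row_sums = [sum(row) for row in table]
    col_sums = [table[0][j] + table[1][j] for j in range(2)]
    if 0 in row_sums or 0 in col_sums:
        raise ValueError("test is undefined for an empty group or a monomorphic SNP")
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count under equal allele frequencies in both groups
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    return chi2

def p_value_1df(chi2):
    """Upper-tail P for a 1-df chi-square: P(Z**2 > x) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(chi2 / 2))

# Hypothetical allele counts (allele A, allele B) at one SNP, NOT study data.
europe = (1400, 106)      # 1,506 alleles from 753 diploid genomes
east_asia = (200, 1476)   # 1,676 alleles from 838 diploid genomes

chi2 = allelic_chi_square(*europe, *east_asia)
print(f"chi-square = {chi2:.1f}, P = {p_value_1df(chi2):.3g}")
```

For a 2×2 table the statistic cannot exceed the total allele count N (chi-square = N·φ², with φ² ≤ 1), so under these assumptions a value of 1500 or more out of 3,182 alleles implies φ² of at least about 0.47, a very large frequency difference. The corresponding P value is smaller than the smallest positive double, so `math.erfc` underflows to 0, consistent with the reported P ≈ 0.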

Conclusion

To the best of our knowledge, this is the first study to investigate the likely association between host genetic ancestry and COVID-19 severity. These findings improve our overall understanding of putative genetic modifiers of COVID-19 clinical presentation. We note that the development of effective therapeutics will benefit greatly from more detailed analyses of individual genomic sequence data from COVID-19 patients of varied ancestries.

Article activity feed

  1. SciScore for 10.1101/2020.04.05.20054627:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Software and Algorithms

    Sentence: File conversions and manipulations were performed using EIGENSTRAT (EIG) v7.2 [17] and PLINK v1.9 [18].
    Resource: PLINK; suggested: (PLINK, RRID:SCR_001757)

    Sentence: We used qpAdm [20] implemented in AdmixTools v5.1 [21,22] to estimate ancestry proportions in the European genomes originating from a mixture of 'reference' populations by utilizing shared genetic drift with a set of 'outgroup' populations.
    Resource: AdmixTools; suggested: (ADMIXTOOLS, RRID:SCR_018495)

    Sentence: Pearson's correlation coefficient (r) between various ancestry proportions of European genomes and COVID-19 death/recovery ratio was calculated and its significance was statistically evaluated using GraphPad Prism v8.4.0 (GraphPad Software, San Diego, California, USA) [24].
    Resources: GraphPad Prism, GraphPad; suggested: (GraphPad Prism, RRID:SCR_002798)

    Sentence: A Manhattan plot was generated in Haploview [29] by plotting Chi square values of all assessed SNPs to identify the SNPs that are likely associated with COVID-19 manifestations.
    Resource: Haploview; suggested: (Haploview, RRID:SCR_003076)

    Sentence: Highly significant SNPs were annotated using SNPnexus web-based server [30].
    Resource: SNPnexus; suggested: (SNPnexus, RRID:SCR_005192)
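The idea behind the qpAdm step quoted above can be illustrated with the linearity of f4-statistics: if a target's allele frequencies are a mixture w·S1 + (1 − w)·S2 of two sources, then every outgroup statistic f4(T, R0; Ri, Rj) equals w·f4(S1, R0; Ri, Rj) + (1 − w)·f4(S2, R0; Ri, Rj). The Python sketch below fits w by ordinary least squares under that sum-to-one constraint. It is a simplification, not qpAdm itself, which fits many f4-statistics jointly, tests the model rank and estimates standard errors with a block jackknife; the two-source setup, the function name and the f4 values are hypothetical.

```python
def two_source_weight(f_target, f_src1, f_src2):
    """Least-squares mixture weight w for target ~ w*src1 + (1 - w)*src2.

    Each argument is a vector of f4-statistics f4(X, R0; Ri, Rj) computed
    for the same outgroup combinations. Returns (w, residuals).
    """
    if not (len(f_target) == len(f_src1) == len(f_src2)):
        raise ValueError("all f4 vectors must have the same length")
    # Rearranged model: f_target - f_src2 = w * (f_src1 - f_src2) + residual
    d = [s1 - s2 for s1, s2 in zip(f_src1, f_src2)]
    e = [t - s2 for t, s2 in zip(f_target, f_src2)]
    denom = sum(x * x for x in d)
    if denom == 0:
        raise ValueError("sources cannot be distinguished with these outgroups")
    w = sum(x * y for x, y in zip(d, e)) / denom
    residuals = [y - w * x for x, y in zip(d, e)]
    return w, residuals

# Hypothetical f4 vectors; the target is constructed as a 30/70 mixture.
f_src1 = [0.010, -0.020, 0.030]
f_src2 = [-0.010, 0.010, 0.000]
f_target = [0.3 * a + 0.7 * b for a, b in zip(f_src1, f_src2)]

w, residuals = two_source_weight(f_target, f_src1, f_src2)
print(f"weight on source 1 = {w:.3f}, weight on source 2 = {1 - w:.3f}")
```

Residuals that remain large after fitting would suggest that the two sources cannot explain the target's relationship to the outgroups, which is roughly the situation a qpAdm model test is meant to flag.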

    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).


    Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
    Limitations and conclusion: The present study sheds light on underlying genetic signatures that may be associated with disparate COVID-19 severity and manifestations in populations worldwide. Nevertheless, we note that the current work has been performed using publicly available genomic data, and a more robust understanding in this regard will emerge from sequencing/genotyping endeavours for COVID-19 patients across the spectrum of nationalities/ancestries and geographical locations, including individuals with mild to moderate symptoms, severe manifestations and death. We further note that since the current analysis is performed using pre-existing genomic data, the number of individuals sequenced differs among populations, which might influence the analyses in terms of statistical power and errors. We also note that populations in most European countries have higher mean ages than that of India, which may accentuate mortality rates among Europeans. However, as reported recently, age alone may not account for elevated death/recovery ratios; underlying health conditions such as cardiovascular disease, chronic kidney disease, hypertension, chronic respiratory disease, diabetes, cancers, HIV/AIDS and tuberculosis, many of which are already described as crucial comorbidity factors for COVID-19, may modify disease manifestations and prognosis in patients [54]. It is imperative to consider here that although the mean population age is lower in India...

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding.