Poking COVID-19: Insights on Genomic Constraints among Immune-Related Genes between Qatari and Italian Populations

Abstract

Host genomic information, specifically genomic variations, may characterize susceptibility to disease and identify people with a higher risk of harm, leading to better targeting of care and vaccination. Italy was the epicentre for the spread of COVID-19 in Europe, was the first country to go into a national lockdown, and has one of the highest COVID-19-associated mortality rates. Qatar, on the other hand, has a very low mortality rate. In this study, we compared whole-genome sequencing data from 14398 adult Qatari nationals with data from 925 Italian individuals. We also included in the comparison whole-exome sequence data from 189 Italian laboratory-confirmed COVID-19 cases. We focused our study on a curated list of 3619 candidate genes involved in innate immunity and host-pathogen interaction. Two population-gene metric scores, the Delta Singleton-Cohort variant score (DSC) and the Sum Singleton-Cohort variant score (SSC), were applied to estimate the presence of selective constraints in the Qatari population and in the Italian cohorts. Results based on the DSC and SSC metrics demonstrated different selective pressure on three genes (MUC5AC, ABCA7, FLNA) between the Qatari and Italian populations. This study highlighted the genetic differences between the Qatari and Italian populations and identified a subset of genes involved in innate immunity and host-pathogen interaction.

Article activity feed

  1. SciScore for 10.1101/2021.10.04.21264507:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Software and Algorithms
    Sentences | Resources
    Population description: The Qatari Cohort: the Qatar Genome Program (QGP)[21] is a population-based project launched by the Qatar Foundation to generate a large-scale whole-genome sequence (WGS) dataset, in combination with comprehensive phenotypic information collected by the Qatar Biobank (QBB)[22].
    Qatar Genome Program
    suggested: None
    WGS
    suggested: None
    All data analyzed was aligned to the reference genome’s GRCh38 release, and functional annotations were obtained using the Ensembl VEP tool[26].
    Ensembl
    suggested: (Ensembl, RRID:SCR_002344)
    Principal component analysis: To highlight the study cohorts’ population structure level, we performed a principal component analysis (PCA) using KING software[27].
    KING
    suggested: (KING, RRID:SCR_009251)
    Plink v1.9 software[28] was used to convert data from vcf to plink binary format.
    Plink
    suggested: (PLINK, RRID:SCR_001757)
    QGP and each INGI cohort results were projected into the 1000Genomes Project data[29].
    Project
    suggested: (ARB project, RRID:SCR_000515)
    The primary gene list is curated according to the knowledge-literature base by the Ingenuity® Variant Analysis™ software from QIAGEN[30] and the viral gene panel expert from Genomics England (GEL)[31].
    Variant Analysis™
    suggested: None

    Results from OddPub: Thank you for sharing your data.


    Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
    This feature could be introduced by one of the limitations of this study: the sample size of the COVID-19 positive cohort. Increasing the number of cases will undoubtedly allow us to have a better estimate of the singleton distribution. Moreover, in our model, we didn’t include any of the risk factors that are already linked to a diverse response to the infection. One last limitation could be represented by the inclusion of only one cohort of COVID-19 positive samples, for which only Whole Exome sequence data was available. We chose to include this cohort due to the phenotypical characterization, which allowed us to investigate our hypothesis of a genetic contribution to the disease severity prioritized genes. Nevertheless, for all the cohorts involved, information on the COVID-19 affected samples is already being collected. That will allow us to produce more precise results with further analyses. To our knowledge, this is the first study performing a whole-genome population-level comparison between Arabian and European populations, both differently affected by the pandemic. Recent similar studies focused only on the ACE2 receptor and populations from the 1000Genomes Project[52] or compared allele frequencies on covid-19 related genes in the Brazilian population with data from the 1000Genome and gnomAD datasets[53]. With the development of new vaccines against SARS-CoV-2 infection, we are bound to see a decrease in adverse disease outcomes and disease severity among the immun...

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    Results from scite Reference Check: We found no unreliable references.


    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.