Genomic epidemiology of SARS-CoV-2 in the UAE reveals novel virus mutation, patterns of co-infection and tissue specific host immune response

Abstract

To unravel the source of SARS-CoV-2 introduction and the pattern of its spreading and evolution in the United Arab Emirates, we conducted meta-transcriptome sequencing of 1067 nasopharyngeal swab samples collected between May 9th and June 29th, 2020, during the first peak of the local COVID-19 epidemic. We identified the global clade distribution and eleven novel genetic variants that were almost absent in the rest of the world and that defined five subclades specific to the UAE viral population. Cross-settlement human-to-human transmission was related to local business activity. Perhaps surprisingly, at least 5% of the population were co-infected by SARS-CoV-2 of multiple clades within the same host. We also discovered an enrichment of cytosine-to-uracil mutations among the viral population collected from the nasopharynx, which differs from the adenosine-to-inosine change previously reported in bronchoalveolar lavage fluid samples, and a previously unidentified upregulation of APOBEC4 expression in the nasopharynx among infected patients, indicating that the innate immune host response mediated by the ADAR and APOBEC gene families could be tissue-specific. The genomic epidemiological and molecular biological knowledge reported here provides new insights into SARS-CoV-2 evolution and transmission and points to future directions for host–pathogen interaction investigation.
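
The enrichment of one substitution type described in the abstract can be illustrated by tallying the base changes between a reference and an aligned sample sequence. The sketch below is only an illustration, not the authors' method: the function name `substitution_spectrum` and the toy sequences are hypothetical. In sequence data, a cytosine-to-uracil change appears as C>T and an adenosine-to-inosine change is read as A>G.

```python
from collections import Counter


def substitution_spectrum(reference, sample):
    """Count substitution types (e.g. 'C>T') between two aligned sequences.

    Hypothetical helper for illustration. Positions where either sequence
    has a gap or an ambiguous base (anything other than A, C, G, T) are skipped.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    counts = Counter()
    for ref_base, alt_base in zip(reference.upper(), sample.upper()):
        if ref_base in "ACGT" and alt_base in "ACGT" and ref_base != alt_base:
            counts[f"{ref_base}>{alt_base}"] += 1
    return counts


# Toy sequences (made up), differing at two positions.
spectrum = substitution_spectrum("ACGTCCAT", "ATGTCTAT")
print(spectrum.most_common())
```

A real analysis would also need to account for strand, sequencing error, and the base composition of the genome before calling any substitution type enriched.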

Article activity feed

  1. SciScore for 10.1101/2021.03.09.21252822:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Software and Algorithms
    Sentences | Resources
    Briefly, total reads were processed using Kraken v0.10.5 (default parameters) with a self-built database of Coronaviridae genomes (including SARS, MERS, and SARS-CoV-2 genome sequences downloaded from GISAID, NCBI, and CNGB) to identify Coronaviridae-like reads in a sensitive manner.
    Kraken
    suggested: (Kraken, RRID:SCR_005484)
    Fastp v0.19.5 (parameters: -q 20 -u 20 -n 1 -l 50) and SOAPnuke v1.5.6 (parameters: -l 20 -q 0.2 -E 50 -n 0.02 -5 0 -Q 2 -G -d) were used to remove low-quality reads, duplications, and adaptor contaminations.
    Fastp
    suggested: (fastp, RRID:SCR_016962)
    SOAPnuke
    suggested: (SOAPnuke, RRID:SCR_015025)
    Low-complexity reads were then removed using PRINSEQ v0.20.4 (parameters: -lc_method dust -lc_threshold 7).
    PRINSEQ
    suggested: (PRINSEQ, RRID:SCR_005454)
    Sequencing depth was measured using samtools depth with the default parameters.
    samtools
    suggested: (SAMTOOLS, RRID:SCR_002105)
    SARS-CoV-2 consensus sequences were generated using Pilon v1.23 (parameters: --changes --vcf --mindepth 10 --fix all,amb) (16).
    Pilon
    suggested: (Pilon, RRID:SCR_014731)
    We also performed de novo assembly of the Coronaviridae-like reads from samples with < 100× average sequencing depth using SPAdes (v3.14.0) with the default settings.
    SPAdes
    suggested: (SPAdes, RRID:SCR_000131)
    Jalview (v1.8.3) was used to perform multiple sequence alignment and estimate the conservativeness score of the mutations (18).
    Jalview
    suggested: (Jalview, RRID:SCR_006459)
    Analysis of host ADAR and APOBEC gene expression: Reads were aligned to the human genome reference (GRCh38) using hisat2 (parameters: --phred64 --no-discordant --no-mixed -I 1 -X 1000 -p 4).
    hisat2
    suggested: (HISAT2, RRID:SCR_015530)
    We built a maximum likelihood phylogenetic tree using the Nextstrain pipeline, with Augur v6.4.3 and MAFFT v7.455 for multiple sequence alignment and IQtree v1.6.12 for phylogenetic tree construction (25).
    Augur
    suggested: None
    MAFFT
    suggested: (MAFFT, RRID:SCR_011811)
    IQtree
    suggested: None
    FigTree v1.4.4 was used to visualize and annotate the phylogenetic tree.
    FigTree
    suggested: (FigTree, RRID:SCR_008515)
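
The resource table mentions measuring depth with samtools depth and applying de novo assembly to samples with < 100× average sequencing depth. A minimal sketch of that selection step follows, assuming the usual three-column, tab-separated `samtools depth` output (reference name, position, depth). The function names, the optional `genome_length` parameter, and the toy input are hypothetical; the paper does not state how the average was computed.

```python
def mean_depth(depth_lines, genome_length=None):
    """Average depth from `samtools depth`-style lines (chrom, pos, depth).

    By default `samtools depth` omits zero-coverage positions, so passing
    genome_length (an assumption of this sketch) averages over the whole
    reference instead of only the reported positions.
    """
    total = 0
    positions = 0
    for line in depth_lines:
        line = line.strip()
        if not line:
            continue
        _chrom, _pos, depth = line.split("\t")[:3]
        total += int(depth)
        positions += 1
    denominator = genome_length if genome_length is not None else positions
    return total / denominator if denominator else 0.0


def needs_de_novo_assembly(avg_depth, threshold=100.0):
    """Flag samples below the < 100x average-depth cutoff quoted in the table."""
    return avg_depth < threshold


# Toy input (made up): three covered positions on a reference named "ref".
toy = ["ref\t1\t120", "ref\t2\t80", "ref\t3\t40"]
avg = mean_depth(toy)
print(avg, needs_de_novo_assembly(avg))
```

Whether to divide by the covered positions or by the full genome length changes the result for sparsely covered samples, so the choice should be stated alongside any reported average.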

    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).


    Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding.