Host transcriptional responses and SARS-CoV-2 isolates from the nasopharyngeal samples of Bangladeshi COVID-19 patients

Abstract

As the COVID-19 pandemic progresses, fatalities and new infections are increasing at an alarming rate. SARS-CoV-2 infection follows a highly variable course, and it is becoming more evident that an individual's immune system has a decisive influence on the progression of the disease. However, the detailed molecular mechanisms underlying SARS-CoV-2-mediated disease pathogenesis are largely unknown. Only a few host transcriptional responses in COVID-19 have been reported so far, all from the Western world; no such data have yet been generated from the South Asian region that could be correlated with the conjectured lower fatality in this part of the globe. In this context, we aimed to perform transcriptomic profiling of COVID-19 patients from Bangladesh and to report the SARS-CoV-2 isolates from these patients. Moreover, we performed a comparative analysis to demonstrate how differently the various SARS-CoV-2 infection systems respond to the viral pathogen. We detected a unique missense mutation at position 10329 of the ORF1ab gene, annotated to the 3C-like proteinase, which was found in 75% of our analyzed isolates but is very rare globally. Functional enrichment analyses of the differentially modulated genes revealed a host-induced response similar to those reported earlier; in the Bangladeshi patients, this response was mainly mediated by the innate immune system, interferon stimulation, and upregulated cytokine expression. Surprisingly, we did not observe induction of apoptotic signaling, phagosome formation, antigen presentation and production, or hypoxia response in these nasopharyngeal samples. Furthermore, in the comparison with other SARS-CoV-2 infection systems, we found that lung cells trigger more versatile immune and cytokine signaling, which was several-fold higher than in our nasopharyngeal samples. We also observed that lung cells did not express ACE2 at as high a level as suspected, whereas the nasopharyngeal cells overexpressed ACE2. DPP4 expression in the nasal samples, however, was significantly lower than in the other cell types. Surprisingly, lung cells expressed a very high amount of integrins compared with the nasopharyngeal samples, which might suggest a reason for the increased amount of viral infection in the lungs. The network analysis gave clues to the probable viral modulation behind the overexpression of these integrins. Our data will provide valuable insights for designing studies to elucidate the effect of ethnicity on viral pathogenesis, and the incorporation of further data will enrich the search for effective therapeutics.

Article activity feed

  1. SciScore for 10.1101/2020.07.23.218198:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    Institutional Review Board Statement: not detected.
    Randomization: not detected.
    Blinding: not detected.
    Power Analysis: not detected.
    Sex as a biological variable: not detected.

    Table 2: Resources

    Software and Algorithms
    Sentences | Resources
    Data processing and identification of the viral agent: Firstly, the sequencing reads were adapter and quality trimmed using the Trimmomatic program [27].
    Trimmomatic
    suggested: (Trimmomatic, RRID:SCR_011848)
    The remaining reads were mapped against the SARS-CoV-2 reference sequence (NC_045512.2) using Bowtie 2 [28].
    Bowtie
    suggested: (Bowtie, RRID:SCR_005476)
    Mapping of the RNA-seq reads onto SARS-CoV-2 reference genome: We mapped the normalized (by count per million mapped reads-CPM) RNA-seq reads onto the SARS-CoV-2 genome track of the UCSC genome browser [30] using the “bamCoverage” feature of deepTools2 suite [31].
    UCSC genome browser
    suggested: (UCSC Genome Browser, RRID:SCR_005780)
    Different representations showing the information regarding the variations were produced using the Microsoft Excel program [33].
    Microsoft Excel
    suggested: (Microsoft Excel, RRID:SCR_016137)
    The impacts of the variations were further characterized utilizing the Ensembl Variant Effect Predictor (VEP) tool [34].
    Ensembl Variant Effect Predictor
    suggested: None
    Variant
    suggested: (VARIANT, RRID:SCR_005194)
    We have checked the raw sequence quality using FastQC program (v0.11.9) [36] and found that the "Per base sequence quality", and "Per sequence quality scores" were high over the threshold for all sequences (Supplementary file 2).
    FastQC
    suggested: (FastQC, RRID:SCR_014583)
    The mapping of reads was done with TopHat (tophat v2.1.1 with Bowtie v2.4.1) [37].
    TopHat
    suggested: (TopHat, RRID:SCR_013035)
    After mapping, we used the SubRead package featureCount (v2.21) [41] to calculate absolute read abundance (read count, rc) for each transcript/gene associated to the Ensembl genes.
    SubRead
    suggested: (Subread, RRID:SCR_009803)
    Ensembl
    suggested: (Ensembl, RRID:SCR_002344)
    For differential expression (DE) analysis we used DESeq2 (v1.26.0) with R (v3.6.2; 2019-07-05) [42] that uses a model based on the negative binomial distribution.
    DESeq2
    suggested: (DESeq, RRID:SCR_000154)
    To assess the fidelity of the RNA-seq data used in this study and normalization method applied here, we checked the normalized Log2 expression data quality using R/Bioconductor package “arrayQualityMetrics (v3.44.0)” [43].
    R/Bioconductor
    suggested: None
    We also performed a multifactorial differential gene expression analysis using the edgeR tool [44] following the experimental design-(Sample A/control for sample A)/(Sample B/control for sample B).
    edgeR
    suggested: (edgeR, RRID:SCR_012802)
    Firstly, the genome sequences were aligned using MAFFT [46] tool using the auto-configuration.
    MAFFT
    suggested: (MAFFT, RRID:SCR_011811)
    Then we used MEGA X [47] for constructing the phylogenetic tree utilizing 500 bootstrapping with substitution model/method: maximum composite likelihood, uniform rates of variation among sites, the partial deletion of gaps/missing data and site coverage cutoff 95%.
    MEGA
    suggested: (Mega BLAST, RRID:SCR_011920)
    We have utilized the Gene Ontology Biological Processes (GOBP) [49], Bioplanet pathways [50], KEGG pathway [51], and Reactome pathway [52] modules for the overrepresentation analysis.
    KEGG
    suggested: (KEGG, RRID:SCR_012773)
    Construction of biological networks: Construction, visualization, and analysis of biological networks with differentially expressed genes, their associated transcription factors, and interacting viral proteins were executed in the Cytoscape software (v3.8.0) [54].
    Cytoscape
    suggested: (Cytoscape, RRID:SCR_003032)
    We used the STRING [55] database to extract the highest confidences (0.9) edges only for the protein-protein interactions to reduce any false positive connection.
    STRING
    suggested: (STRING, RRID:SCR_005223)
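
The count-per-million (CPM) normalization mentioned in the quoted read-mapping sentence can be illustrated with a minimal Python sketch. This is not the authors' pipeline, which relied on deepTools; the function name `counts_per_million` and the read counts below are hypothetical and serve only to show the arithmetic.

```python
def counts_per_million(counts):
    """Scale raw read counts so that they sum to one million.

    `counts` maps a feature name to its raw mapped-read count.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no mapped reads to normalize")
    # CPM = raw count / total mapped reads * 1,000,000
    return {name: c * 1_000_000 / total for name, c in counts.items()}


# Hypothetical counts for three viral regions (illustration only).
raw = {"ORF1ab": 600, "S": 300, "N": 100}
cpm = counts_per_million(raw)
```

Because every value is divided by the same library total, CPM makes coverage comparable between samples sequenced to different depths, but it does not correct for feature length.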

    Results from OddPub: Thank you for sharing your data.


    Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: Please consider improving the rainbow (“jet”) colormap(s) used on pages 40, 38 and 39. At least one figure is not accessible to readers with colorblindness and/or is not true to the data, i.e. not perceptually uniform.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • No funding statement was detected.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding.