Intra-host Variation and Evolutionary Dynamics of SARS-CoV-2 Population in COVID-19 Patients


Abstract

As of mid-May 2020, the causative agent of COVID-19, SARS-CoV-2, has infected over 4 million people and caused more than 300 thousand deaths according to official reports 1,2. Understanding the biology and virus-host interactions of SARS-CoV-2 requires knowledge of the mutation and evolution of this virus at both the inter- and intra-host levels. However, although quite a few polymorphic sites have been identified among SARS-CoV-2 populations, intra-host variant spectra and their evolutionary dynamics remain mostly unknown. Here, using deep sequencing data, we obtained and characterized consensus genomes and intra-host genomic variants from 32 serial samples collected from eight patients with COVID-19. The 32 consensus genomes revealed the coexistence of different genotypes within the same patient. We further identified 40 intra-host single nucleotide variants (iSNVs). Most (30/40) iSNVs were present in a single patient, while ten iSNVs were found in at least two patients or were identical to consensus variants. Comparison of allele frequencies of the iSNVs revealed genetic divergence between intra-host populations of the respiratory tract (RT) and gastrointestinal tract (GIT), mostly driven by bottleneck events during intra-host transmission. Nonetheless, we observed maintained viral genetic diversity within the GIT, showing an increased population with accumulated mutations that developed in the tissue-specific environments. The iSNVs identified here not only show spatial divergence of intra-host viral populations, but also provide new insights into the complex virus-host interactions.
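To make the allele-frequency comparison in the abstract concrete, here is a minimal sketch of how alternative allele frequencies (AAFs) could be computed from per-site read counts, how candidate iSNVs could be flagged, and how two samples (for example one RT and one GIT sample) could be compared. It is an illustration only, not the authors' pipeline: the depth and frequency thresholds, the function names, and the use of a mean absolute AAF difference as a divergence summary are all assumptions introduced here.

```python
from typing import Dict, Tuple

# Hypothetical thresholds for illustration; the abstract does not state
# the cutoffs actually used by the authors.
MIN_DEPTH = 100   # minimum reads covering a site
MIN_AAF = 0.05    # minimum minor-allele frequency to call an iSNV

# position -> (reference-supporting reads, alternative-supporting reads)
Counts = Dict[int, Tuple[int, int]]


def alt_allele_frequency(ref_reads: int, alt_reads: int) -> float:
    """Fraction of reads at a site that support the alternative allele."""
    depth = ref_reads + alt_reads
    if depth == 0:
        raise ValueError("site has no coverage")
    return alt_reads / depth


def call_isnvs(counts: Counts,
               min_depth: int = MIN_DEPTH,
               min_aaf: float = MIN_AAF) -> Dict[int, float]:
    """Return sites where both alleles are present above the frequency cutoff."""
    isnvs = {}
    for pos, (ref, alt) in counts.items():
        if ref + alt < min_depth:
            continue  # too little coverage to trust the frequency
        aaf = alt_allele_frequency(ref, alt)
        if min_aaf <= aaf <= 1 - min_aaf:
            isnvs[pos] = aaf
    return isnvs


def mean_aaf_difference(a: Counts, b: Counts,
                        min_depth: int = MIN_DEPTH) -> float:
    """Mean absolute AAF difference over sites well covered in both samples."""
    shared = [p for p in a
              if p in b and sum(a[p]) >= min_depth and sum(b[p]) >= min_depth]
    if not shared:
        raise ValueError("no adequately covered sites shared by both samples")
    total = sum(abs(alt_allele_frequency(*a[p]) - alt_allele_frequency(*b[p]))
                for p in shared)
    return total / len(shared)
```

A larger `mean_aaf_difference` between two samples from the same patient would indicate more divergent intra-host populations under these assumptions. A real analysis would also need to account for sequencing error and strand bias, which this sketch ignores.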

Article activity feed

  1. SciScore for 10.1101/2020.05.20.103549:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Software and Algorithms
    Sentences | Resources
    Full-length consensus genomes were generated from reads mapped to the reference genome (GISAID accession: EPI_ISL_402119) using Pilon (v. 1.23)16.
    Pilon
    suggested: (Pilon , RRID:SCR_014731)
    The collected coronaviridae-like reads were also de novo assembled using SPAdes (v. 3.14.0) with default settings17 with a maximum of 100-fold coverage of read data.
    SPAdes
    suggested: (SPAdes, RRID:SCR_000131)
    Nucleotide differences between the consensus sequences and the reference genome were summarized into artificial Variant Call Format (VCF) files, which were annotated using SnpEff (v.2.0.5)18 with default settings.
    SnpEff
    suggested: (SnpEff, RRID:SCR_005191)
    The assembled SARS-CoV-2 and selected representative genomes were aligned using MAFFT with default settings.
    MAFFT
    suggested: (MAFFT, RRID:SCR_011811)
    A maximum likelihood (ML) tree was inferred using the software IQ-TREE (v.1.6.12)19, with the best fit nucleotide substitution model selected by ModelFinder from the same software.
    IQ-TREE
    suggested: (IQ-TREE, RRID:SCR_017254)
    The linkage disequilibrium among the identified consensus variants was estimated using VCFtools (v.0.1.16).
    VCFtools
    suggested: (VCFtools, RRID:SCR_001235)
    First, paired-end metatranscriptomic reads were mapped to the reference genome (GISAID accession: EPI_ISL_402119) using BWA aln (v.0.7.16) with default parameters22.
    BWA
    suggested: (BWA, RRID:SCR_010910)
    Duplicated reads were marked using Picard MarkDuplicates (v. 2.10.10) (http://broadinstitute.github.io/picard) with default settings.
    Picard
    suggested: (Picard, RRID:SCR_006525)
    A heatmap was generated to visualize the AAFs for all samples using the pheatmap package in R (v.3.6.1).
    pheatmap
    suggested: (pheatmap, RRID:SCR_016418)
    Statistics of iSNVs: The distribution of iSNVs among genetic components and patients was summarized and visualized using the Python package matplotlib (v.3.2.1).
    Python
    suggested: (IPython, RRID:SCR_001658)
    matplotlib
    suggested: (MatPlotLib, RRID:SCR_008624)
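One of the sentences quoted above mentions estimating linkage disequilibrium among consensus variants. As a rough illustration of what such a statistic measures, the sketch below computes the squared correlation (r²) between two biallelic sites across a set of genomes. It assumes haploid consensus genotypes coded as 0 (reference allele) or 1 (alternative allele). The function name and the input layout are hypothetical, and this is not a reproduction of the VCFtools computation.

```python
from typing import Sequence


def r_squared(site_a: Sequence[int], site_b: Sequence[int]) -> float:
    """Squared correlation between two biallelic sites.

    Each sequence holds one 0/1 allele code per genome, in the same
    genome order for both sites (an assumed input layout).
    """
    if len(site_a) != len(site_b) or len(site_a) == 0:
        raise ValueError("sites must cover the same non-empty set of genomes")
    n = len(site_a)
    p_a = sum(site_a) / n    # frequency of the alternative allele at site A
    p_b = sum(site_b) / n    # frequency of the alternative allele at site B
    # frequency of genomes carrying the alternative allele at both sites
    p_ab = sum(1 for x, y in zip(site_a, site_b) if x == 1 and y == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when a site is monomorphic")
    d = p_ab - p_a * p_b     # linkage disequilibrium coefficient D
    return d * d / denom
```

Two sites whose alternative alleles always occur in the same genomes give r² = 1, while independently distributed alleles give values near 0.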

    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).


    Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.