Intra-host variation and evolutionary dynamics of SARS-CoV-2 populations in COVID-19 patients

Abstract

Background

As of early February 2021, SARS-CoV-2, the causative agent of COVID-19, had infected more than 104 million people and caused more than 2 million deaths, according to official reports. Understanding the biology and virus-host interactions of SARS-CoV-2 requires knowledge of how the virus mutates and evolves at both the inter- and intra-host levels. However, although a number of polymorphic sites have been identified among SARS-CoV-2 populations, intra-host variant spectra and their evolutionary dynamics remain mostly unknown.

Methods

Using high-throughput sequencing of metatranscriptomic and hybrid-capture libraries, we characterized consensus genomes and intra-host single nucleotide variations (iSNVs) in serial samples collected from eight patients with COVID-19. We analyzed the distribution of iSNVs along the SARS-CoV-2 genome and identified iSNVs that co-occurred among COVID-19 patients. We also compared the evolutionary dynamics of SARS-CoV-2 populations in the respiratory tract (RT) and gastrointestinal tract (GIT).
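As an illustration of the co-occurrence step described above, the following minimal Python sketch separates iSNVs seen in a single patient from those seen in two or more. It is not the authors' pipeline: the function name `shared_isnvs`, the `(position, ref, alt)` tuple encoding, and the toy patient data are all assumptions made for this example.

```python
from collections import defaultdict


def shared_isnvs(isnvs_by_patient):
    """Split iSNVs into patient-specific and shared sets.

    isnvs_by_patient maps a patient ID to a set of (position, ref, alt)
    tuples. This encoding is a hypothetical choice for the sketch.
    """
    carriers = defaultdict(set)
    for patient, variants in isnvs_by_patient.items():
        for variant in variants:
            carriers[variant].add(patient)

    # iSNVs carried by exactly one patient
    private = {v for v, patients in carriers.items() if len(patients) == 1}
    # iSNVs carried by two or more patients, with the sorted carrier list
    shared = {v: sorted(patients) for v, patients in carriers.items()
              if len(patients) >= 2}
    return private, shared


# Hypothetical toy data, not taken from the study
example = {
    "P1": {(1000, "C", "T"), (2000, "G", "A")},
    "P2": {(2000, "G", "A")},
    "P3": {(3000, "A", "G")},
}
private, shared = shared_isnvs(example)
```

In this toy input, only the variant at position 2000 is shared (by P1 and P2); the other two are patient-specific.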

Results

The 32 consensus genomes revealed the co-existence of different genotypes within the same patient. We further identified 40 intra-host single nucleotide variants (iSNVs). Most iSNVs (30/40) were present in a single patient, whereas ten were found in at least two patients or were identical to consensus variants. Comparing allele frequencies of the iSNVs revealed a clear genetic differentiation between intra-host populations from the respiratory tract (RT) and gastrointestinal tract (GIT), mostly driven by bottleneck events during intra-host migrations. Compared with RT populations, the GIT populations showed better maintenance and rapid development of viral genetic diversity following the suspected intra-host bottlenecks.
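To make the allele-frequency comparison concrete, here is a small Python sketch that computes an alternate allele frequency from read counts and then a simple per-site distance between two intra-host populations. The abstract does not state which differentiation statistic the authors used, so the mean absolute difference below is a stand-in chosen for illustration; the function names and the RT and GIT frequencies are hypothetical.

```python
def alt_allele_frequency(alt_reads, depth):
    """Fraction of reads at a site that support the alternate allele."""
    if depth <= 0:
        raise ValueError("depth must be positive")
    return alt_reads / depth


def mean_abs_aaf_difference(aaf_a, aaf_b):
    """Mean absolute difference in alternate allele frequency.

    aaf_a and aaf_b map genome position -> frequency. A site missing from
    one sample is treated as frequency 0.0 (an assumption of this sketch;
    in real data a missing site could also mean insufficient coverage).
    """
    sites = sorted(set(aaf_a) | set(aaf_b))
    if not sites:
        return 0.0
    total = sum(abs(aaf_a.get(s, 0.0) - aaf_b.get(s, 0.0)) for s in sites)
    return total / len(sites)


# Hypothetical frequencies for one patient, not taken from the study
rt_sample = {1000: 0.10, 2000: 0.50}
git_sample = {1000: 0.90, 3000: 0.20}
distance = mean_abs_aaf_difference(rt_sample, git_sample)
```

A value near 0 means the two populations carry the same variants at similar frequencies; larger values indicate stronger differentiation, as would be expected after a tight bottleneck between sites.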

Conclusions

Our findings illustrate the intra-host bottlenecks and evolutionary dynamics of SARS-CoV-2 at different anatomic sites and may provide new insights into the virus-host interactions of coronaviruses and other RNA viruses.

Article activity feed

  1. SciScore for 10.1101/2020.05.20.103549:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Software and Algorithms
    Sentences | Resources
    Full-length consensus genomes were generated from reads mapped to the reference genome (GISAID accession: EPI_ISL_402119) using Pilon (v. 1.23)16.
    Pilon
    suggested: (Pilon , RRID:SCR_014731)
    The collected coronaviridae-like reads were also de novo assembled using SPAdes (v. 3.14.0) with default settings17 with a maximum of 100-fold coverage of read data.
    SPAdes
    suggested: (SPAdes, RRID:SCR_000131)
    Nucleotide differences between the consensus sequences and the reference genome were summarized into artificial Variant Call Format (VCF) files, which were annotated using SnpEff (v.2.0.5)18 with default settings.
    SnpEff
    suggested: (SnpEff, RRID:SCR_005191)
    The assembled SARS-CoV-2 and selected representative genomes were aligned using MAFFT with default settings.
    MAFFT
    suggested: (MAFFT, RRID:SCR_011811)
    A maximum likelihood (ML) tree was inferred using the software IQ-TREE (v.1.6.12)19, with the best fit nucleotide substitution model selected by ModelFinder from the same software.
    IQ-TREE
    suggested: (IQ-TREE, RRID:SCR_017254)
    The linkage disequilibrium among the identified consensus variants were estimated using VCFtools (v.0.1.16).
    VCFtools
    suggested: (VCFtools, RRID:SCR_001235)
    First, paired-end metatranscriptomic reads were mapped to the reference genome (GISAID accession: EPI_ISL_402119) using BWA aln (v.0.7.16) with default parameters22.
    BWA
    suggested: (BWA, RRID:SCR_010910)
    Duplicated reads were marked using Picard MarkDuplicates (v. 2.10.10) (http://broadinstitute.github.io/picard) with default settings.
    Picard
    suggested: (Picard, RRID:SCR_006525)
    A heatmap was generated to visualize the AAFs for all samples using the pheatmap package in R (v.3.6.1).
    pheatmap
    suggested: (pheatmap, RRID:SCR_016418)
    Statistics of iSNVs: The distribution of iSNVs among genetic components and patients were summarized and visualized using the Python package matplotlib (v.3.2.1)
    Python
    suggested: (IPython, RRID:SCR_001658)
    matplotlib
    suggested: (MatPlotLib, RRID:SCR_008624)

    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible.


    Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy-to-digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding.