Intra-host Variation and Evolutionary Dynamics of SARS-CoV-2 Population in COVID-19 Patients
Listed in: Evaluated articles (ScreenIT)
Abstract
As of mid-May 2020, the causative agent of COVID-19, SARS-CoV-2, has infected over 4 million people, with more than 300 thousand deaths according to official reports 1,2. Understanding the biology and virus-host interactions of SARS-CoV-2 requires knowledge of the mutation and evolution of this virus at both inter- and intra-host levels. However, although quite a few polymorphic sites have been identified among SARS-CoV-2 populations, intra-host variant spectra and their evolutionary dynamics remain mostly unknown. Here, using deep sequencing data, we obtained and characterized consensus genomes and intra-host genomic variants from 32 serial samples collected from eight patients with COVID-19. The 32 consensus genomes revealed the coexistence of different genotypes within the same patient. We further identified 40 intra-host single nucleotide variants (iSNVs). Most (30/40) iSNVs were present in a single patient, while ten iSNVs were found in at least two patients or were identical to consensus variants. Comparison of allele frequencies of the iSNVs revealed genetic divergence between intra-host populations of the respiratory tract (RT) and gastrointestinal tract (GIT), mostly driven by bottleneck events during intra-host transmission. Nonetheless, we observed maintained viral genetic diversity within the GIT, showing an increased population with accumulated mutations that developed in the tissue-specific environment. The iSNVs identified here not only show spatial divergence of intra-host viral populations, but also provide new insights into the complex virus-host interactions.
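The allele-frequency comparison mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the depth and frequency thresholds (`MIN_DEPTH`, `MIN_AAF`), the `SiteCounts` record, and the choice of mean absolute difference as a divergence measure are all assumptions made here for illustration, and the read counts are made-up example values.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Hypothetical thresholds for illustration only; the source does not state them.
MIN_DEPTH = 100   # minimum total reads at a site
MIN_AAF = 0.05    # minimum alternative allele frequency to call an iSNV


@dataclass(frozen=True)
class SiteCounts:
    """Read counts at one genome position in one sample (hypothetical record)."""
    position: int   # 1-based genome coordinate
    ref_reads: int  # reads supporting the reference allele
    alt_reads: int  # reads supporting the alternative allele


def alt_allele_frequency(site: SiteCounts) -> Optional[float]:
    """Return alt_reads / (ref_reads + alt_reads), or None if the site is uncovered."""
    depth = site.ref_reads + site.alt_reads
    if depth == 0:
        return None
    return site.alt_reads / depth


def is_isnv(site: SiteCounts) -> bool:
    """Flag a site as polymorphic within the host under the assumed thresholds."""
    depth = site.ref_reads + site.alt_reads
    aaf = alt_allele_frequency(site)
    if aaf is None or depth < MIN_DEPTH:
        return False
    return MIN_AAF <= aaf <= 1 - MIN_AAF


def mean_abs_aaf_difference(sample_a: Iterable[SiteCounts],
                            sample_b: Iterable[SiteCounts]) -> Optional[float]:
    """Mean absolute AAF difference over positions covered in both samples."""
    a = {s.position: alt_allele_frequency(s) for s in sample_a}
    b = {s.position: alt_allele_frequency(s) for s in sample_b}
    shared = [p for p in a if p in b and a[p] is not None and b[p] is not None]
    if not shared:
        return None
    return sum(abs(a[p] - b[p]) for p in shared) / len(shared)


# Made-up counts for one respiratory tract (RT) and one gastrointestinal tract (GIT) sample.
rt_sample = [SiteCounts(100, 900, 100), SiteCounts(200, 1000, 0)]
git_sample = [SiteCounts(100, 500, 500), SiteCounts(200, 800, 200)]

divergence = mean_abs_aaf_difference(rt_sample, git_sample)
```

A larger value of `divergence` means the two samples differ more in their variant frequencies at shared sites. A real analysis would also need to handle sequencing error, strand bias, and multi-allelic sites, which this sketch ignores.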
Article activity feed
SciScore for 10.1101/2020.05.20.103549:
Please note, not all rigor criteria are appropriate for all manuscripts.
Table 1: Rigor
NIH rigor criteria are not applicable to paper type.
Table 2: Resources
Software and Algorithms (each sentence is followed by its suggested resources)
- Sentence: Full-length consensus genomes were generated from reads mapped to the reference genome (GISAID accession: EPI_ISL_402119) using Pilon (v. 1.23) 16.
  Resource: Pilon, suggested: (Pilon, RRID:SCR_014731)
- Sentence: The collected coronaviridae-like reads were also de novo assembled using SPAdes (v. 3.14.0) with default settings 17 with a maximum of 100-fold coverage of read data.
  Resource: SPAdes, suggested: (SPAdes, RRID:SCR_000131)
- Sentence: Nucleotide differences between the consensus sequences and the reference genome were summarized into artificial Variant Call Format (VCF) files, which were annotated using SnpEff (v.2.0.5) 18 with default settings.
  Resource: SnpEff, suggested: (SnpEff, RRID:SCR_005191)
- Sentence: The assembled SARS-CoV-2 and selected representative genomes were aligned using MAFFT with default settings.
  Resource: MAFFT, suggested: (MAFFT, RRID:SCR_011811)
- Sentence: A maximum likelihood (ML) tree was inferred using the software IQ-TREE (v.1.6.12) 19, with the best fit nucleotide substitution model selected by ModelFinder from the same software.
  Resource: IQ-TREE, suggested: (IQ-TREE, RRID:SCR_017254)
- Sentence: The linkage disequilibrium among the identified consensus variants were estimated using VCFtools (v.0.1.16).
  Resource: VCFtools, suggested: (VCFtools, RRID:SCR_001235)
- Sentence: First, paired-end metatranscriptomic reads were mapped to the reference genome (GISAID accession: EPI_ISL_402119) using BWA aln (v.0.7.16) with default parameters 22.
  Resource: BWA, suggested: (BWA, RRID:SCR_010910)
- Sentence: Duplicated reads were marked using Picard MarkDuplicates (v. 2.10.10) (http://broadinstitute.github.io/picard) with default settings.
  Resource: Picard, suggested: (Picard, RRID:SCR_006525)
- Sentence: A heatmap was generated to visualize the AAFs for all samples using the pheatmap package in R (v.3.6.1).
  Resource: pheatmap, suggested: (pheatmap, RRID:SCR_016418)
- Sentence: Statistics of iSNVs: The distribution of iSNVs among genetic components and patients were summarized and visualized using the Python package matplotlib (v.3.2.1)
  Resource: Python, suggested: (IPython, RRID:SCR_001658)
  Resource: matplotlib, suggested: (MatPlotLib, RRID:SCR_008624)
Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).
Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.
Results from TrialIdentifier: No clinical trial numbers were referenced.
Results from Barzooka: We did not find any issues relating to the usage of bar graphs.
Results from JetFighter: We did not find any issues relating to colormaps.
Results from rtransparent:
- Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
- Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
- No protocol registration statement was detected.
