Temporal dynamics of SARS-CoV-2 mutation accumulation within and across infected hosts

Abstract

Analysis of SARS-CoV-2 genetic diversity within infected hosts can provide insight into the generation and spread of new viral variants and may enable high resolution inference of transmission chains. However, little is known about temporal aspects of SARS-CoV-2 intrahost diversity and the extent to which shared diversity reflects convergent evolution as opposed to transmission linkage. Here we use high depth of coverage sequencing to identify within-host genetic variants in 325 specimens from hospitalized COVID-19 patients and infected employees at a single medical center. We validated our variant calling by sequencing defined RNA mixtures and identified viral load as a critical factor in variant identification. By leveraging clinical metadata, we found that intrahost diversity is low and does not vary by time from symptom onset. This suggests that variants will only rarely rise to appreciable frequency prior to transmission. Although there was generally little shared variation across the sequenced cohort, we identified intrahost variants shared across individuals who were unlikely to be related by transmission. These variants did not precede a rise in frequency in global consensus genomes, suggesting that intrahost variants may have limited utility for predicting future lineages. These results provide important context for sequence-based inference in SARS-CoV-2 evolution and epidemiology.

Article activity feed

  1. SciScore for 10.1101/2021.01.19.427330

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    Institutional Review Board Statement: IRB: These studies and the use of residual specimens were approved by the University of Michigan Institutional Review Board.
    Randomization: not detected.
    Blinding: not detected.
    Power Analysis: not detected.
    Sex as a biological variable: not detected.

    Table 2: Resources

    Software and Algorithms

    Sentence: Analysis of sequence reads: We aligned reads to the MN908947.3 reference genome with BWA-MEM version 0.7.15 [39].
    Resource: BWA-MEM
    Suggested: (Sniffles, RRID:SCR_017619)

    Sentence: We identified single nucleotide variants with iVar 1.2.1 using the following parameters: sample with viral load ≥ 10^3 copies/μL; sample with consensus genome length of ≥ 29000; sample with ≥ 80% of genome sites above 200x coverage; iSNV frequency threshold of 2%; read depth of ≥ 100 at iSNV sites; ≥ 10 reads with average Phred score of > 35 supporting a given iSNV; iVar p-value of < 0.0001.
    Resource: Phred
    Suggested: (Phred, RRID:SCR_001017)

    Sentence: To generate a phylogenetic tree, we aligned consensus genomes with MUSCLE 3.8.31 and masked positions that are known to commonly exhibit homoplasies or sequencing errors [41].
    Resource: MUSCLE
    Suggested: (MUSCLE, RRID:SCR_011812)

    Sentence: We generated a maximum likelihood phylogeny with IQ-TREE, using a GTR model and 1000 ultrafast bootstrap replicates [42,43].
    Resource: IQ-TREE
    Suggested: (IQ-TREE, RRID:SCR_017254)
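
    The sample-level and variant-level thresholds quoted in the iVar sentence above can be read as a two-stage filter. The following is a minimal Python sketch of that reading, not the authors' code. The `Sample` and `Variant` classes and all field names are hypothetical and do not correspond to iVar's output columns. It also assumes that the "iSNV frequency threshold of 2%" is inclusive (frequency ≥ 0.02), which the quoted text does not state.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    """Per-specimen summary values (hypothetical field names)."""
    viral_load: float           # copies per microliter
    consensus_length: int       # length of the consensus genome in bases
    fraction_sites_200x: float  # fraction of genome sites above 200x coverage


@dataclass
class Variant:
    """Per-site variant call values (hypothetical field names)."""
    position: int
    frequency: float       # alternate allele frequency, between 0 and 1
    depth: int             # total read depth at the site
    alt_reads: int         # number of reads supporting the variant
    alt_mean_phred: float  # average Phred score of the supporting reads
    p_value: float         # p-value reported by the variant caller


def sample_passes(sample: Sample) -> bool:
    """Sample-level criteria: viral load, consensus length, coverage breadth."""
    return (
        sample.viral_load >= 1e3
        and sample.consensus_length >= 29000
        and sample.fraction_sites_200x >= 0.80
    )


def variant_passes(variant: Variant) -> bool:
    """Variant-level criteria: frequency, depth, read support, quality, p-value."""
    return (
        variant.frequency >= 0.02      # assumed inclusive; see lead-in
        and variant.depth >= 100
        and variant.alt_reads >= 10
        and variant.alt_mean_phred > 35
        and variant.p_value < 1e-4
    )


def filter_isnvs(sample: Sample, variants: List[Variant]) -> List[Variant]:
    """Return the variants that pass, or nothing if the sample itself fails."""
    if not sample_passes(sample):
        return []
    return [v for v in variants if variant_passes(v)]
```

    Checking the sample first reflects the point made in the abstract that viral load is a critical factor in variant identification: a specimen below the viral load cutoff contributes no variants, whatever its per-site statistics look like.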

    Results from OddPub: Thank you for sharing your code and data.


    Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.

  2. SciScore for 10.1101/2021.01.19.427330

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    Institutional Review Board Statement: not detected.
    Randomization: not detected.
    Blinding: not detected.
    Power Analysis: not detected.
    Sex as a biological variable: not detected.

    Table 2: Resources

    Software and Algorithms

    Sentence: Analysis of sequence reads: We aligned reads to the MN908947.3 reference genome with BWA-MEM version 0.7.15 [39].
    Resource: BWA-MEM
    Suggested: (Sniffles, RRID:SCR_017619)

    Sentence: We identified single nucleotide variants with iVar 1.2.1 using the following parameters: sample with viral load ≥ 10^3 copies/μL; sample with consensus genome length of ≥ 29000; sample with ≥ 80% of genome sites above 200x coverage; iSNV frequency threshold of 2%; read depth of ≥ 100 at iSNV sites; ≥ 10 reads with average Phred score of > 35 supporting a given iSNV; iVar p-value of < 0.0001.
    Resource: Phred
    Suggested: (Phred, RRID:SCR_001017)

    Sentence: To generate a phylogenetic tree, we aligned consensus genomes with MUSCLE 3.8.31 and masked positions that are known to commonly exhibit homoplasies or sequencing errors [41].
    Resource: MUSCLE
    Suggested: (MUSCLE, RRID:SCR_011812)

    Sentence: We generated a maximum likelihood phylogeny with IQ-TREE, using a GTR model and ultrafast bootstrap replicates [42,43].
    Resource: IQ-TREE
    Suggested: (IQ-TREE, RRID:SCR_017254)
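
    The masking step mentioned in the MUSCLE sentence above (hiding positions prone to homoplasy or sequencing error before tree building) can be illustrated with a short Python sketch. This is an assumption-laden illustration, not the authors' procedure. The function names are hypothetical, the positions in the usage example are made up, and the actual list of masked sites comes from the cited reference, which is not reproduced here. The sketch treats positions as 1-based alignment columns and leaves gap characters untouched. Alignment columns match reference coordinates only when the alignment contains no insertions relative to the reference.

```python
from typing import Dict, Iterable


def mask_positions(sequence: str, positions: Iterable[int], mask_char: str = "N") -> str:
    """Replace the bases at the given 1-based columns with mask_char.

    Gap characters ('-') are kept as gaps; out-of-range positions are ignored.
    """
    chars = list(sequence)
    for pos in positions:
        index = pos - 1  # convert 1-based column to 0-based index
        if 0 <= index < len(chars) and chars[index] != "-":
            chars[index] = mask_char
    return "".join(chars)


def mask_alignment(alignment: Dict[str, str], positions: Iterable[int]) -> Dict[str, str]:
    """Apply the same column mask to every sequence in an alignment."""
    position_list = list(positions)  # allow reuse if an iterator was passed
    return {name: mask_positions(seq, position_list) for name, seq in alignment.items()}


# Usage with toy sequences and made-up positions.
toy_alignment = {"sample_a": "ACGT-A", "sample_b": "ACGTTA"}
masked = mask_alignment(toy_alignment, [1, 5, 6])
```

    Masking with an ambiguity character rather than deleting columns keeps every sequence the same length, so downstream site coordinates remain comparable with the unmasked alignment.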

    Results from OddPub: Thank you for sharing your data.


    Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.


    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.

