Within-patient genetic diversity of SARS-CoV-2

Abstract

SARS-CoV-2, the virus responsible for the current COVID-19 pandemic, is evolving into different genetic variants by accumulating mutations as it spreads globally. In addition to this diversity of consensus genomes across patients, RNA viruses can also display genetic diversity within individual hosts, and co-existing viral variants may affect disease progression and the success of medical interventions. To systematically examine the intra-patient genetic diversity of SARS-CoV-2, we processed a large cohort of 3939 publicly available, deeply sequenced genomes with specialised bioinformatics software, along with 749 recently sequenced samples from Switzerland. We found that the distribution of diversity across patients and across genomic loci is very unbalanced, with a minority of hosts and positions accounting for much of the diversity. For example, the D614G variant in the Spike gene, which is present in the consensus sequences of 67.4% of patients, is also highly diverse within hosts: in 29.7% of the public cohort, different variants coexist at this position. We also investigated the impact of several technical and epidemiological parameters on genetic heterogeneity and found that age, which is known to be correlated with poor disease outcomes, is a significant predictor of viral genetic diversity.

Author Summary

Since it arose in late 2019, the new coronavirus (SARS-CoV-2) behind the COVID-19 pandemic has mutated and evolved during its global spread. Individual patients may host different versions, or variants, of the virus, hallmarked by different mutations. We examine the diversity of genetic variants coexisting within patients across a cohort of 3939 publicly accessible samples and 749 recently sequenced samples from Switzerland. We find that a small number of patients carry most of the diversity, and that patients with more diversity tend to be older. We also find that most of the diversity is concentrated in certain regions and positions of the virus genome. In particular, we find that a variant reported to increase infectivity is among the most diverse positions. Our study provides a large-scale survey of within-patient diversity of the SARS-CoV-2 genome.

Article activity feed

  1. SciScore for 10.1101/2020.10.12.335919:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    Institutional Review Board Statement: not detected.
    Randomization: not detected.
    Blinding: not detected.
    Power Analysis: not detected.
    Sex as a biological variable: All remaining samples were paired-end amplicon sequencing with PCR amplification from the study SRP253798, so those factors were also removed to provide the final regression:

    Of the 1043 samples, 480 (46.0%) were female and 563 (54.0%) were male, while the distribution of ages (Figure S4) has a median value of 46 and lower and upper quartiles at 29 and 60.
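The percentages in the quoted sentence can be checked from the reported counts. The snippet below is only an arithmetic check; the variable names are illustrative and are not taken from the manuscript.

```python
# Counts as reported in the quoted sentence (1043 samples in total).
total = 1043
female = 480
male = 563

# The two groups should account for every sample.
assert female + male == total

# Percentages rounded to one decimal place, as in the text.
female_pct = round(100 * female / total, 1)
male_pct = round(100 * male / total, 1)
print(f"female: {female_pct}%  male: {male_pct}%")
```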

    Table 2: Resources

    Software and Algorithms
    Sentences | Resources
    Public data: We retrieved data from the Sequence Read Archive (https://www.ncbi.nlm.nih.gov/sra) on June 10, 2020.
    Sequence Read Archive
    suggested: (DDBJ Sequence Read Archive, RRID:SCR_001370)
    suggested: (NCBI Sequence Read Archive (SRA), RRID:SCR_004891)
    Subsequent to downloading the selected sample set, we trimmed all read files using PRINSEQ ([58] version 0.20.4, parameters: -ns_max_n 4 -min_qual_mean 30 -trim_qual_left 30 -trim_qual_right 30 -trim_qual_window 10 -min_len <80% of average read length>) and mapped them to NC_045512.2 using bwa ([59] version 0.7.17-r1188, subcommand: mem).
    PRINSEQ
    suggested: (PRINSEQ, RRID:SCR_005454)
    Data processing: We used V-pipe ([44]; sars-cov2 branch of https://github.com/cbg-ethz/V-pipe) to call variants for each sample using ShoRAH [60] and default settings, including discarding deletions with a frequency below 0.5%.
    ShoRAH
    suggested: (ShoRAH, RRID:SCR_005211)
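The trimming, mapping, and deletion-filtering steps quoted above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline (which is V-pipe): the helper names, the file names, the `-fastq`/`-fastq2`/`-out_good` arguments, the truncation used for the 80% length cutoff, and the dictionary layout of a variant record are all assumptions made for the example. Only the quality parameters, the `bwa mem` subcommand, and the 0.5% deletion threshold come from the text. The commands are built as argument lists and are not executed here.

```python
from statistics import mean


def prinseq_min_len(read_lengths, fraction=0.8):
    """Length cutoff as a fraction of the average read length.

    Truncating to an integer is an assumption; the text only says
    "80% of average read length".
    """
    return int(fraction * mean(read_lengths))


def build_commands(fastq_1, fastq_2, reference, read_lengths, out_prefix="trimmed"):
    """Build (but do not run) trimming and mapping command lines."""
    min_len = prinseq_min_len(read_lengths)
    trim_cmd = [
        "prinseq-lite.pl",
        "-fastq", fastq_1, "-fastq2", fastq_2,  # paired-end input (assumed)
        "-ns_max_n", "4",
        "-min_qual_mean", "30",
        "-trim_qual_left", "30",
        "-trim_qual_right", "30",
        "-trim_qual_window", "10",
        "-min_len", str(min_len),
        "-out_good", out_prefix,  # output prefix (assumed)
    ]
    # Assumes the trimmed pairs are written as <prefix>_1.fastq and <prefix>_2.fastq.
    map_cmd = ["bwa", "mem", reference,
               f"{out_prefix}_1.fastq", f"{out_prefix}_2.fastq"]
    return trim_cmd, map_cmd


def drop_low_frequency_deletions(variants, min_freq=0.005):
    """Discard deletions below 0.5% frequency; keep every other call.

    Each variant is a hypothetical dict with "type" and "freq" keys.
    """
    return [v for v in variants
            if not (v["type"] == "deletion" and v["freq"] < min_freq)]


# Hypothetical inputs, for illustration only.
trim_cmd, map_cmd = build_commands(
    "sample_1.fastq", "sample_2.fastq", "NC_045512.2.fasta", [150, 150, 150, 150]
)
calls = [
    {"type": "deletion", "freq": 0.004},
    {"type": "deletion", "freq": 0.005},
    {"type": "snv", "freq": 0.001},
]
kept = drop_low_frequency_deletions(calls)
print(" ".join(trim_cmd))
print(" ".join(map_cmd))
print(len(kept))
```

Building the commands as lists keeps them ready to pass to `subprocess.run` without shell quoting issues, should one choose to execute them.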

    Results from OddPub: Thank you for sharing your code and data.


    Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
    With this caveat, the most diverse gene is the Matrix M gene, while highly diverse positions include a mix of low-frequency variants common to a quarter of the cohort or more, and rarer high-frequency subclonal mutations in around 5% of the cohort. The observation of common low-frequency and less common high-frequency genetic variants is in line with previous research on both intra- and inter-host genetic diversity of SARS-CoV-2 [21, 39]. The D614G variant, which appears to increase infectivity and is becoming more dominant over time [50], is the dominant variant in our public cohort. It also exhibits high intra-host diversity, with 29.7% of the cohort carrying subclonal mutations in which the different variants coexist. This diversity is mirrored in the data from Switzerland, where the D614G variant is actually encoded by the second most diverse genomic position.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • No conflict of interest statement was detected. If there are no conflicts, we encourage authors to explicitly state so.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.