Within-host genomics of SARS-CoV-2

Abstract

Extensive global sampling and whole genome sequencing of the pandemic virus SARS-CoV-2 have enabled researchers to characterise its spread, and to identify mutations that may increase transmission or enable the virus to escape therapies or vaccines. Two important components of viral spread are how frequently variants arise within individuals, and how likely they are to be transmitted. Here, we characterise the within-host diversity of SARS-CoV-2, and the extent to which genetic diversity is transmitted, by quantifying variant frequencies in 1390 clinical samples from the UK, many from individuals in known epidemiological clusters. We show that SARS-CoV-2 infections are characterised by low levels of within-host diversity across the entire viral genome, with evidence of strong evolutionary constraint in Spike, a key target of vaccines and antibody-based therapies. Although within-host variants can be observed in multiple individuals in the same phylogenetic or epidemiological cluster, highly infectious individuals with high viral load carry only a limited repertoire of viral diversity. Most viral variants are either lost, or occasionally fixed, at the point of transmission, consistent with a narrow transmission bottleneck. These results suggest potential vaccine-escape mutations are likely to be rare in infectious individuals. Nonetheless, we identified Spike variants present in multiple individuals that may affect receptor binding or neutralisation by antibodies. Since the fitness advantage of escape mutations in highly-vaccinated populations is likely to be substantial, resulting in rapid spread if and when they do emerge, these findings underline the need for continued vigilance and monitoring.

Article activity feed

  1. SciScore for 10.1101/2020.05.28.118992:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Software and Algorithms
    Sentences | Resources
    Bioinformatics processing: De-multiplexed sequence read pairs were classified by Kraken v2 (39) using a custom database containing the human genome (GRCh38 build) and the full RefSeq set of bacterial and viral genomes (pulled May 2020)
    Kraken
    suggested: (Kraken, RRID:SCR_005484)
    Remaining reads, comprising viral and unclassified reads, were trimmed in two stages: first to remove the random hexamer primers from the forward read and SMARTer TSO from the reverse read, and then to remove Illumina adapter sequences using Trimmomatic v0.36 (40), with the ILLUMINACLIP options set to “2:10:7:1:true MINLEN:80”.
    Trimmomatic
    suggested: (Trimmomatic, RRID:SCR_011848)
    Trimmed reads were mapped to the SARS-CoV-2 RefSeq genome of isolate Wuhan-Hu-1 (NC_045512.2), using shiver (41) v1.5.7, with either smalt (42) or bowtie2 (43) as the mapper.
    bowtie2
    suggested: (Bowtie 2, RRID:SCR_016368)
    The resulting set, along with the reference genome Wuhan-Hu-1 (RefSeq ID NC_045512), were aligned using MAFFT (46), with some manual improvement of the algorithmic alignment and removal of problematic sequences performed as a post-processing step.
    RefSeq
    suggested: (RefSeq, RRID:SCR_003496)
    MAFFT
    suggested: (MAFFT, RRID:SCR_011811)
    Phylogenetic association of iSNVs and SNPs: Where an iSNV corresponded to a consensus SNP (by the base pair involved, not simply the site), we performed ancestral state reconstruction on the consensus trees using ClonalFrameML (48) to identify all branches upon which that substitution was involved.
    ClonalFrameML
    suggested: (Clonalframe, RRID:SCR_016060)
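
The trimming settings quoted in the table above ("2:10:7:1:true MINLEN:80") can be easier to read with their fields named. The following is a minimal sketch, not the authors' pipeline: it assumes the five colon-separated values follow the field order given in the Trimmomatic manual for ILLUMINACLIP (after the adapter FASTA path, which the quoted string omits), and the names `IlluminaClip`, `parse_trim_settings`, and `passes_minlen` are hypothetical helpers introduced only for illustration.

```python
from typing import NamedTuple, Tuple


class IlluminaClip(NamedTuple):
    """ILLUMINACLIP fields, in the order assumed from the Trimmomatic manual."""
    seed_mismatches: int
    palindrome_clip_threshold: int
    simple_clip_threshold: int
    min_adapter_length: int
    keep_both_reads: bool


def parse_trim_settings(settings: str) -> Tuple[IlluminaClip, int]:
    """Split a string such as '2:10:7:1:true MINLEN:80' into named fields.

    Hypothetical helper: the adapter FASTA path is assumed to be supplied
    separately, as it is not part of the quoted settings.
    """
    clip_part, minlen_part = settings.split()
    fields = clip_part.split(":")
    if len(fields) != 5:
        raise ValueError(f"expected 5 ILLUMINACLIP values, got {len(fields)}")
    clip = IlluminaClip(
        seed_mismatches=int(fields[0]),
        palindrome_clip_threshold=int(fields[1]),
        simple_clip_threshold=int(fields[2]),
        min_adapter_length=int(fields[3]),
        keep_both_reads=fields[4].lower() == "true",
    )
    name, value = minlen_part.split(":")
    if name != "MINLEN":
        raise ValueError(f"expected a MINLEN step, got {name!r}")
    return clip, int(value)


def passes_minlen(read: str, min_len: int) -> bool:
    """Mimic the MINLEN step: keep a trimmed read only if it is long enough."""
    return len(read) >= min_len


if __name__ == "__main__":
    clip, min_len = parse_trim_settings("2:10:7:1:true MINLEN:80")
    print(clip)
    print("minimum read length:", min_len)
```

Read this way, the quoted settings would allow 2 seed mismatches, use clip thresholds of 10 (palindrome) and 7 (simple), accept adapters as short as 1 base in palindrome mode, keep both reads of a pair, and discard any read shorter than 80 bases after trimming.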

    Results from OddPub: Thank you for sharing your data.


    Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy-to-digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding.