1. Author Response:

Reviewer #1:

By sequencing a large number of SARS-CoV-2 samples in duplicate and to high depth, the authors provide a detailed picture of the mutational processes that shape within-host diversity and go on to generate diversity at the global level.

1. Please add a description of the sequencing methods and how exactly the samples were replicated (two swabs? two RNA extractions? two RT-PCRs?). Have any limiting dilutions been done to quantify the relationship between RNA template input and Ct values? Also, the read mapping/assembly pipeline needs to be described.

Limiting dilutions were not performed; however, the association between Ct and discordance between replicates was explored. Samples with Ct >= 24 were found to have considerable discordance between replicates, likely resulting from a low number of input RNA molecules. This is described in the first section of the Results and illustrated in Figure 1 - figure supplement 3.

We have now added additional sections to the methods to better describe the sequencing and mapping pipelines.

Sequencing: A single swab was taken for each sample. Two libraries were then generated from two aliquots of each sample, with separate reverse transcription (RT), PCR amplification and library preparation steps, in order to evaluate the quality and reproducibility of within-host variant calls. The ARTIC protocol v3 was used for library preparation (a full description of the protocol is available at dx.doi.org/10.17504/protocols.io.be3wjgpe).

Alignment and variant calling: Alignment was performed using the ARTIC Illumina nextflow pipeline available from https://github.com/connor-lab/ncov2019-artic-nf...

2. I find the way variants are reported rather unintuitive. Within-host variation is best characterized as minor variants relative to consensus (or first sample consensus when there are multiple samples). Reporting "Major Variants" along with minor variants conflates mutations accumulated prior to infection with diversity that arose within the host. The relative contributions of these two categories to the graphs in Fig 1 would for example be very different if this study was repeated now. Furthermore, it is unclear whether variants at 90% are reversions at 10% or within-host mutations at 90%. I'd suggest calling variants relative to the sample or patient consensus rather than relative to the reference sequence (as is the norm in most within-host sequencing studies of RNA viruses).

We are grateful for this comment and have tried to improve and clarify the reporting of variants to align with previous literature.

Our original classification intended to separate non-reference sites into fixed changes (VAF > 95%) and within-host variants (which we called “minor variants”). While we chose 95% as the cutoff (which may have been confusing), the results are essentially unchanged with a 99% cutoff, as variants in this set have VAFs of ~100% and nearly all are expected to have occurred in a previous host. The previous classification therefore aimed to cleanly separate inter-host (fixed) mutations from within-host mutations, in order to compare their patterns of selection and their mutation spectra.

Following the reviewer’s request, we have modified this classification to better align with other studies of RNA viruses by defining the majority allele at a site as the “consensus”. We note that the results remain largely similar, since the vast majority of within-host variants identified had low VAFs (<<50%), with the majority/consensus allele most often corresponding to the reference (Wuhan) base.
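This classification can be sketched in a few lines. The function name and the read counts below are hypothetical, but the logic follows the scheme described above: the majority allele defines the consensus, and the allele frequency is taken over the combined reads of both replicates.

```python
def classify_call(alt_r1, total_r1, alt_r2, total_r2):
    """Classify a variant call and report its allele frequency.

    A call is labelled "consensus" when the variant allele is carried by
    the majority of reads aligned to the position, and "within-host"
    otherwise. The allele frequency (VAF) is computed over the combined
    reads of both replicates.
    """
    alt = alt_r1 + alt_r2
    total = total_r1 + total_r2
    vaf = alt / total
    label = "consensus" if vaf > 0.5 else "within-host"
    return label, vaf

# Hypothetical read counts for the two replicates of one sample.
print(classify_call(950, 1000, 940, 1000))  # high-VAF call
print(classify_call(30, 1000, 25, 1000))    # low-VAF call
```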

When considering recurrent mutations, we now discuss the number of times variants are observed at each location within a sample. This avoids the issue of how variants are polarised.

3. It is often unclear how numbers reported in the manuscript depend on various thresholds and parameters of the analysis pipeline. On page 2, for example, the median allele frequency will depend critically on the threshold used to call a variant, while the mean will depend on how variation is polarized. Why not report the mean of p(1-p) and show a cumulative histogram of iSNV frequencies on a log-log scale? I think most of these analyses should be done without strict lower cut-offs or at least be done as a function of a cut-off. In contrast to analyses of cancer and bacteria, the mutation rates of the virus are on the same order of magnitude as errors introduced by RT-PCR and sequencing. Whether biological or technical variation dominates can be assessed straightforwardly, for example by plotting diversity at 1st, 2nd, and 3rd codon position as a function of the frequency threshold. See for example here:

There are more sophisticated ways of doing this, but simpler is better in my mind.

It would be good to explore how estimates of the mean number of mutations per genome (0.72) depend on the cut-offs used. A more robust estimate might be 2\sum_i p_i(1-p_i) (where p_i is the iSNV frequency at site i) as a measure of the expected number of differences between two randomly chosen genomes. Ideally, the results of viral RNA produced from a plasmid would be subtracted from this.
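For concreteness, the statistic the reviewer proposes can be computed as below; the frequencies used are hypothetical and serve only to illustrate the calculation.

```python
def expected_pairwise_differences(freqs):
    """Expected number of nucleotide differences between two randomly
    chosen genomes: the sum over iSNV sites of 2 * p * (1 - p), the
    probability that two random genomes differ at a biallelic site."""
    return sum(2.0 * p * (1.0 - p) for p in freqs)

# Hypothetical per-site iSNV frequencies for illustration.
isnv_freqs = [0.02, 0.05, 0.10, 0.50]
print(expected_pairwise_differences(isnv_freqs))
```

Note that sites fixed at 0% or 100% contribute nothing, so the statistic is insensitive to how variants are polarised, which is part of its appeal here.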

The reviewer raises a number of important points that we have tried to address and clarify.

We think that the quality of our variant calls is supported by several lines of evidence: (1) the ShearwaterML calling algorithm uses a base-specific overdispersed error model and calls mutations only when read support is statistically above the background noise in other genomes; (2) we use two independent replicates from the RT step onwards; (3) we observe several biological signals that cannot be expected to arise from errors, including the fact that the mutation spectrum of low-VAF iSNVs called in our study recapitulates that of consensus mutations, and a clear signal of negative selection acting on iSNVs. We note that this dN/dS analysis is closely related to the reviewer's suggestion of comparing the frequency of mutations at positions 1/2/3 of a codon.

To address this comment in the manuscript, we have amended the text to include these arguments and we provide two new supplementary figures: (1) a figure of the frequency of mutations at the three codon positions, as requested by the reviewer, and (2) the mutation spectra of low-VAF iSNVs, demonstrating the quality of the mutation calls. Similar to the finding in Dyrdak et al. (2019), and as expected from the dN/dS ratios, the distribution of variant sites is dominated by variants at the third codon position rather than being equally distributed, as one might expect if errors dominated the signal.
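As an illustration of the codon-position tally underlying this figure, one can count variant sites by their position within a codon. The function names below are ours; coordinates are 1-based, and the example ORF start of 266 corresponds to ORF1ab in the Wuhan reference, used here only for illustration.

```python
def codon_position(site, orf_start):
    """Return the codon position (1, 2 or 3) of a genomic site within an
    ORF, given 1-based coordinates for both the site and the ORF start."""
    return (site - orf_start) % 3 + 1

def count_by_codon_position(sites, orf_start):
    """Tally variant sites by codon position. Under negative selection we
    expect an excess at position 3, where most changes are synonymous."""
    counts = {1: 0, 2: 0, 3: 0}
    for site in sites:
        counts[codon_position(site, orf_start)] += 1
    return counts

# Hypothetical variant sites within an ORF starting at position 266.
sites = [268, 271, 274, 269, 277]
print(count_by_codon_position(sites, orf_start=266))
```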

We have amended the relevant section of the text to read:

“To reliably detect within-host variants with the ARTIC protocol, we used ShearwaterML, an algorithm designed to detect variants at low allele frequencies. ShearwaterML uses a base-specific overdispersed error model and calls mutations only when read support is statistically above background noise in other genomes \cite{Gerstung2014-av,Martincorena2015-ef} (Methods). Two samples were excluded, as they had an unusually high number of low frequency variants unlikely to be of biological origin, leaving 1,179 samples for analysis, comprising 1,121 infected individuals of whom 49 had multiple samples. For all analyses we used only within-host variants that were statistically supported by both replicates (q-value<0.05 in at least one replicate and p-value<0.01 in the other, Methods). Within each sample, we classified variant calls as “consensus” if they were present in the majority of reads aligned to a position in the reference, or as within-host variants otherwise. The allele frequency for each variant was taken as the frequency of the variant in the combined set of reads for both replicates.”

...

“The use of replicates and a base-specific statistical error model for calling within-host diversity reduces the risk of erroneous calls at low allele frequencies. We noticed a slight increase in the number of within-host diversity calls for samples with high Ct values, which may be caused by a small number of errors or by the amplification of rare alleles, and which could inflate within-host diversity estimates (Figure 1 - figure supplement 3) \cite{McCrone2016-se}. However, the overall quality of the within-host mutation calls is supported by a number of biological signals. As described in the following sections, these include the fact that the mutational spectrum of within-host mutations closely resembles that of consensus mutations and inter-host differences, and the observation of a clear signal of negative selection from within-host mutations, as demonstrated by dN/dS and by an enrichment of within-host mutations at third codon positions \cite{Dyrdak2019-xk} (Figure 1 - figure supplement 4).”

Whilst we believe the remaining variant calls are reliable, we acknowledge that the way variants are polarised could affect some of the summary statistics reported. To address this, we have amended Figure 1 to include a cumulative histogram of within-host variant frequencies on a log-log scale, as suggested by the reviewer. We have also included estimates of the mean value of sqrt(p(1-p)), an estimate of the standard deviation of within-host variant frequencies under a Bernoulli model, and have replaced the estimates of the mean number of mutations per genome with the expected number of differences between two randomly chosen genomes. The amended Figure 1C now displays a histogram of the expected number of differences between two genomes for each sample rather than the mean number of mutations.
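A minimal sketch of these two summary statistics follows; the function names are ours and the frequencies hypothetical, but the calculations match the descriptions above (the manuscript figures themselves were produced by our analysis pipeline).

```python
import math

def mean_sqrt_p_one_minus_p(freqs):
    """Mean of sqrt(p * (1 - p)) across variant sites: the average
    per-site standard deviation under a Bernoulli model."""
    return sum(math.sqrt(p * (1.0 - p)) for p in freqs) / len(freqs)

def survival_counts(freqs, thresholds):
    """Number of iSNVs with frequency >= each threshold. Plotting these
    counts against the thresholds on log-log axes gives the cumulative
    histogram described above."""
    return [sum(1 for p in freqs if p >= t) for t in thresholds]

freqs = [0.01, 0.02, 0.05, 0.10, 0.50]  # hypothetical iSNV frequencies
print(round(mean_sqrt_p_one_minus_p(freqs), 4))
print(survival_counts(freqs, [0.01, 0.05, 0.25]))
```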

4. This paper provides an important baseline characterization of within-host diversity, while the patterns themselves are not extremely surprising. It is thus important that the data are provided in a form that facilitates reuse. It would be helpful to provide intermediate analysis results in addition to the raw reads in the SRA and the shearwater calls. I would like to see simple csv tables with the number of times A,C,G,U,- was observed at every position in the genomes for every sample. This would greatly facilitate the reuse of the data.

We have now added raw count tables for each sample and each replicate to the GitHub repository. We have also archived this data using Zenodo to ensure it remains easily accessible.

Reviewer #2:

The paper by Tonkin-Hill and colleagues describes the analysis of intra-host variation across a large number of SARS-CoV-2 samples. The authors invested a lot of effort in replicate sequencing, allowing them to focus on more reliable data. They obtained several important insights regarding patterns of mutation and selection in this virus. Overall, this is an excellent paper that adds much novelty to our understanding of intra-host variation that develops during the time course of infection, its impact on transmission, and what we can or cannot learn on relationships between samples.

We are grateful to the reviewer for their positive comments.

2. Evaluation Summary:

Tonkin-Hill and colleagues present a large set of deep sequencing data from acute SARS-CoV-2 infections with each sample sequenced in duplicate. They use these data to characterize within-host mutational patterns and diversity and relate them to SARS-CoV-2 diversity in consensus sequences sampled around the globe. The data further allow an assessment of how this variation can or cannot be used to understand transmission dynamics and other applications in genomic epidemiology.

(This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #1 and Reviewer #3 agreed to share their names with the authors.)

3. Reviewer #1 (Public Review):

By sequencing a large number of SARS-CoV-2 samples in duplicate and to high depth, the authors provide a detailed picture of the mutational processes that shape within-host diversity and go on to generate diversity at the global level.

1. Please add a description of the sequencing methods and how exactly the samples were replicated (two swabs? two RNA extractions? two RT-PCRs?). Have any limiting dilutions been done to quantify the relationship between RNA template input and Ct values? Also, the read mapping/assembly pipeline needs to be described.

2. I find the way variants are reported rather unintuitive. Within-host variation is best characterized as minor variants relative to consensus (or first sample consensus when there are multiple samples). Reporting "Major Variants" along with minor variants conflates mutations accumulated prior to infection with diversity that arose within the host. The relative contributions of these two categories to the graphs in Fig 1 would for example be very different if this study was repeated now. Furthermore, it is unclear whether variants at 90% are reversions at 10% or within-host mutations at 90%. I'd suggest calling variants relative to the sample or patient consensus rather than relative to the reference sequence (as is the norm in most within-host sequencing studies of RNA viruses).

3. It is often unclear how numbers reported in the manuscript depend on various thresholds and parameters of the analysis pipeline. On page 2, for example, the median allele frequency will depend critically on the threshold used to call a variant, while the mean will depend on how variation is polarized. Why not report the mean of p(1-p) and show a cumulative histogram of iSNV frequencies on a log-log scale? I think most of these analyses should be done without strict lower cut-offs or at least be done as a function of a cut-off. In contrast to analyses of cancer and bacteria, the mutation rates of the virus are on the same order of magnitude as errors introduced by RT-PCR and sequencing. Whether biological or technical variation dominates can be assessed straightforwardly, for example by plotting diversity at 1st, 2nd, and 3rd codon position as a function of the frequency threshold. See for example here:

There are more sophisticated ways of doing this, but simpler is better in my mind.

It would be good to explore how estimates of the mean number of mutations per genome (0.72) depend on the cut-offs used. A more robust estimate might be 2\sum_i p_i(1-p_i) (where p_i is the iSNV frequency at site i) as a measure of the expected number of differences between two randomly chosen genomes. Ideally, the results of viral RNA produced from a plasmid would be subtracted from this.

4. This paper provides an important baseline characterization of within-host diversity, while the patterns themselves are not extremely surprising. It is thus important that the data are provided in a form that facilitates reuse. It would be helpful to provide intermediate analysis results in addition to the raw reads in the SRA and the shearwater calls. I would like to see simple csv tables with the number of times A,C,G,U,- was observed at every position in the genomes for every sample. This would greatly facilitate the reuse of the data.
4. Reviewer #2 (Public Review):

The paper by Tonkin-Hill and colleagues describes the analysis of intra-host variation across a large number of SARS-CoV-2 samples. The authors invested a lot of effort in replicate sequencing, allowing them to focus on more reliable data. They obtained several important insights regarding patterns of mutation and selection in this virus. Overall, this is an excellent paper that adds much novelty to our understanding of intra-host variation that develops during the time course of infection, its impact on transmission, and what we can or cannot learn on relationships between samples.

5. Reviewer #3 (Public Review):

This study by Tonkin-Hill et al. analyzes the intrahost diversity of SARS-CoV-2 in patient samples collected in early 2020. The authors sequenced >1000 samples in duplicate to decrease errors in variant calling. They show that sequencing replicates have good concordance at higher viral loads. They investigate the abundance of within-host variants per specimen, strand biases in within-host variants, and assess within-host purifying selection by dN/dS analysis. They show that within-host variants arise recurrently across disparate genetic backgrounds, which is consistent with either mutation hotspots or positive selection. They also find evidence for a relatively small number of mixed infections.

Within-host diversity of SARS-CoV-2 is a topic of high interest in the fields of viral evolution and genomic epidemiology. This is a strong and timely analysis with several unique features that set it apart from previous studies. To my knowledge, this is the largest dataset for which there are sequencing duplicates, for which the authors should be commended. The finding of recurrent mutations across genetic backgrounds is highly important for genomic epidemiology. The authors rightly interpret these data as grounds for caution when using intrahost variants for transmission inference. My comments on this paper are largely related to data presentation and organization of the manuscript. There are also a few points where the authors could be clearer about their analytic choices and perhaps consider some caveats to their conclusions.

6. SciScore for 10.1101/2020.12.23.424229:

Please note, not all rigor criteria are appropriate for all manuscripts.

Table 1: Rigor

NIH rigor criteria are not applicable to paper type.

Table 2: Resources

Software and Algorithms

• dNdScv (suggested RRID: SCR_017093): "Selection analysis: Analyses of selection were carried out using the dNdScv package (28)."
• bedtools (suggested RRID: SCR_006646): "Sites that have previously been identified to create difficulties in generating phylogenies were masked using bedtools v2.29.2 using the VCF described in De Maio et al., 2020 (30, 53)."
• FastTree (suggested RRID: SCR_015501): "Fasttree v2.1.11 was used to generate a maximum likelihood phylogeny (54)."

Results from OddPub: Thank you for sharing your code and data.

Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

Results from TrialIdentifier: No clinical trial numbers were referenced.

Results from Barzooka: We did not find any issues relating to the usage of bar graphs.

Results from JetFighter: We did not find any issues relating to colormaps.

Results from rtransparent:
• Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
• Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
• No protocol registration statement was detected.