Tiled-ClickSeq for targeted sequencing of complete coronavirus genomes with simultaneous capture of RNA recombination and minority variants

Curation statements for this article:
  • Curated by eLife

    Evaluation Summary:

    The ability to fully resolve the whole genome of viral pathogens is often hampered by a multitude of different obstacles, one of which is optimally amplifying different regions of the genome. Part of this challenge lies in designing the best primer pairs that can consistently amplify PCR products despite the presence of changes (mutations) in the genome. The work described by Jaworski and colleagues can potentially provide an alternative approach that does not depend on primer pairs to fully sequence one such viral pathogen, SARS-CoV-2, and can also be applied towards other viral families.

    (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #1 agreed to share their name with the authors.)

Abstract

High-throughput genomics of SARS-CoV-2 is essential to characterize virus evolution and to identify adaptations that affect pathogenicity or transmission. While single-nucleotide variations (SNVs) are commonly considered as driving virus adaptation, RNA recombination events that delete or insert nucleic acid sequences are also critical. Whole-genome targeted sequencing of SARS-CoV-2 is typically achieved using pairs of primers to generate cDNA amplicons suitable for next-generation sequencing (NGS). However, paired-primer approaches impose constraints on where primers can be designed and how many amplicons are synthesized, and they require multiple PCR reactions with non-overlapping primer pools. This imparts sensitivity to underlying SNVs and fails to resolve RNA recombination junctions that are not flanked by primer pairs. To address these limitations, we have designed an approach called ‘Tiled-ClickSeq’, which uses hundreds of tiled-primers spaced evenly along the virus genome in a single reverse-transcription reaction. The other end of the cDNA amplicon is generated by azido-nucleotides that stochastically terminate cDNA synthesis, removing the need for a paired-primer. A sequencing adaptor containing a Unique Molecular Identifier (UMI) is appended to the cDNA fragment using click-chemistry and a PCR reaction generates a final NGS library. Tiled-ClickSeq provides complete genome coverage, including the 5’UTR, at high depth and specificity to the virus on both Illumina and Nanopore NGS platforms. Here, we analyze multiple SARS-CoV-2 isolates and clinical samples to simultaneously characterize minority variants, sub-genomic mRNAs (sgmRNAs), structural variants (SVs) and D-RNAs. Tiled-ClickSeq therefore provides a convenient and robust platform for SARS-CoV-2 genomics that captures the full range of RNA species in a single, simple assay.
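
The role of the UMI mentioned in the abstract can be illustrated with a minimal sketch. It assumes that reads sharing the same mapped position, strand and UMI are PCR duplicates of one cDNA molecule. The record layout (`name`, `pos`, `strand`, `umi`) and the function name are hypothetical, and the exact-match grouping is a simplification: dedicated tools such as UMI-tools also account for sequencing errors within the UMI.

```python
def deduplicate_by_umi(reads):
    """Keep the first read seen for each (position, strand, UMI) group.

    `reads` is an iterable of dicts with the hypothetical keys
    'name', 'pos', 'strand' and 'umi'.
    """
    seen = set()
    kept = []
    for read in reads:
        key = (read["pos"], read["strand"], read["umi"])
        if key not in seen:
            seen.add(key)
            kept.append(read)
    return kept


# Toy input: r1 and r2 share position, strand and UMI, so r2 is a duplicate.
reads = [
    {"name": "r1", "pos": 1500, "strand": "+", "umi": "ACGTACGTACGT"},
    {"name": "r2", "pos": 1500, "strand": "+", "umi": "ACGTACGTACGT"},
    {"name": "r3", "pos": 1500, "strand": "+", "umi": "TTGACCATGGCA"},
    {"name": "r4", "pos": 1620, "strand": "+", "umi": "ACGTACGTACGT"},
]
unique_reads = deduplicate_by_umi(reads)
print([read["name"] for read in unique_reads])
```

Counting unique molecules rather than raw reads matters for minority variants, because a variant seen in many PCR copies of a single cDNA is weaker evidence than the same variant seen in several independent cDNAs.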

Article activity feed

  1. Author Response:

    Reviewer #1:

    Click-Seq represents a novel method of sequencing RNA viruses such as SARS-CoV-2, with evidence of successfully sequencing the SARS-CoV-2 genome and identification of recombinations and variants. This does appear to be a potential advantage that needs a direct comparison with existing methods to be fully convincing.

    Thank you for your time and comments on our manuscript and approach.

    Specific comments:

    1. The actual sensitivity in terms of number of copies would be useful to know and to compare with other methods. Here, cultures are used rather than clinical samples, which makes this even more important.

    We now present results from three independent batches of Tiled-ClickSeq libraries of 60 NP swabs obtained through routine diagnostics for COVID19. We compare genome coverage and genome completeness with CT values of these samples. This presents the utility and potential application of the method with different clinical specimens and illustrates that with only 18 cycles of PCR we can obtain high quality data with most samples at a CT < 25.

    2. Is the large difference in coverage across the genome shown in Fig 2B due to methodological issues or to random variation? How would this compare to coverage variation in the ARTIC protocol by different methods?

    If the reviewer is referring to the high-frequency and regular dips in coverage (which we refer to as ‘saw-teeth’) then this is an expected feature of the stochastic termination of the cDNA by the azido-nucleotides upstream of the tiled-primers. The sharp changes in coverage here are highly comparable to coverage in ARTIC protocols. We provide an equivalent read coverage map in the new SFig 2 when using the ARTIC approach of the same samples presented in Fig 2B.

    If the reviewer is referring to the difference in coverage from different tiled primers (e.g. at nt ~14000), then this is likely an issue with the specific primer used in the ‘v1’ set of primers initially used. The ‘v3’ primers presented in Fig4A illustrate that these drops in coverage are removed, which indeed is an advantage of our approach that allows for multiple closely spaced tiled-primers in the same RT-PCR reaction. To further illustrate sample-to-sample variability, we now present read coverage using Tiled-ClickSeq v3 primers for 60 clinical isolates at different CT values which gives an overview of the variation that can be expected across multiple samples with our method.
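
The ‘saw-teeth’ pattern described in this response can be reproduced with a toy model. The sketch below is illustrative only and is not the authors' analysis: it assumes that every cDNA starts at a primer, extends toward lower genome coordinates, and terminates at each base with a fixed probability. The genome length, primer spacing, termination probability and function name are all arbitrary assumptions, and the model gives expected cDNA coverage rather than sequenced-read coverage.

```python
def expected_cdna_coverage(genome_len, primer_positions, stop_prob,
                           cdnas_per_primer=1000.0):
    """Expected number of cDNAs covering each 0-based genome position."""
    coverage = [0.0] * genome_len
    for primer in primer_positions:
        surviving = cdnas_per_primer
        # Reverse transcription runs from the primer toward the 5' end of
        # the genomic RNA, i.e. toward lower coordinates.
        for pos in range(primer, -1, -1):
            coverage[pos] += surviving
            # A fixed fraction of cDNAs terminates at every step
            # (toy stand-in for azido-nucleotide incorporation).
            surviving *= 1.0 - stop_prob
    return coverage


GENOME_LEN = 3000                              # toy genome, not SARS-CoV-2
PRIMERS = list(range(299, GENOME_LEN, 300))    # one primer every 300 nt
coverage = expected_cdna_coverage(GENOME_LEN, PRIMERS, stop_prob=0.01)

# Coverage peaks at each primer, decays upstream, then jumps at the next one.
for pos in (299, 300, 450, 599, 600):
    print(pos, round(coverage[pos]))
```

In this model the depth just downstream of a primer (position 300) is far lower than at the primer itself (position 299), then climbs steadily toward the next primer at 599. Adding more closely spaced primers shortens each tooth, which is consistent with the improvement the authors attribute to the larger v3 primer set.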

    Reviewer #2:

    The authors present a novel method of sequencing SARS-CoV-2, arguing that it overcomes many limitations of other currently used methods, particularly the ARTIC protocol. Generally the method is interesting, and it is encouraging to see that these limitations can be overcome. Although the authors walk through evidence that their method can successfully sequence the SARS-CoV-2 genome and use the data to identify minor variants and recombination events, the manuscript doesn't contain any direct comparisons of their method with the ARTIC protocol. Consequently, the assertions made throughout the paper of reduced bias and increased sensitivity and utility are not supported empirically.

    Thank you for your time and comments on our manuscript. To address these concerns, we have provided substantial new data comparing to ARTIC protocols and applying our methods to study clinical samples, described further below in response to your specific comments.

    Specific comments:

    For instance, in figure 2, I think it is important to present an equivalent plot to Fig 2A for artic samples with equivalent read depths using both MiSeq and Nanopore. This sequence data could be obtained from the COG-UK data deposited on NCBI SRA, and sub-sampled to match sequence depth between methods.

    Thank you for your comments. We have provided this information in Supplementary Figure 2. Using the ARTIC approach, we sequenced the 12 WRCEVA isolates described in the manuscript and presented in Figure 3. As can be seen, peaks and troughs are observed in the ARTIC data, as is expected and previously reported.

    I specifically wonder if this approach only outperforms artic using Nanopore sequencing given the frequent drops in coverage observed in the MiSeq data.

    The frequent drops in coverage observed in the MiSeq data in figure 2 are a symptom of the first primer set we used (v1) that only contained 72 primers. Similar frequent drops in coverage are also observed in the ARTIC approach (e.g. as seen in SFig2). The v3 primer set that we subsequently developed is presented in Figure 4. As can be seen, the drops in coverage are largely removed. We further illustrate this in the new Supplementary Figure 4 where we provide coverage plots using the v3 primers for 60 clinical samples of SARS-CoV-2 at different CT values. As can be seen, the variability in coverage is greatly improved.

    An additional point about figure 2: I understand that this figure is based on the depth of a single run, I think readers that are interested in using this method would be interested to know about the run-to-run variability, so I think it would be a valuable addition to this manuscript to show the average read depth (relative to total nucleotides sequenced per sample) across multiple samples with confidence intervals or equivalent to visualize run-to-run variability.

    Thank you for this point. As mentioned above, we present a new Supplementary Figure 4 where we provide coverage plots using the v3 primers for 60 clinical samples of SARS-CoV-2 at different CT values. Run-to-run variability is additionally addressed in Figure 6A where we correlate genome completeness/coverage with CT values across three different NGS library preparations.
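
One way to put a number on "genome completeness" for such a comparison is the fraction of reference positions covered at or above a chosen depth. The sketch below is an assumption about how this could be computed, not the authors' script. The 10x threshold and the function names are hypothetical, and the input is assumed to be three-column tab-separated text (reference name, position, depth) of the kind written by `samtools depth`, with every position reported (for example via its `-a` option).

```python
def parse_depth_lines(lines):
    """Extract per-position depths from 'chrom<TAB>pos<TAB>depth' lines."""
    depths = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        _chrom, _pos, depth = line.split("\t")[:3]
        depths.append(int(depth))
    return depths


def genome_completeness(depths, min_depth=10):
    """Fraction of positions with depth >= min_depth (threshold is assumed)."""
    if not depths:
        raise ValueError("no positions supplied")
    covered = sum(1 for depth in depths if depth >= min_depth)
    return covered / len(depths)


# Toy input with four positions; a real file has one line per genome position.
example_lines = [
    "NC_045512.2\t1\t0",
    "NC_045512.2\t2\t12",
    "NC_045512.2\t3\t250",
    "NC_045512.2\t4\t9",
]
depths = parse_depth_lines(example_lines)
completeness = genome_completeness(depths)
print(f"{completeness:.0%} of positions at >=10x")
```

Computing this value per sample and plotting it against the CT value gives the kind of completeness-versus-CT view the response describes for Figure 6A.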

    Further, the authors describe previously detecting recombinant RNA molecules in SARS-CoV-2 in another manuscript, and highlight that the method presented in this manuscript can detect recombinant RNA molecules that could be missed using the artic protocol. Were any such RNA sequences observed in these samples, or was there perfect correspondence between the methods?

    As described above, in the revision, we describe the recombination analysis of multiple clinical samples of SARS-CoV-2. We provide an example of a large genome duplication (annotated as 29442^29323) found in multiple clinical samples, but not any cell-culture samples (providing support that these are not sequence artifacts). To our knowledge these have not been observed before. Our previous manuscript (Gribble et al, PLoS Path, 2021) used both random-primed RNAseq and direct RNA sequencing of poly(A)-enriched RNAs, rather than targeted approaches. Neither of these are currently feasible for clinical samples. Given the hundreds of different DVGs observed in our previous studies, it is not possible for there to be perfect correspondence. Nevertheless, the trends and distributions of RNA recombination events are very similar between our previous study and the ones presented here, as described in the manuscript.
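
The junction label quoted above can be interpreted programmatically. The sketch below assumes a `donor^acceptor` convention in which the first number is the last nucleotide before the junction and the second is the first nucleotide after it; that reading, the function name and the returned field names are assumptions for illustration. Under this convention an acceptor downstream of the donor skips sequence (a deletion), while an acceptor at or upstream of the donor repeats sequence (a duplication).

```python
def classify_junction(annotation):
    """Classify a 'donor^acceptor' junction label (convention is assumed)."""
    donor_text, acceptor_text = annotation.split("^")
    donor, acceptor = int(donor_text), int(acceptor_text)
    if acceptor == donor + 1:
        # Adjacent coordinates: no recombination event.
        return {"kind": "contiguous", "size": 0}
    if acceptor > donor + 1:
        # Nucleotides strictly between donor and acceptor are skipped.
        return {"kind": "deletion", "size": acceptor - donor - 1}
    # The acceptor lies at or upstream of the donor, so the span
    # acceptor..donor is read twice.
    return {"kind": "duplication", "size": donor - acceptor + 1}


event = classify_junction("29442^29323")
print(event)
```

With these assumptions, `29442^29323` is classified as a duplication of the 120 nucleotides from 29323 to 29442, which matches the authors' description of the event as a duplication.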

    As well, the authors state: "Phylogenetic tree reconstruction using NextStrain (45) placed 10 of the isolates in the A2a clade (Fig 3D). Three of these isolates (WRCEVA_00506, WRCEVA_00510, WRCEVA_00515) were most closely related to European ancestors. Two isolates (WRCEVA_00508, WRCEVA_00513) were Clade B/B1 most closely related to Asian ancestors. Together, these data thus supported a model for multiple independent introductions of SARS-CoV-2 into the USA and subsequently into Galveston, Texas." This analysis seems out of place in the manuscript and not robust enough to support the claims made. How did the authors come to the conclusion that different sequences are of "European" or "Asian" origin? Due to the limited amount of genetic variation present in circulating strains prior to March 2020 combined with the wide geographic range that many genotypes were circulating, it is not enough to conclude the geographic origin of a viral isolate from clade membership alone.

    Thank you for this comment. We agree that this statement was not properly supported and have simply removed it in the revised manuscript.

    Reviewer #3:

    Strengths. While current NGS method(s), namely the ARTIC protocol, has made phenomenal contributions to resolving the genome of SARS-CoV-2, there is room for improvement. Towards this end, Jaworski and company have devised an alternative approach that utilizes a one-step RT PCR that combines ClickSeq with tiled amplification of the viral genome. This negates the use of primer pairs, which may encounter problems with amplification of structural variants. The method appears to be straightforward and amenable for sequencing on Illumina and Oxford platforms. The results generated do support the claims of the authors and have the potential to contribute significantly to understanding the evolutionary dynamics of SARS-CoV-2.

    Weaknesses. The main shortcoming of the manuscript in its current form is that the samples used for sequencing as proof of concept were cell-grown viral isolates and not the clinical samples themselves. The method described has the potential for providing the field with an alternative to produce high quality sequence, but without performing the work directly on nasopharyngeal swab samples, it may have limited use for public health laboratories, resource-poor environments or laboratories with little expertise in viral isolation, etc. Validation of the method would benefit if the authors compared the quality of the sequence generated with that of the ARTIC protocol using primary samples rather than cell-grown viral isolates. It is difficult to assess whether this method will provide a viable alternative over current state-of-the-art protocols.

    Thank you for your comments and time reviewing our manuscript. To address these concerns, we have provided substantial new data where we apply the Tiled-ClickSeq approach to assay clinical specimens.

    Specific comments.

    The methods should include detailed steps in the construction of the NGS library, such as whether or not cDNA input has an impact in the quality of the data output, coverage etc.

    We have previously published detailed protocols describing how to make ClickSeq libraries emphasizing issues that affect success and quality of the output data. We have emphasized this point in the methods section. Assuming we continue to utilize and improve our design, we will release updates through online freely available resources such as protocols.io.

    To address these questions here: the input RNA (not cDNA) in the RT-PCR step is addressed in Figure 1. All the cDNA generated after RT is used as input in the subsequent steps and the click-reaction. We do believe that the quality of the input RNA in the clinical specimens is very important, however, beyond CT value, we have no viable way of measuring the quantity and quality of the tiny amounts of RNA that we extract from NP swabs.

    While the authors mentioned that equimolar amounts of primers were used, there should be data to demonstrate that this results in even coverage of the whole genome. Figure 2. There is a slight dip in the coverage at around 17000 to 18000 (Figure 2A) on both the Illumina and Oxford runs, do the authors know if it is due to the primer(s) covering that area and if so, have they tried to address this by improving the design.

    The dip in the coverage in Fig 2 is resolved by using the v3 primers presented in Figure 4. Additional coverage maps for clinical samples in SFig 3 also demonstrate this. Even coverage over the entire genome can be seen for the low CT value samples, which begins to wane in clinical samples with CT values greater than ~25, as described in the new main text and presented in the new Figure 6A.

    The different colors of the graph (Figure 2B) should be defined in the legend. Is the read depth a representation of both Illumina and Oxford runs - either way, this should be indicated.

    Fixed. Thank you.

  3. Reviewer #1 (Public Review):

    Click-Seq represents a novel method of sequencing RNA viruses such as SARS-CoV-2, with evidence of successfully sequencing the SARS-CoV-2 genome and identification of recombinations and variants. This does appear to be a potential advantage that needs a direct comparison with existing methods to be fully convincing.

    Specific comments:

    1. The actual sensitivity in terms of number of copies would be useful to know and to compare with other methods. Here, cultures are used rather than clinical samples, which makes this even more important.

    2. Is the large difference in coverage across the genome shown in Fig 2B due to methodological issues or to random variation? How would this compare to coverage variation in the ARTIC protocol by different methods?

  4. Reviewer #2 (Public Review):

    The authors present a novel method of sequencing SARS-CoV-2, arguing that it overcomes many limitations of other currently used methods, particularly the ARTIC protocol. Generally the method is interesting, and it is encouraging to see that these limitations can be overcome. Although the authors walk through evidence that their method can successfully sequence the SARS-CoV-2 genome and use the data to identify minor variants and recombination events, the manuscript doesn't contain any direct comparisons of their method with the ARTIC protocol. Consequently, the assertions made throughout the paper of reduced bias and increased sensitivity and utility are not supported empirically.

    Specific comments:

    For instance, in figure 2, I think it is important to present an equivalent plot to Fig 2A for artic samples with equivalent read depths using both MiSeq and Nanopore. This sequence data could be obtained from the COG-UK data deposited on NCBI SRA, and sub-sampled to match sequence depth between methods. I specifically wonder if this approach only outperforms artic using Nanopore sequencing given the frequent drops in coverage observed in the MiSeq data.

    An additional point about figure 2: I understand that this figure is based on the depth of a single run, I think readers that are interested in using this method would be interested to know about the run-to-run variability, so I think it would be a valuable addition to this manuscript to show the average read depth (relative to total nucleotides sequenced per sample) across multiple samples with confidence intervals or equivalent to visualize run-to-run variability.

    Further, the authors describe previously detecting recombinant RNA molecules in SARS-CoV-2 in another manuscript, and highlight that the method presented in this manuscript can detect recombinant RNA molecules that could be missed using the artic protocol. Were any such RNA sequences observed in these samples, or was there perfect correspondence between the methods?

    As well, the authors state: "Phylogenetic tree reconstruction using NextStrain (45) placed 10 of the isolates in the A2a clade (Fig 3D). Three of these isolates (WRCEVA_00506, WRCEVA_00510, WRCEVA_00515) were most closely related to European ancestors. Two isolates (WRCEVA_00508, WRCEVA_00513) were Clade B/B1 most closely related to Asian ancestors. Together, these data thus supported a model for multiple independent introductions of SARS-CoV-2 into the USA and subsequently into Galveston, Texas." This analysis seems out of place in the manuscript and not robust enough to support the claims made. How did the authors come to the conclusion that different sequences are of "European" or "Asian" origin? Due to the limited amount of genetic variation present in circulating strains prior to March 2020 combined with the wide geographic range that many genotypes were circulating, it is not enough to conclude the geographic origin of a viral isolate from clade membership alone.

  5. Reviewer #3 (Public Review):

    Strengths. While current NGS method(s), namely the ARTIC protocol, has made phenomenal contributions to resolving the genome of SARS-CoV-2, there is room for improvement. Towards this end, Jaworski and company have devised an alternative approach that utilizes a one-step RT PCR that combines ClickSeq with tiled amplification of the viral genome. This negates the use of primer pairs, which may encounter problems with amplification of structural variants. The method appears to be straightforward and amenable for sequencing on Illumina and Oxford platforms. The results generated do support the claims of the authors and have the potential to contribute significantly to understanding the evolutionary dynamics of SARS-CoV-2.

    Weaknesses. The main shortcoming of the manuscript in its current form is that the samples used for sequencing as proof of concept were cell-grown viral isolates and not the clinical samples themselves. The method described has the potential for providing the field with an alternative to produce high quality sequence, but without performing the work directly on nasopharyngeal swab samples, it may have limited use for public health laboratories, resource-poor environments or laboratories with little expertise in viral isolation, etc. Validation of the method would benefit if the authors compared the quality of the sequence generated with that of the ARTIC protocol using primary samples rather than cell-grown viral isolates. It is difficult to assess whether this method will provide a viable alternative over current state-of-the-art protocols.

    Specific comments. The methods should include detailed steps in the construction of the NGS library, such as whether or not cDNA input has an impact in the quality of the data output, coverage etc. While the authors mentioned that equimolar amounts of primers were used, there should be data to demonstrate that this results in even coverage of the whole genome.

    Figure 2. There is a slight dip in the coverage at around 17000 to 18000 (Figure 2A) on both the Illumina and Oxford runs, do the authors know if it is due to the primer(s) covering that area and if so, have they tried to address this by improving the design. The different colors of the graph (Figure 2B) should be defined in the legend. Is the read depth a representation of both Illumina and Oxford runs - either way, this should be indicated.

  6. SciScore for 10.1101/2021.03.10.434828:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    Institutional Review Board Statement: not detected.
    Randomization: not detected.
    Blinding: not detected.
    Power Analysis: not detected.
    Sex as a biological variable: not detected.
    Cell Line Authentication: not detected.

    Table 2: Resources

    Experimental Models: Cell Lines

    Sentence: Viruses and RNA extraction: For WRCEVA isolates, viral RNA was obtained from supernatant materials of viral isolates amplified on Vero cells originally obtained from nasopharyngeal swab samples that tested positive in clinical laboratory assays for SARS-CoV-2 RNA, as described previously (26).
    Resource: Vero (suggested: CLS Cat# 605372/p622_VERO, RRID:CVCL_0059)

    Sentence: Wild-type and mutant SARS-CoV-2 were titrated and propagated on Vero E6 cells.
    Resource: Vero E6 (suggested: RRID:CVCL_XD71)

    Software and Algorithms

    Sentence: Final NGS libraries containing fragment sizes ranging 300-700 nts were pooled and sequenced on Illumina MiSeq, MiniSeq or NextSeq platforms using paired-end sequencings.
    Resource: MiniSeq (suggested: None)

    Sentence: Single-plex or pooled cDNA libraries with ONT adaptors were loaded onto MIN-FLO109 flowcells on a MinION Mk1C and sequenced using the MinKNOW controller software for >24 hours.
    Resource: MinION (suggested: MinION, RRID:SCR_017985)

    Sentence: Bioinformatics: All batch scripts and custom python scripts used in this manuscript are available in Sdata 1.
    Resource: python (suggested: IPython, RRID:SCR_001658)

    Sentence: These reads were mapped to the WA-1 strain (NC_045512.2) of SARS-CoV-2 using bowtie2 (33) and a new reference consensus genome was rebuilt for each dataset using pilon (34).
    Resource: bowtie2 (suggested: Bowtie 2, RRID:SCR_016368)

    Sentence: SAM files were manipulated using samtools (36) and de-duplicated using umi-tools (37).
    Resource: samtools (suggested: SAMTOOLS, RRID:SCR_002105)
    Resource: umi-tools (suggested: UMI-tools, RRID:SCR_017048)

    Sentence: For Nanopore reads, porechop (https://github.com/rrwick/Porechop) was used to remove Illumina adaptor sequences and reads greater than 100nts in length were retained.
    Resource: https://github.com/rrwick/Porechop (suggested: Porechop, RRID:SCR_016967)

    Sentence: Output SAM files were processed using samtools (36) and bedtools (40) to generate coverage maps.
    Resource: bedtools (suggested: BEDTools, RRID:SCR_006646)

    Sentence: Data Availability Statement: All raw sequencing data (Illumina and Nanopore in FASTQ format) are available in the NCBI Small Read Archive with BioProject PRJNA707211.
    Resource: NCBI Small Read Archive (suggested: None)
    Resource: BioProject (suggested: NCBI BioProject, RRID:SCR_004801)

    Results from OddPub: Thank you for sharing your code and data.


    Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
    Together, using the Tiled ClickSeq approach, we have the opportunity to identify rare and unexpected recombination events and are not biased by the limitation of primer-pair approaches. Coupled with its cross-sequencing platform capabilities, the work highlights the utility of Tiled-ClickSeq for analysis of SARS-CoV-2.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding.