Multiple approaches for massively parallel sequencing of SARS-CoV-2 genomes directly from clinical samples

Abstract

Background

COVID-19 (coronavirus disease 2019) has caused a major epidemic worldwide; however, much remains unknown about the epidemiology and evolution of the virus, partly because few full-length SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2) genomes have been reported. One reason is that the challenges of sequencing SARS-CoV-2 directly from clinical samples have not been fully resolved; in particular, sequencing samples with low viral load often yields too few viral reads for analysis.

Methods

We applied novel multiplex PCR amplicon (amplicon)-based and hybrid capture (capture)-based sequencing, as well as ultra-high-throughput metatranscriptomic (meta) sequencing, to retrieve complete genomes and inter-individual and intra-individual variations of SARS-CoV-2 from serial dilutions of a cultured isolate and from eight clinical samples covering a range of sample types and viral loads. We also comprehensively examined and compared the sensitivity, accuracy, and other characteristics of these approaches.

Results

We demonstrated that both the amplicon and capture methods efficiently enriched SARS-CoV-2 content from clinical samples, while the enrichment efficiency of amplicon exceeded that of capture in more challenging samples. We found that capture was not as accurate as meta and amplicon in identifying between-sample variations, whereas amplicon was not as accurate as the other two in investigating within-sample variations, suggesting that amplicon sequencing is not suitable for studying virus-host interactions and viral transmission, which rely heavily on intra-host dynamics. We illustrated that meta uncovered rich genetic information in the clinical samples besides SARS-CoV-2, providing references for clinical diagnostics and therapeutics. Taking all the above factors and cost-effectiveness into consideration, we proposed guidance on how to choose a sequencing strategy for SARS-CoV-2 under different situations.

Conclusions

This is, to the best of our knowledge, the first work systematically investigating inter- and intra-individual variations of SARS-CoV-2 using amplicon- and capture-based whole-genome sequencing, as well as the first comparative study among multiple approaches. Our work offers practical solutions for genome sequencing and analyses of SARS-CoV-2 and other emerging viruses.

SciScore for 10.1101/2020.03.16.993584:

Please note, not all rigor criteria are appropriate for all manuscripts.

Table 1: Rigor

Institutional Review Board Statement	IRB: Ethics statement: The Institutional Review Boards (IRB) of the First Affiliated Hospital of Guangzhou Medical University approved the clinical studies.
Randomization	not detected.
Blinding	not detected.
Power Analysis	not detected.
Sex as a biological variable	not detected.
Cell Line Authentication	not detected.

Table 2: Resources

Experimental Models: Cell Lines
Sentences	Resources
Sequencing was attempted on all samples regardless of Ct value, including negative controls prepared from nuclease-free water and NA12878 human gDNA.	NA12878 suggested: (Coriell Cat# GM12878, RRID:CVCL_7526)
Software and Algorithms
Sentences	Resources
DNBs-based libraries were constructed and sequenced on the MGISEQ-2000 platform with paired-end 100 nt strategy using the same protocol described above, generating 37 Gb sequencing data for each sample on average.	MGISEQ-2000 suggested: (DNBSEQ-G400, RRID:SCR_017980)
Identification of HCoV-19-like reads from Massively Parallel Sequencing data: For metatranscriptomic and hybrid capture sequencing data, total reads were first processed by Kraken v0.10.5 [26] (default parameters) with a self-built database of Coronaviridae genomes (including SARS, MERS and HCoV-19 genome sequences downloaded from GISAID, NCBI and CNGB) to efficiently identify candidate viral reads in a permissive manner.	Kraken suggested: (Kraken, RRID:SCR_005484)
These candidate reads were further filtered with fastp v0.19.5 [27] (parameters: -q 20 -u 20 -n 1 -l 50) and SOAPnuke v1.5.6 [28] (parameters: -l 20 -q 0.2 -E 50 -n 0.02 -5 0 -Q 2 -G -d) to remove low-quality reads, duplications and adaptor contaminations.	SOAPnuke suggested: (SOAPnuke, RRID:SCR_015025)
Low-complexity reads were next filtered by PRINSEQ v0.20.4 [29] (parameters: -lc_method dust -lc_threshold 7).	PRINSEQ suggested: (PRINSEQ, RRID:SCR_005454)
Assembling viral genome: HCoV-19-like reads of metatranscriptomic and hybrid capture sequencing data were de novo assembled with SPAdes (v3.14.0) [31] using the default settings to obtain virus genome sequences.	SPAdes suggested: (SPAdes, RRID:SCR_000131)
Due to the uneven read coverage in amplicon sequencing of HCoV-19, virus consensus sequences of amplicon samples were generated by Pilon v1.23 [32] (parameters: --changes --vcf --mindepth 1 --fix all,amb).	Pilon suggested: (Pilon, RRID:SCR_014731)
Assessment of the coverage depth across the viral genome: HCoV-19-like reads of metatranscriptomic and hybrid capture sequencing data were aligned to the HCoV-19 reference genome (GISAID accession: EPI_ISL_402119) with BWA aln (v0.7.16) [33].	BWA suggested: (BWA, RRID:SCR_010910)
For each sample, we calculated the depth of coverage at each nucleotide position of the HCoV-19 reference genome with Samtools (v1.9) [34] and scaled the values to the mean depth.	Samtools suggested: (SAMTOOLS, RRID:SCR_002105)
Consistency in variant calling performance among methods: Except for amplicon sequencing samples, variant calling in metatranscriptomic and hybrid capture sequencing samples was performed on the previous BAM files of identified HCoV-19 reads after removing duplications from the alignment output with Picard MarkDuplicates (http://broadinstitute.github.io/picard).	Picard suggested: (Picard, RRID:SCR_006525)
After QC, we mapped high-quality reads to hg19 and removed human ribosomal RNA (rRNA) reads by SOAP2 v2.21 [42] (parameters: -m 0 -x 1000 -s 28 -l 32 -v 5 -r 1), and the remaining RNA reads were then aligned to hg19 by HISAT2 [43] with default settings to identify non-rRNA human transcripts as previously described.	SOAP2 suggested: None
Bracken [45] (Bayesian Reestimation of Abundance with Kraken) was further applied to estimate microbial relative abundances based on taxonomic ranks of reads assigned by Kraken2.	Kraken2 suggested: None
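
The depth-normalization step quoted in the Samtools row above (per-position depth scaled to the mean depth) can be illustrated with a minimal Python sketch. It is not the authors' code: it assumes the input is text in the three-column, tab-separated layout written by `samtools depth` (reference name, 1-based position, depth), and the function name `scaled_depth` and the reference name `ref` are hypothetical.

```python
from statistics import mean


def scaled_depth(depth_lines):
    """Scale per-position depth to the mean depth.

    Assumes each line has three tab-separated fields in the layout written
    by `samtools depth`: reference name, 1-based position, depth.
    Returns a list of (position, depth / mean_depth) tuples.
    """
    positions, depths = [], []
    for line in depth_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        _ref, pos, depth = line.split("\t")[:3]
        positions.append(int(pos))
        depths.append(int(depth))

    if not depths:
        return []

    avg = mean(depths)
    if avg == 0:
        # No coverage anywhere: avoid dividing by zero.
        return [(p, 0.0) for p in positions]
    return [(p, d / avg) for p, d in zip(positions, depths)]


# Hypothetical three-position example; "ref" is a placeholder reference name.
example = ["ref\t1\t10", "ref\t2\t20", "ref\t3\t30"]
for pos, value in scaled_depth(example):
    print(pos, round(value, 2))
```

Note that the mean here is taken over the positions present in the input, so uncovered positions only contribute if the depth file lists them with a depth of zero (for example, when it is produced with the `-a` option of `samtools depth`).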

Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).

Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

Results from TrialIdentifier: No clinical trial numbers were referenced.

Results from Barzooka: We did not find any issues relating to the usage of bar graphs.

Results from JetFighter: We did not find any issues relating to colormaps.

Results from rtransparent:

No conflict of interest statement was detected. If there are no conflicts, we encourage authors to explicitly state so.
Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
No protocol registration statement was detected.
