Multiple approaches for massively parallel sequencing of HCoV-19 (SARS-CoV-2) genomes directly from clinical samples
This article has been reviewed by the following groups
Listed in
- Evaluated articles (ScreenIT)
Abstract
COVID-19 has caused a major epidemic worldwide; however, much remains unknown about the epidemiology and evolution of the virus. One reason is that the challenges of sequencing HCoV-19 directly from clinical samples have not been fully addressed. Here we illustrate the application of amplicon and hybrid capture (capture)-based sequencing, as well as ultra-high-throughput metatranscriptomic (meta) sequencing, in retrieving complete genomes and inter-individual and intra-individual variations of HCoV-19 from clinical samples covering a range of sample types and viral loads. We also examine and compare the bias, sensitivity, accuracy, and other characteristics of these approaches in a comprehensive manner. This is, to date, the first work to systematically implement amplicon and capture approaches in sequencing HCoV-19, as well as the first comparative study across methods. Our work offers practical solutions for genome sequencing and analyses of HCoV-19 and other emerging viruses.
Article activity feed
SciScore for 10.1101/2020.03.16.993584:
Please note, not all rigor criteria are appropriate for all manuscripts.
Table 1: Rigor
Institutional Review Board Statement: IRB: Ethics statement: The Institutional Review Boards (IRB) of the First Affiliated Hospital of Guangzhou Medical University approved the clinical studies.
Randomization: not detected.
Blinding: not detected.
Power Analysis: not detected.
Sex as a biological variable: not detected.
Cell Line Authentication: not detected.
Table 2: Resources
Experimental Models: Cell Lines

Sentence: Sequencing was attempted on all samples regardless of Ct value including negative controls prepared from nuclease-free water and NA12878 human gDNA.
Resource: NA12878; suggested: (Coriell Cat# GM12878, RRID:CVCL_7526)

Software and Algorithms

Sentence: DNBs-based libraries were constructed and sequenced on the MGISEQ-2000 platform with paired-end 100 nt strategy using the same protocol described above, generating 37 Gb sequencing data for each sample on average.
Resource: MGISEQ-2000; suggested: (DNBSEQ-G400, RRID:SCR_017980)

Sentence: Identification of HCoV-19-like reads from Massively Parallel Sequencing data: For metatranscriptomic and hybrid capture sequencing data, total reads were first processed by Kraken v0.10.5 [26] (default parameters) with a self-build database of Coronaviridae genomes (including SARS, MERS and HCoV-19 genome sequences downloaded from GISAID, NCBI and CNGB) to efficiently identify candidate viral reads with a loose manner.
Resource: Kraken; suggested: (Kraken, RRID:SCR_005484)

Sentence: These candidate reads were further qualified with fastp v0.19.5 [27] (parameters: -q 20 -u 20 -n 1 -l 50) and SOAPnuke v1.5.6 [28] (parameters: -l 20 -q 0.2 -E 50 -n 0.02 -5 0 -Q 2 -G -d) to remove low-quality reads, duplications and adaptor contaminations.
Resource: SOAPnuke; suggested: (SOAPnuke, RRID:SCR_015025)

Sentence: Low-complexity reads were next filtered by PRINSEQ v0.20.4 [29] (parameters: -lc_method dust -lc_threshold 7).
Resource: PRINSEQ; suggested: (PRINSEQ, RRID:SCR_005454)

Sentence: Assembling viral genome: HCoV-19-like reads of metatranscriptomic and hybrid capture sequencing data were de novo assembled with SPAdes (v3.14.0) [31] using the default settings to obtain virus genome sequences.
Resource: SPAdes; suggested: (SPAdes, RRID:SCR_000131)

Sentence: Due to the uneven read coverage in amplicon sequencing of HCoV-19, virus consensus sequences of amplicon samples were generated by Pilon v1.23 [32] (parameters: --changes --vcf --changes --vcf --mindepth 1 --fix all,amb).
Resource: Pilon; suggested: (Pilon, RRID:SCR_014731)

Sentence: Assessment the coverage depth across the viral genome: HCoV-19-like reads of metatranscriptomic and hybrid capture sequencing data were aligned to the HCoV-19 reference genome (GISAID accession: EPI_ISL_402119) with BWA aln (v0.7.16) [33].
Resource: BWA; suggested: (BWA, RRID:SCR_010910)

Sentence: For each sample, we calculated the depth of coverage at each nucleotide position of the HCoV-19 reference genome with Samtools (v1.9) [34] and scaled the values to the mean depth.
Resource: Samtools; suggested: (SAMTOOLS, RRID:SCR_002105)

Sentence: Consistency in variants calling performance among methods: Except for amplicon sequencing samples, variants calling in metatranscriptomic and hybrid capture sequencing samples was performed in the previous BAM files of identified HCoV-19 reads after removing duplications from alignment output by Picard Markduplicates (http://broadinstitute.github.io/picard).
Resource: Picard; suggested: (Picard, RRID:SCR_006525)

Sentence: After QC, we mapped high-quality reads to hg19 and removed human ribosomal RNA (rRNA) reads by SOAP2 v2.21 [42] (parameters: -m 0 -x 1000 -s 28 -l 32 -v 5 -r 1), and the remaining RNA reads were then aligned to hg19 by HISAT2 [43] with default settings to identify non-rRNA human transcripts as previously described.
Resource: SOAP2; suggested: None

Sentence: Bracken [45] (Bayesian Reestimation of Abundance with Kraken) was further applied to estimate microbial relative abundances based on taxonomic ranks of reads assigned by Kraken2.
Resource: Kraken2; suggested: None

Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).
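The coverage step quoted in Table 2 (per-position depth from Samtools, scaled to the mean depth) can be illustrated with a minimal Python sketch. It is not the authors' code: it assumes the depths arrive as three-column, tab-separated lines (reference name, 1-based position, depth), which is the layout `samtools depth` prints, and the function names `parse_depth_lines` and `scale_to_mean_depth` are hypothetical.

```python
from statistics import mean


def parse_depth_lines(lines):
    """Read depths from tab-separated lines: reference, position, depth.

    Assumption: one line per reference position, as printed by `samtools depth`.
    """
    depths = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        _ref, _pos, depth = line.split("\t")
        depths.append(int(depth))
    return depths


def scale_to_mean_depth(depths):
    """Divide every per-position depth by the mean depth of the sample."""
    if not depths:
        return []
    avg = mean(depths)
    if avg == 0:
        # No reads anywhere: return zeros rather than dividing by zero.
        return [0.0 for _ in depths]
    return [d / avg for d in depths]


if __name__ == "__main__":
    # Hypothetical toy input; "ref" stands in for the reference sequence name.
    example = ["ref\t1\t10", "ref\t2\t20", "ref\t3\t30"]
    print(scale_to_mean_depth(parse_depth_lines(example)))
```

Scaled values near 1 mark positions covered at the sample's average depth, which makes coverage profiles comparable between samples sequenced to very different total depths. Positions with zero coverage only appear in the input if the depth tool is asked to report them, so the mean may otherwise be overestimated.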
Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.
Results from TrialIdentifier: No clinical trial numbers were referenced.
Results from Barzooka: We did not find any issues relating to the usage of bar graphs.
Results from JetFighter: We did not find any issues relating to colormaps.
Results from rtransparent:
- No conflict of interest statement was detected. If there are no conflicts, we encourage authors to explicitly state so.
- Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
- No protocol registration statement was detected.
