Oligonucleotide capture sequencing of the SARS-CoV-2 genome and subgenomic fragments from COVID-19 individuals

Abstract

The newly emerged and rapidly spreading SARS-CoV-2 causes coronavirus disease 2019 (COVID-19). To facilitate a deeper understanding of the viral biology, we developed a capture sequencing methodology to generate SARS-CoV-2 genomic and transcriptome sequences from infected patients. We utilized an oligonucleotide probe-set representing the full-length genome to obtain both genomic and transcriptome (subgenomic open reading frames [ORFs]) sequences from 45 SARS-CoV-2 clinical samples with varying viral titers. For samples with higher viral loads (cycle threshold value under 33, based on the CDC qPCR assay), complete genomes were generated. Analysis of junction reads revealed regions of differential transcriptional activity and provided evidence of expression of ORF10. Heterogeneous allelic frequencies along the 20 kb ORF1ab gene suggested the presence of a defective interfering viral RNA species subpopulation in one sample. The associated workflow is straightforward, and hybridization-based capture offers an effective and scalable approach for sequencing SARS-CoV-2 from patient samples.

Article activity feed

  1. SciScore for 10.1101/2020.07.27.223495:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    Institutional Review Board Statement: not detected.
    Randomization: not detected.
    Blinding: not detected.
    Power Analysis: not detected.
    Sex as a biological variable: not detected.

    Table 2: Resources

    No key resources detected.


    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).


    Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We found bar graphs of continuous data. We recommend replacing bar graphs with more informative graphics, as many different datasets can lead to the same bar graph. The actual data may suggest different conclusions from the summary statistics. For more information, please see Weissgerber et al. (2015).


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.

  2. SciScore for 10.1101/2020.07.27.223495:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    Institutional Review Board Statement: not detected.
    Randomization: not detected.
    Blinding: not detected.
    Power Analysis: not detected.
    Sex as a biological variable: not detected.

    Table 2: Resources

    Software and Algorithms
    Sentences | Resources
    Trimmed non-human sequence reads were analyzed using the VirMAP19 pipeline, where between 7% and 86.4% of total reads from post-capture libraries mapped to the SARS-CoV-2 reference.
    VirMAP19
    suggested: None
    Genome reconstruction and genomic variations: In order to assess the ability of the capture methodology to assemble full-length genomes, both the nine pre-capture and 45 post-capture libraries were assembled using both the VirMAP pipeline and the SPAdes de novo assembler20.
    VirMAP
    suggested: None
    Data analysis. Sequence mapping, genome reconstruction and variant calling: Raw FASTQ sequences were processed using BBDuk (sourceforge.net/projects/bbmap/; BBMap version 38.82) to quality trim, remove Illumina adapters and filter PhiX reads.
    BBMap
    suggested: (BBmap, RRID:SCR_016965)
    Reads with a minimum average Phred quality score below 23 and length shorter than 50 bp after trimming were discarded.
    Phred
    suggested: (Phred, RRID:SCR_001017)
    SPAdes assembler20 was also used for genome reconstruction.
    SPAdes
    suggested: (SPAdes, RRID:SCR_000131)
    Plots were generated using R (version 3.6.1) and the tidyverse (version 1.3.0) and ggplot2 (version 3.2.1) packages.
    ggplot2
    suggested: (ggplot2, RRID:SCR_014601)
    For heterozygous variant analysis, the sequence reads were aligned to the reference genome using BWA-mem27 with default parameters, realigned using GATK28, and variants were called using Atlas-SNP229.
    BWA-mem27
    suggested: None
    Subgenomic mRNA and junction reads analysis: Illumina sequence reads were aligned to SARS-CoV-2 reference genome NC_045512.2 using STAR aligner v2.7.3a31 with penalties for non-canonical splicing turned off as described by Kim et al1
    STAR
    suggested: (STAR, RRID:SCR_015899)
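The read-filtering criterion quoted in the table above (average Phred quality of at least 23 and length of at least 50 bp after trimming) can be illustrated with a minimal Python sketch. This is not the authors' BBDuk command; the function names `mean_phred` and `keep_read` are hypothetical, and the sketch assumes Phred+33 quality encoding and that a read is discarded when either condition fails, which the quoted sentence does not state unambiguously.

```python
from statistics import mean


def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean Phred score of a FASTQ quality string (Phred+33 encoding assumed)."""
    return mean(ord(ch) - offset for ch in quality)


def keep_read(sequence: str, quality: str,
              min_mean_q: float = 23.0, min_len: int = 50) -> bool:
    """Return True if a trimmed read passes both thresholds.

    Assumption: a read failing EITHER the length or the quality
    threshold is discarded. The length check runs first, so an empty
    read is rejected before a mean quality is computed.
    """
    return len(sequence) >= min_len and mean_phred(quality) >= min_mean_q


if __name__ == "__main__":
    # 'I' encodes Q40 and '5' encodes Q20 under Phred+33.
    print(keep_read("A" * 60, "I" * 60))  # long, high quality
    print(keep_read("A" * 60, "5" * 60))  # long, low quality
    print(keep_read("A" * 40, "I" * 40))  # short, high quality
```

In practice a dedicated trimmer such as BBDuk applies these thresholds together with adapter removal; the sketch only makes the pass/fail rule explicit.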

    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).


    Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore is not a substitute for expert review. SciScore checks for the presence and correctness of RRIDs (research resource identifiers) in the manuscript, and detects sentences that appear to be missing RRIDs. SciScore also checks to make sure that rigor criteria are addressed by authors. It does this by detecting sentences that discuss criteria such as blinding or power analysis. SciScore does not guarantee that the rigor criteria that it detects are appropriate for the particular study. Instead it assists authors, editors, and reviewers by drawing attention to sections of the manuscript that contain or should contain various rigor criteria and key resources. For details on the results shown here, including references cited, please follow this link.

  3. SciScore for 10.1101/2020.07.27.223495:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    Institutional Review Board Statement: not detected.
    Randomization: not detected.
    Blinding: not detected.
    Power Analysis: not detected.
    Sex as a biological variable: not detected.
    Cell Line Authentication: not detected.

    Table 2: Resources

    Experimental Models: Cell Lines
    Sentences | Resources
    Kim et al.1, reported SARS-CoV-2 quantitative expression in SARS-CoV-2 infected Vero cells (ATCC, CCL-81) based on junction reads obtained from Nanopore based direct RNA sequencing.
    Vero
    suggested: None
    ORF10 was also undetected in the other transcriptome study by Taiaroa et al., 2020 using ONT and SARS-CoV-2 infected Vero/hSLAM cells.
    Vero/hSLAM
    suggested: ECACC Cat# 04091501, CVCL_L037
    Software and Algorithms
    Sentences | Resources
    Trimmed non-human sequence reads were analyzed using the VirMAP19 pipeline, where between 7% and 86.4% of total reads from post-capture libraries mapped to the SARS-CoV-2 reference.
    VirMAP19
    suggested: None
    Genome reconstruction and genomic variations: In order to assess the ability of the capture methodology to assemble full-length genomes, both the nine pre-capture and 45 post-capture libraries were assembled using both the VirMAP pipeline and the SPAdes de novo assembler20.
    VirMAP
    suggested: None
    Trimming parameters were set to a k-mer length of 19 and a minimum Phred quality score of 25.
    Phred
    suggested: (Phred, SCR_001017)
    SPAdes assembler20 was also used for genome reconstruction.
    SPAdes
    suggested: (SPAdes, SCR_000131)
    Plots were generated using R (version 3.6.1) and the tidyverse (version 1.3.0) and ggplot2 (version 3.2.1) packages.
    ggplot2
    suggested: (ggplot2, SCR_014601)
    Alignments and reference mapping were done using mafft26 (version 1.4.0) and BBMap (version 38.82).
    BBMap
    suggested: (BBmap, SCR_016965)
    For heterozygous variant analysis, the sequence reads were aligned to the reference genome using BWA-mem27 with default parameters, realigned using GATK28, and variants were called using Atlas-SNP229.
    BWA-mem27
    suggested: None
    Subgenomic mRNA and junction reads analysis: Illumina sequence reads were aligned to SARS-CoV-2 reference genome NC_045512.2 using STAR aligner v2.7.3a31 with penalties for non-canonical splicing turned off as described by Kim et al1.
    STAR
    suggested: (STAR, SCR_015899)
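The junction-read analysis quoted in the table above relies on spliced alignments, in which the aligner marks a skipped stretch of the reference with an `N` operation in the SAM CIGAR string. The sketch below shows one way such junctions could be extracted from a single alignment record. It is an illustration only, not the authors' pipeline; the function name `junctions_from_cigar` and the example coordinates are hypothetical.

```python
import re
from typing import List, Tuple

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that advance the position on the reference (SAM specification).
REF_CONSUMING = set("MDN=X")


def junctions_from_cigar(pos: int, cigar: str) -> List[Tuple[int, int]]:
    """Return (first_skipped_base, last_skipped_base) for each N operation.

    `pos` is the 1-based leftmost aligned reference position, as in the
    POS field of a SAM record.
    """
    junctions = []
    ref = pos  # next reference base to be consumed
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op == "N":
            junctions.append((ref, ref + n - 1))
        if op in REF_CONSUMING:
            ref += n
    return junctions


if __name__ == "__main__":
    # Hypothetical read: 10 matched bases, a long skipped region,
    # then 40 matched bases.
    print(junctions_from_cigar(60, "10M28180N40M"))
```

Counting how many reads support each (start, end) pair would then give a per-junction read count; a production analysis would normally take these from the aligner's own junction output or a SAM/BAM library rather than parsing CIGAR strings by hand.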

    Data from additional tools added to each annotation on a weekly basis.