A Computational Toolset for Rapid Identification of SARS-CoV-2, other Viruses, and Microorganisms from Sequencing Data
Listed in: Evaluated articles (ScreenIT)
Abstract
In this paper, we present a toolset and related resources for rapid identification of viruses and microorganisms from short-read or long-read sequencing data. We present fastv, an ultra-fast tool that detects microbial sequences present in sequencing data, identifies target microorganisms, and visualizes coverage of microbial genomes. The tool is based on a k-mer mapping and extension method. The k-mer sets are generated by UniqueKMER, another tool provided in this toolset, which can generate complete sets of unique k-mers for each genome within a large set of viral or microbial genomes. For convenience, unique k-mers for microorganisms and common viruses that afflict humans have been pre-generated and are provided with the tools. As a lightweight tool, fastv accepts FASTQ data as input and directly outputs results in both HTML and JSON formats. Prior to the k-mer analysis, fastv automatically performs adapter trimming, quality pruning, base correction, and other pre-processing to ensure the accuracy of the k-mer analysis. In particular, fastv provides built-in support for rapid SARS-CoV-2 identification and typing. Experimental results showed that fastv achieved 100% sensitivity and 100% specificity for detecting SARS-CoV-2 from sequencing data, and that it can distinguish SARS-CoV-2 from SARS, MERS, and other coronaviruses. This toolset is available at https://github.com/OpenGene/fastv.
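To make the idea of genome-specific k-mers concrete, the following is a minimal Python sketch, not the implementation used by UniqueKMER or fastv. It assumes that a k-mer is "unique" to a genome when it occurs in exactly one genome of the collection, and that detection amounts to counting how many k-mers in the reads hit each genome's unique set. The function names (`kmers`, `unique_kmers`, `count_hits`) and the toy sequences are hypothetical, and the sketch deliberately omits the extension step, reverse-complement handling, and the read pre-processing described in the abstract.

```python
from collections import defaultdict

VALID_BASES = set("ACGT")


def kmers(seq, k):
    """Yield every k-mer of seq (uppercased), skipping k-mers with ambiguous bases."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= VALID_BASES:
            yield kmer


def unique_kmers(genomes, k):
    """Map each genome name to the k-mers found in that genome and in no other.

    Assumption for illustration: uniqueness is judged on the forward strand only.
    """
    owners = defaultdict(set)  # k-mer -> names of the genomes containing it
    for name, seq in genomes.items():
        for kmer in kmers(seq, k):
            owners[kmer].add(name)

    unique = {name: set() for name in genomes}
    for kmer, names in owners.items():
        if len(names) == 1:
            unique[next(iter(names))].add(kmer)
    return unique


def count_hits(reads, unique, k):
    """Count, per genome, how many read k-mers match that genome's unique k-mers."""
    lookup = {kmer: name for name, ks in unique.items() for kmer in ks}
    hits = {name: 0 for name in unique}
    for read in reads:
        for kmer in kmers(read, k):
            name = lookup.get(kmer)
            if name is not None:
                hits[name] += 1
    return hits


# Toy data (hypothetical sequences, far shorter than real genomes).
genomes = {
    "virus_a": "ACGTACGGTTCA",
    "virus_b": "ACGTACCCTTGA",
}
k = 5
unique = unique_kmers(genomes, k)
reads = ["TACGGTTC", "ACGTAC"]
hits = count_hits(reads, unique, k)
print(hits)
```

In this toy case the first read falls in a region specific to `virus_a`, while the second read covers only the prefix shared by both genomes and therefore contributes no hits. That is the property that lets unique k-mers separate closely related genomes.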
Article activity feed
SciScore for 10.1101/2020.05.12.092163:
Please note, not all rigor criteria are appropriate for all manuscripts.
Table 1: Rigor
NIH rigor criteria are not applicable to paper type.
Table 2: Resources (Software and Algorithms)
- Sentence: "Although this can be done with a common aligner such as BWA or Bowtie2, these are not ideal for partial mapping of short sequences."
  Resources: BWA, suggested: (BWA, RRID:SCR_010910); Bowtie2, suggested: (Bowtie 2, RRID:SCR_016368)
- Sentence: "The first dataset is the NCBI viral genomes RefSeq database [32], which can be found at https://ftp.ncbi.nlm.nih.gov/refseq/release/viral/."
  Resources: RefSeq, suggested: (RefSeq, RRID:SCR_003496)
Results from OddPub: Thank you for sharing your code and data.
Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.
Results from TrialIdentifier: No clinical trial numbers were referenced.
Results from Barzooka: We did not find any issues relating to the usage of bar graphs.
Results from JetFighter: We did not find any issues relating to colormaps.
Results from rtransparent:
- Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
- Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
- No protocol registration statement was detected.
