PACIFIC: A lightweight deep-learning classifier of SARS-CoV-2 and co-infecting RNA viruses
Listed in: Evaluated articles (ScreenIT)
Abstract
Viral co-infections occur in COVID-19 patients, potentially impacting disease progression and severity. However, there is currently no dedicated method to identify viral co-infections in patient RNA-seq data. We developed PACIFIC, a deep-learning algorithm that accurately detects SARS-CoV-2 and other common RNA respiratory viruses from RNA-seq data. Using in silico data, PACIFIC recovers the presence and relative concentrations of viruses with >99% precision and recall. PACIFIC accurately detects SARS-CoV-2 and other viral infections in 63 independent in vitro cell culture and patient datasets. PACIFIC is an end-to-end tool that enables the systematic monitoring of viral infections in the current global pandemic.
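To make the ">99% precision and recall" claim concrete, here is a minimal sketch of how per-class precision and recall can be computed from read-level labels. It assumes one true and one predicted class label per read; the abstract does not say exactly how the metrics were computed, and the function name and toy labels below are hypothetical.

```python
from collections import Counter


def per_class_precision_recall(true_labels, predicted_labels):
    """Return {class: (precision, recall)} from paired per-read labels.

    Assumption: each read has exactly one true and one predicted class.
    """
    true_pos, false_pos, false_neg = Counter(), Counter(), Counter()
    for truth, guess in zip(true_labels, predicted_labels):
        if truth == guess:
            true_pos[truth] += 1
        else:
            false_pos[guess] += 1   # read wrongly assigned to `guess`
            false_neg[truth] += 1   # read missed for its true class

    metrics = {}
    for label in sorted(set(true_labels) | set(predicted_labels)):
        predicted_total = true_pos[label] + false_pos[label]
        actual_total = true_pos[label] + false_neg[label]
        precision = true_pos[label] / predicted_total if predicted_total else 0.0
        recall = true_pos[label] / actual_total if actual_total else 0.0
        metrics[label] = (precision, recall)
    return metrics


# Toy labels for illustration only (not data from the study).
toy_truth = ["SARS-CoV-2", "SARS-CoV-2", "Human", "Human", "Unrelated"]
toy_calls = ["SARS-CoV-2", "Human", "Human", "Human", "Unrelated"]
toy_metrics = per_class_precision_recall(toy_truth, toy_calls)
```

In this toy case one SARS-CoV-2 read is called as Human, which lowers recall for SARS-CoV-2 and precision for Human while leaving the other values at 1.0.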
Article activity feed
SciScore for 10.1101/2020.07.24.219097:
Please note, not all rigor criteria are appropriate for all manuscripts.
Table 1: Rigor
NIH rigor criteria are not applicable to paper type.
Table 2: Resources
Software and Algorithms

Sentence: We have used Python (version 3), scipy (v1.4.1), numpy (v1.18.1), scikit (v0.23.1), pandas (v1.0.1), tensorflow (v2.2.0), keras (v2.3.1), R (v3.6), tidyverse (v1.3.0), Biobase (v2.46.0) and Perl (v5.26) in our analysis.
Resources:
- Python, suggested: (IPython, RRID:SCR_001658)
- scipy, suggested: (SciPy, RRID:SCR_008058)

Sentence: Training data: We downloaded 362 virus genomes from the NCBI assembly database corresponding to five classes of single stranded RNA viruses (Table 4, Additional file 1: Table S1).
Resources:
- NCBI, suggested: (NCBI, RRID:SCR_006472)

Sentence: We included Human GENCODE (48) canonical transcript sequences (downloaded from Ensembl v99 database (49)) as an additional class to distinguish sequencing reads derived from the human transcriptome.
Resources:
- Ensembl, suggested: (Ensembl, RRID:SCR_002344)

Sentence: Synthetic data contained 150nt single end reads derived from seven classes; the five model virus classes, a human class, and an “unrelated” class composed of 32,550 distinct virus genomes downloaded from the NCBI Assembly database.
Resources:
- NCBI Assembly, suggested: (NCBI Assembly Archive Viewer, RRID:SCR_012917)

Sentence: Detecting viruses in human datasets and comparison with other tools: We downloaded 63 RNA-seq experiments from NCBI SRA database.
Resources:
- NCBI SRA, suggested: None

Sentence: We compared PACIFIC’s predictions with two alternative methods for virus detection: an alignment-based approach using BWA-MEM (53), and a k-mer based approach using Kraken2 (19), described below.
Resources:
- Kraken2, suggested: None

Sentence: For Kraken2, we first downloaded the Kraken taxonomy database and built a k-mer database using the same genomes used to train PACIFIC (Table 4).
Resources:
- Kraken, suggested: (Kraken, RRID:SCR_005484)

Sentence: To investigate the origin of reads for all reads in samples that were discordantly predicted for the presence of a virus class by PACIFIC, BWA-MEM or Kraken2, we used the BLAST suite (v2.10.1+) (54,55) to align reads to the NCBI nucleotide (nt) database, which includes sequences from all domains of life.
Resources:
- BWA-MEM, suggested: (Sniffles, RRID:SCR_017619)
- BLAST, suggested: (BLASTX, RRID:SCR_001653)

Sentence: BLASTN was used with the following parameters: -task ‘megablast’ -max_target_seqs 1 -max_hsps 1 -evalue 1e-6 to query discordant viral class assignments between PACIFIC, BWA-MEM and Kraken2.
Resources:
- BLASTN, suggested: (BLASTN, RRID:SCR_001598)

Results from OddPub: Thank you for sharing your code and data.
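As a rough illustration of the BLASTN step quoted in Table 2, the reported parameters could be assembled into a command line as sketched below. Only `-task`, `-max_target_seqs`, `-max_hsps` and `-evalue` come from the quoted sentence. The file names, the local database name, and the use of `-query`, `-db`, `-outfmt 6` and `-out` are assumptions made for illustration, not the authors' documented invocation.

```python
import shlex
import subprocess


def build_blastn_command(query_fasta, database="nt", output_tsv="blast_hits.tsv"):
    """Assemble a blastn argument list using the parameters quoted in Table 2.

    Assumptions (not from the source): input/output paths, the database
    name, and tabular output (-outfmt 6).
    """
    return [
        "blastn",
        "-task", "megablast",       # from the quoted sentence
        "-max_target_seqs", "1",    # from the quoted sentence
        "-max_hsps", "1",           # from the quoted sentence
        "-evalue", "1e-6",          # from the quoted sentence
        "-query", query_fasta,      # assumed: FASTA of discordant reads
        "-db", database,            # assumed: local copy of the nt database
        "-outfmt", "6",             # assumed: tab-separated hit table
        "-out", output_tsv,         # assumed: output path
    ]


def run_blastn(query_fasta, database="nt", output_tsv="blast_hits.tsv"):
    """Run blastn; requires BLAST+ on PATH and a formatted local database."""
    subprocess.run(
        build_blastn_command(query_fasta, database, output_tsv), check=True
    )


if __name__ == "__main__":
    # Print the command instead of running it, so no BLAST install is needed.
    print(shlex.join(build_blastn_command("discordant_reads.fasta")))
```

Building the argument list separately from running it keeps the parameters inspectable, and passing a list to `subprocess.run` avoids shell quoting problems.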
Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
To overcome these potential limitations, we used multiple diverse and independent simulated datasets reflecting realistic scenarios to validate the performance of PACIFIC. Importantly, PACIFIC was successfully applied to 63 RNA-seq datasets derived from infected cell cultures and patient samples for the detection of viral infections, demonstrating that PACIFIC can be applied to human-derived RNA-seq datasets and assist in clinical settings.
In 2013, the World Health Organisation launched the Battle against Respiratory Viruses (BRaVe) initiative, which identified six research strategies to tackle and mitigate risks of death due to respiratory tract infections. One of the proposed strategies was to “improve severe acute respiratory infection diagnosis and diagnostic tests amongst others” (40). High-throughput sequencing-based approaches can provide immense diagnostic potential and facilitate molecular epidemiological studies, thereby contributing towards the BRaVe initiative’s goals (41,42). It is more important than ever to explore and determine the diagnostic potential of RNA-seq for the SARS-CoV-2 pandemic. A comprehensive study using multiplex RT-PCR and a sequencing-based metagenomic approach revealed that RNA-seq has sufficient sensitivity and specificity to be applicable in the clinic for respiratory viruses (42). However, the use of RNA-seq in diagnostic settings is often complicated due to complex analytical workflows (34,42). A typical workflow for virus detection in ...
Results from TrialIdentifier: No clinical trial numbers were referenced.
Results from Barzooka: We did not find any issues relating to the usage of bar graphs.
Results from JetFighter: We did not find any issues relating to colormaps.
Results from rtransparent:
- Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
- Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
- No protocol registration statement was detected.
