DeePhage: distinguishing virulent and temperate phage-derived sequences in metavirome data with a deep learning approach

Abstract

Background

Prokaryotic viruses, referred to as phages, can be divided into virulent and temperate phages. Distinguishing virulent and temperate phage–derived sequences in metavirome data is important for elucidating their different roles in interactions with bacterial hosts and in the regulation of microbial communities. However, there is no experimental or computational approach that effectively classifies their sequences in culture-independent metavirome data. We present a new computational method, DeePhage, which can directly and rapidly judge each read or contig as a virulent or temperate phage–derived fragment.

Findings

DeePhage uses a “one-hot” encoding to represent DNA sequences in detail. Sequence signatures are detected via a convolutional neural network to obtain valuable local features. The accuracy of DeePhage in 5-fold cross-validation reaches as high as 89%, nearly 10% and 30% higher than that of 2 similar tools, PhagePred and PHACTS, respectively. On real metavirome data, DeePhage correctly predicts the highest proportion of contigs when BLAST is used for annotation, without apparent preferences. In addition, DeePhage reduces running time relative to PhagePred and PHACTS by 245 and 810 times, respectively, under the same computational configuration. By directly detecting temperate viral fragments in metagenome and metavirome data, we furthermore propose a new strategy to explore phage transformations in the microbial community. The ability to detect such transformations provides new insight into potential treatments for human disease.
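The encoding and convolution steps mentioned above can be illustrated with a minimal sketch. It is not taken from the DeePhage source code: the function names, the A/C/G/T channel order, the all-zero treatment of ambiguous bases such as N, and the hand-made "CG" kernel are all assumptions made for illustration. A trained network would learn many kernels rather than use a fixed one.

```python
# Illustrative sketch only; names and conventions here are hypothetical,
# not DeePhage's actual implementation.
BASES = "ACGT"  # assumed channel order


def one_hot_encode(seq):
    """Return one 4-element row per base; bases outside ACGT become all zeros."""
    rows = []
    for base in seq.upper():
        row = [0, 0, 0, 0]
        idx = BASES.find(base)
        if idx >= 0:
            row[idx] = 1
        rows.append(row)
    return rows


def convolve(encoded, kernel):
    """Slide a (width x 4) kernel along the sequence; one score per window."""
    width = len(kernel)
    scores = []
    for start in range(len(encoded) - width + 1):
        score = 0.0
        for offset in range(width):
            for channel in range(4):
                score += encoded[start + offset][channel] * kernel[offset][channel]
        scores.append(score)
    return scores


encoded = one_hot_encode("ACGTN")
motif_kernel = one_hot_encode("CG")  # toy kernel that responds most to "CG"
scores = convolve(encoded, motif_kernel)
print(scores)
```

The highest score marks the window where the sequence matches the kernel, which is the sense in which a convolutional layer picks up local sequence signatures.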

Conclusions

DeePhage is a novel tool developed to rapidly and efficiently identify the 2 kinds of phage fragments, especially for metagenomic analysis. DeePhage is freely available via http://cqb.pku.edu.cn/ZhuLab/DeePhage or https://github.com/shufangwu/DeePhage.

Article activity feed

  1. Now published in GigaScience doi: 10.1093/gigascience/giab056

    Authors: Shufang Wu (1, 2), Zhencheng Fang (1, 2), Jie Tan (1, 2), Mo Li (3), Chunhui Wang (3), Qian Guo (1, 2, 4), Congmin Xu (1, 2, 4), Xiaoqing Jiang (1, 2), Huaiqiu Zhu (1, 2, 4, 5)

    Affiliations:
    (1) State Key Laboratory for Turbulence and Complex Systems and Department of Biomedical Engineering, College of Engineering, Peking University, Beijing 100871, China
    (2) Center for Quantitative Biology, Peking University, Beijing 100871, China
    (3) Peking University-Tsinghua University - National Institute of Biological Sciences (PTN) joint PhD program, School of Life Sciences, Peking University, Beijing 100871, China
    (4) Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Georgia 30332, USA
    (5) Institute of Medical Technology, Peking University Health Science Center, Beijing 100191, China

    For correspondence: Huaiqiu Zhu, hqzhu@pku.edu.cn

    A version of this preprint has been published in the Open Access journal GigaScience (see paper https://doi.org/10.1093/gigascience/giab056), where the paper and peer reviews are published openly under a CC-BY 4.0 license.

    These peer reviews were as follows:

    Reviewer 1: http://dx.doi.org/10.5524/REVIEW.102812 Reviewer 2: http://dx.doi.org/10.5524/REVIEW.102813