Apclusterv: Refinement of Viral Genome Clustering with Affinity Propagation

Abstract

Background: Clustering assemblies is a fundamental step in metagenomic analysis. As researchers from a variety of expert domains invest heavily in viral metagenomics, unsupervised clustering has become a critical bioinformatics tool for overcoming the shortage of viral reference genomes with known taxonomy.

Results: Here we present Apclusterv, a novel software tool for unsupervised clustering of viral genome assemblies. The clustering pipeline relies on gene prediction from contigs and protein sequence alignment, and is implemented as an open-source Python package. Apclusterv integrates two clustering procedures, Markov Clustering (MCL) and Affinity Propagation (AP). Both algorithms determine the number of clusters automatically, and they complement each other well in our work. In the task of clustering metagenomic assemblies of viral genomes, our algorithm shows significant improvement in the quality of the clusters obtained. The software is freely available at https://github.com/hbyaoherbert/Apclusterv

Conclusions: Assemblies of metagenomic reads are largely incomplete. Apclusterv resolves this limitation of short-read assembly by identifying confident local alignments through a self-adaptive clustering system. The software gives accurate genus-level viral clusters from metagenomic contigs, which are critical to subsequent classification, Operational Taxonomic Unit (OTU) construction, or gene-sharing network analysis.
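
To illustrate how Affinity Propagation can settle on a number of clusters without being told one, the following minimal sketch runs the textbook message-passing updates on a toy similarity matrix. It is not taken from Apclusterv: the function name, the six contig labels, the similarity values, and the choice of the median similarity as the preference are all assumptions made for illustration. In the real pipeline the similarities would derive from protein sequence alignments, in a way the abstract does not specify.

```python
# Illustrative sketch only: textbook affinity propagation on made-up data.
# This is NOT Apclusterv's implementation; all values below are hypothetical.
from statistics import median


def affinity_propagation(S, damping=0.5, iterations=200):
    """Return, for every item, the index of the exemplar it is assigned to.

    S[i][k] is the similarity of item i to item k. The diagonal S[k][k]
    holds the "preference", which influences how many exemplars emerge.
    """
    n = len(S)
    R = [[0.0] * n for _ in range(n)]  # responsibilities
    A = [[0.0] * n for _ in range(n)]  # availabilities

    for _ in range(iterations):
        # Responsibility: how well k suits i compared with i's best rival.
        for i in range(n):
            scores = [A[i][k] + S[i][k] for k in range(n)]
            for k in range(n):
                rival = max(scores[j] for j in range(n) if j != k)
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - rival)

        # Availability: support for k accumulated from the other items.
        for k in range(n):
            support = [max(0.0, R[j][k]) for j in range(n)]
            total = sum(support)
            for i in range(n):
                if i == k:
                    new = total - support[k]
                else:
                    new = min(0.0, R[k][k] + total - support[i] - support[k])
                A[i][k] = damping * A[i][k] + (1 - damping) * new

    # Items whose self-responsibility plus self-availability is positive
    # are taken as exemplars; the count is not fixed in advance.
    exemplars = [k for k in range(n) if R[k][k] + A[k][k] > 0]
    if not exemplars:
        return list(range(n))  # nothing emerged: leave each item on its own
    return [
        i if i in exemplars else max(exemplars, key=lambda k: S[i][k])
        for i in range(n)
    ]


# Hypothetical pairwise similarities for six contigs (two obvious groups).
contigs = ["c0", "c1", "c2", "c3", "c4", "c5"]
similarity = [
    [0.00, 0.90, 0.80, 0.05, 0.05, 0.05],
    [0.90, 0.00, 0.70, 0.05, 0.05, 0.05],
    [0.80, 0.70, 0.00, 0.05, 0.05, 0.05],
    [0.05, 0.05, 0.05, 0.00, 0.85, 0.60],
    [0.05, 0.05, 0.05, 0.85, 0.00, 0.90],
    [0.05, 0.05, 0.05, 0.60, 0.90, 0.00],
]

# Assumed preference: the median off-diagonal similarity, a common default.
n = len(similarity)
off_diagonal = [similarity[i][k] for i in range(n) for k in range(n) if i != k]
for k in range(n):
    similarity[k][k] = median(off_diagonal)

labels = affinity_propagation(similarity)
for name, exemplar in zip(contigs, labels):
    print(name, "-> exemplar", contigs[exemplar])
```

Lowering the preference tends to give fewer, larger clusters, and raising it gives more, smaller ones. The number of clusters is therefore still shaped by a parameter, but it is never supplied directly.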
