GraphK-LR: Enhancing Long-read Metagenomic Binning with Read-overlap Graphs Across Microbial Kingdoms


Abstract

Background: Metagenomics, the study of genetic material from environmental samples, relies on binning, the process of grouping DNA sequences from the same organism to disentangle complex species mixtures. Recently, interest has grown in using long reads from third-generation sequencing technologies to overcome the limitations of short reads. Long reads contain species-specific signals that allow them to be grouped directly into taxonomic bins prior to assembly. Previous studies have successfully used nucleotide composition and coverage for binning long reads. The advent of less error-prone sequencing technologies has paved the way for incorporating additional information to enhance binning accuracy. In this paper, we introduce GraphK-LR, a long-read binning refiner that uses connectivity information between reads and machine-learning-based graph techniques to refine potentially misclassified reads from an initial binning tool. Additionally, our tool uses marker-gene-based kingdom-level analysis to handle samples that contain species from different microbial kingdoms, which are difficult to bin with existing tools. This approach is inspired by the multitude of short-read refiners and addresses the lack of refining tools for long reads.

Results: We evaluated the tool using publicly available mock community datasets sequenced with Oxford Nanopore R.10.x chemistry and initially binned using the existing tools OBLR and LRBinner. Upon refinement, we observed a marginal improvement of 2-3% in binning accuracy, which indicates that both of these tools are already highly effective at correctly binning reads. Another long-read binning tool, SemiBin2, discarded nearly 20% of reads while binning; after refinement, we observed a significant improvement of 20-30% in the evaluation criteria. These results demonstrate that GraphK-LR adds a further layer of accuracy on top of the binning tools, particularly in cases with unclassified reads.

Conclusion: Although there is still room for further enhancement, our tool represents an important initial step in exploring the capacity to further improve the accuracy of long-read binning by combining existing methods with more sophisticated techniques. The underlying concept of GraphK-LR holds the potential to advance long-read-based metagenomics analyses across a wide range of applications. The source code for GraphK-LR can be found at https://github.com/NethmiRanasinghe/GraphK-LR.
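
To make the idea of refining bins with read connectivity concrete, the following is a minimal sketch and not the GraphK-LR implementation. The abstract describes machine-learning-based graph techniques; this example swaps in a much simpler neighbor majority vote over a read-overlap graph. The function name `refine_bins`, the `min_support` threshold, and the input shapes (a list of overlapping read pairs and a read-to-bin dictionary, with `None` for unbinned reads) are all assumptions made for illustration.

```python
from collections import Counter


def refine_bins(edges, bins, min_support=0.5, max_rounds=10):
    """Relabel reads by majority vote of their overlap-graph neighbors.

    Hypothetical helper, NOT the GraphK-LR algorithm.
    edges: iterable of (read_a, read_b) pairs that overlap.
    bins:  dict mapping read id -> bin label, or None if unbinned.
    """
    # Build an undirected adjacency map from the overlap pairs.
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    refined = dict(bins)
    for _ in range(max_rounds):
        updates = {}
        for read, nbrs in neighbors.items():
            # Count labels of neighbors that currently have a bin.
            votes = Counter(
                refined.get(n) for n in nbrs if refined.get(n) is not None
            )
            if not votes:
                continue
            label, count = votes.most_common(1)[0]
            # Relabel only when a strict majority of labeled neighbors agree.
            if count / sum(votes.values()) > min_support and refined.get(read) != label:
                updates[read] = label
        if not updates:
            break  # labels are stable
        refined.update(updates)
    return refined
```

In this sketch, an unbinned read whose overlapping neighbors mostly share one bin is assigned to that bin, and a read whose label disagrees with a strict majority of its neighbors is moved. This loosely mirrors the scenario in the Results where reads discarded by the initial binner are recovered during refinement.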
