GraphK-LR: Enhancing Long-read Metagenomic Binning with Read-overlap Graphs Across Microbial Kingdoms

Nethmi Ranasinghe
Sathsarani Aththanayaka
Jayathri Ranasinghe
Vijini Mallawaarachchi
Damayanthi Herath

Abstract

Background: Metagenomics, the study of genetic material from environmental samples, relies on binning, the process of grouping DNA sequences from the same organism to disentangle complex species mixtures. Interest has recently grown in using long reads from third-generation sequencing technologies to overcome the limitations of short reads. These long reads contain species-specific signals that allow direct grouping into taxonomic bins prior to assembly. Previous studies have successfully used nucleotide composition and coverage for binning long reads. The advent of less error-prone sequencing technologies has paved the way for incorporating additional information to enhance binning accuracy. In this paper, we introduce GraphK-LR, a long-read binning refiner that uses connectivity information between reads and machine-learning-based graph techniques to refine potentially misclassified reads from an initial binning tool. Additionally, our tool uses marker-gene-based kingdom-level analysis to address the challenge of species from different microbial kingdoms being present in the same metagenomic sample, which makes such samples difficult to bin with existing tools. This approach is inspired by the many refiners available for short reads and addresses the lack of refining tools for long reads.

Results: We evaluated the tool using publicly available mock community datasets sequenced with Oxford Nanopore R.10.x chemistry, initially binned using the existing tools OBLR and LRBinner. Upon refinement, we observed a marginal improvement of 2-3% in binning accuracy, which indicates that both of these tools are already highly effective at correctly binning reads. Another long-read binning tool, SemiBin2, discarded nearly 20% of reads while binning; after refinement, we observed a significant improvement of 20-30% in the evaluation criteria. These results demonstrate that GraphK-LR adds an additional layer of accuracy over the binning tools, particularly in cases with unclassified reads.

Conclusion: Although there is still room for further enhancement, our tool represents an important initial step in exploring the capacity to further improve the accuracy of long-read binning by combining existing methods with more sophisticated techniques. The underlying concept of GraphK-LR holds the potential to advance long-read-based metagenomics analyses across a wide range of applications. The source code for GraphK-LR can be found at https://github.com/NethmiRanasinghe/GraphK-LR.
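
The idea of refining an initial binning with read connectivity can be illustrated with a minimal sketch. This is not the GraphK-LR algorithm: the abstract describes machine-learning-based graph techniques, whereas the example below substitutes a plain neighbour vote over a read-overlap graph. The function name `refine_bins`, the input format (a list of overlapping read pairs and a dictionary of initial bin labels) and the `rounds` parameter are all assumptions made for illustration.

```python
from collections import Counter


def refine_bins(edges, labels, rounds=5):
    """Relabel reads by voting among overlapping neighbours (illustrative only).

    edges  : iterable of (read_a, read_b) overlap pairs (hypothetical input format)
    labels : dict mapping read id -> bin id, or None for an unbinned read
    rounds : maximum number of voting passes (hypothetical parameter)
    """
    # Build an undirected adjacency list from the overlap pairs.
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    current = dict(labels)
    for _ in range(rounds):
        updated = dict(current)
        for read, adjacent in neighbours.items():
            # Count the bins of labelled neighbours.
            votes = Counter(
                lab for lab in (current.get(n) for n in adjacent) if lab is not None
            )
            # The read's own label counts as one vote, which damps oscillation.
            own = current.get(read)
            if own is not None:
                votes[own] += 1
            if not votes:
                continue
            top = votes.most_common(2)
            # Relabel only when one bin has strictly more votes than any other.
            if len(top) == 1 or top[0][1] > top[1][1]:
                updated[read] = top[0][0]
        if updated == current:
            break  # no label changed in this pass
        current = updated
    return current


# Toy graph: r3 starts in the wrong bin and r4 is unbinned.
edges = [("r1", "r2"), ("r1", "r3"), ("r2", "r3"), ("r1", "r4"), ("r2", "r4"),
         ("r5", "r6"), ("r5", "r7"), ("r6", "r7")]
labels = {"r1": "A", "r2": "A", "r3": "B", "r4": None,
          "r5": "B", "r6": "B", "r7": "B"}
refined = refine_bins(edges, labels)
```

In this toy graph the misbinned read `r3` and the unbinned read `r4` both overlap mostly with reads from bin `A`, so the vote moves them there, which mirrors the abstract's observation that refinement helps most when the initial tool leaves reads unclassified.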

Version published to 10.21203/rs.3.rs-7390699/v1 on Research Square
Sep 8, 2025

Related articles

META-DIFF: a k-mer-based pipeline that detects differentially abundant sequences in metagenomics whole genome sequencing

This article has 8 authors:
1. Louis-Maël Guéguen
2. Alban Mathieu
3. Simon Pelletier
4. Anthony Woo
5. Namita Misra
6. Magali Moreau
7. Olivier Perin
8. Arnaud Droit
This article has no evaluations. Latest version: Jan 29, 2026

Enhancing variant detection in complex genomes: leveraging linked reads for robust SNP, Indel, and structural variant analysis

This article has 7 authors:
1. Can Luo
2. Yichen Liu
3. Han Liu
4. Zhenmiao Zhang
5. Lu Zhang
6. Brock Peters
7. Xin Maizie Zhou
This article has no evaluations. Latest version: Jan 12, 2026

Shotgun metagenomics: a deep insight into the composition and function of the complex microbial world

This article has 7 authors:
1. Grazia Visci
2. Elisabetta Notario
3. Giuseppe Defazio
4. Mariano Francesco Caratozzolo
5. Bruno Fosso
6. Marinella Marzano
7. Graziano Pesole
This article has no evaluations. Latest version: Jan 30, 2026