LRMD: Reference-Free Misassembly Detection Based on Multiple Features from Long-Read Alignments


Abstract

Genome assembly is the cornerstone of genomics research, and detecting misassemblies is crucial for downstream analyses. Reference-free misassembly detection methods, which rely on read alignments, avoid the need for a high-quality reference genome and are therefore more broadly applicable. However, existing methods make limited use of the alignment data, which leaves them with low sensitivity for detecting misassemblies. We introduce LRMD, a novel reference-free tool for misassembly detection. LRMD integrates depth, clipping, and read pileup information derived from long-read-to-assembly alignments to significantly enhance sensitivity in identifying misassemblies. Experimental evaluations on both simulated and real datasets demonstrate that LRMD consistently outperforms existing tools in sensitivity and F1-score. Notably, its results are the closest to those of the reference-based evaluation tool QUAST. As an evaluation tool, LRMD also outputs metrics such as base quality, assembly size, and contig N50. LRMD is publicly available at http://github.com/sxfss/LRMD .
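To make the idea of combining depth and clipping signals concrete, the following is a minimal sketch and not LRMD's actual algorithm: the abstract does not describe its thresholds, window sizes, or pileup logic. The function names (`clipped_ends`, `flag_windows`), the input format (0-based start position plus CIGAR string per read), and all parameter values are assumptions made for illustration. The sketch counts, per fixed-size window of a contig, how many reads cover the window and how many end there with a long soft or hard clip, then flags windows where clipped reads make up a large fraction of the coverage.

```python
import re

# CIGAR operations as defined in the SAM format.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")

def clipped_ends(pos, cigar, min_clip=100):
    """Return (clip_positions, ref_len) for one alignment.

    pos is the 0-based contig start of the alignment. A clip position is
    the first or last aligned contig base next to a clip of >= min_clip
    bases. min_clip is an illustrative value, not taken from LRMD.
    """
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]
    # Only M, D, N, = and X consume reference (contig) bases.
    ref_len = sum(n for n, op in ops if op in "MDN=X")
    ends = []
    if ref_len > 0:
        if ops[0][1] in "SH" and ops[0][0] >= min_clip:
            ends.append(pos)
        if ops[-1][1] in "SH" and ops[-1][0] >= min_clip:
            ends.append(pos + ref_len - 1)
    return ends, ref_len

def flag_windows(alignments, contig_len, window=1000,
                 min_clip=100, min_clip_frac=0.5):
    """Flag windows whose clipped-read fraction is suspiciously high.

    alignments: iterable of (pos, cigar) pairs for a single contig.
    All thresholds are hypothetical defaults chosen for this example.
    """
    n_win = (contig_len + window - 1) // window
    depth = [0] * n_win   # reads overlapping each window
    clips = [0] * n_win   # long clips ending in each window
    for pos, cigar in alignments:
        ends, ref_len = clipped_ends(pos, cigar, min_clip)
        if ref_len == 0:
            continue
        first = pos // window
        last = min((pos + ref_len - 1) // window, n_win - 1)
        for w in range(first, last + 1):
            depth[w] += 1
        for e in ends:
            clips[min(e // window, n_win - 1)] += 1
    return [w for w in range(n_win)
            if depth[w] > 0 and clips[w] / depth[w] >= min_clip_frac]

# Toy input: three reads are clipped near contig position 1400-1500,
# one read spans the region cleanly.
toy_alignments = [
    (0, "1500M200S"),
    (0, "1400M300S"),
    (1450, "250S1500M"),
    (0, "2900M"),
]
flagged = flag_windows(toy_alignments, contig_len=3000)
```

On this toy input only the middle window collects clipped read ends, so it is the only candidate region. A real tool would read alignments from a BAM file and would also need the read pileup evidence mentioned in the abstract, which this sketch omits.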
