LRMD: Reference-Free Misassembly Detection Based on Multiple Features from Long-Read Alignments
Abstract
Genome assembly is the cornerstone of genomics research, and detecting misassemblies is crucial for reliable downstream analyses. Reference-free methods for misassembly detection rely on read alignments, which removes the need for a high-quality reference genome and makes them more broadly applicable. However, existing methods do not use alignment data effectively, which leaves them with noticeably low sensitivity for detecting misassemblies. We introduce LRMD, a novel reference-free tool for misassembly detection. LRMD integrates depth, clipping, and read pileup information derived from long-read-to-assembly alignments to significantly improve sensitivity in identifying misassemblies. Experimental evaluations on both simulated and real datasets demonstrate that LRMD consistently outperforms existing tools in terms of sensitivity and F1-score. Notably, among the tools compared, its results are closest to those of the reference-based evaluation tool QUAST. As an evaluation tool, LRMD also outputs metrics such as base quality, assembly size, and contig N50. LRMD is publicly available at http://github.com/sxfss/LRMD.
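To make the kinds of signals mentioned in the abstract concrete, the following is a minimal illustrative sketch, not LRMD's actual algorithm. It assumes alignments have already been parsed into simple records, computes per-window depth and counts of clipped read ends, flags windows with low depth or many clipped ends, and computes contig N50. The names `Alignment`, `window_features`, `flag_windows`, and `n50`, as well as all thresholds, are hypothetical choices made only for this example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Alignment:
    """Hypothetical, pre-parsed read-to-contig alignment record."""
    start: int       # 0-based start on the contig (inclusive)
    end: int         # end on the contig (exclusive)
    left_clip: int   # clipped bases at the left end of the read
    right_clip: int  # clipped bases at the right end of the read


def window_features(alns: List[Alignment], contig_len: int,
                    window: int = 1000, min_clip: int = 100
                    ) -> List[Tuple[float, int]]:
    """Return (mean depth, clipped read ends) for each fixed-size window."""
    n = (contig_len + window - 1) // window
    covered = [0] * n     # aligned bases falling in each window
    clip_ends = [0] * n   # read ends with a long clip in each window
    for a in alns:
        first, last = a.start // window, (a.end - 1) // window
        for w in range(first, last + 1):
            lo = max(a.start, w * window)
            hi = min(a.end, (w + 1) * window)
            covered[w] += hi - lo
        if a.left_clip >= min_clip:
            clip_ends[first] += 1
        if a.right_clip >= min_clip:
            clip_ends[last] += 1
    feats = []
    for w in range(n):
        size = min(window, contig_len - w * window)  # last window may be short
        feats.append((covered[w] / size, clip_ends[w]))
    return feats


def flag_windows(feats: List[Tuple[float, int]],
                 min_depth_ratio: float = 0.5,
                 min_clipped: int = 3) -> List[int]:
    """Flag windows with depth well below the median or many clipped ends.

    The thresholds are assumptions for illustration only.
    """
    depths = sorted(d for d, _ in feats)
    median = depths[len(depths) // 2]
    return [i for i, (d, c) in enumerate(feats)
            if d < min_depth_ratio * median or c >= min_clipped]


def n50(lengths: List[int]) -> int:
    """Length L such that contigs of length >= L cover at least half the assembly."""
    total, running = sum(lengths), 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


# Toy example: reads stop abruptly at position 1500 with long clips on both sides.
reads = ([Alignment(0, 1500, 0, 200) for _ in range(5)]
         + [Alignment(1500, 3000, 200, 0) for _ in range(5)])
features = window_features(reads, contig_len=3000)
suspicious = flag_windows(features)
```

In this toy input the depth is uniform, so only the cluster of clipped read ends around position 1500 marks the middle window as a candidate. A real tool would parse these values from a SAM/BAM file and would combine the signals more carefully.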