PAQman: reference-free ensemble evaluation of long-read eukaryotic genome assemblies

Abstract

Advances in long-read sequencing and assembly algorithms have made it easier and more cost-effective to generate high-quality genome assemblies. However, assessing assembly quality remains challenging, as existing tools frequently rely on only a few metrics and/or require a reference assembly for comparison. To address this, we developed the Post-Assembly Quality manager (PAQman), a tool that evaluates seven reference-free features of genome quality: Contiguity, Gene content, Completeness, Accuracy, Correctness, Coverage, and Telomerality. PAQman integrates multiple commonly used tools alongside custom scripts and requires only the genome assembly and its underlying long-read data as input, providing a streamlined and consistent framework for quality assessment across datasets and organisms.

Summary

PAQman is an ensemble-based tool for the comprehensive and reference-free evaluation of eukaryotic genome assemblies derived from long-read sequencing data. The simultaneous integration of seven quality features allows users to easily assess assembly quality using a standardized and reproducible framework.

Availability and implementation

PAQman is distributed as a conda environment and an Apptainer image. Source code and documentation are freely available in the GitHub repository at https://github.com/samtobam/paqman, and v1.1.0, the version used in this study, is archived on Zenodo at https://doi.org/10.5281/zenodo.16039705.
