PAQman: reference-free ensemble evaluation of long-read eukaryotic genome assemblies

Abstract

Advances in long-read sequencing and assembly algorithms have made it easier and more cost-effective to generate high-quality genome assemblies. However, assessing assembly quality remains challenging, as existing tools frequently rely on only a few metrics, require a reference assembly for comparison, or both. To address this, we developed the Post-Assembly Quality manager (PAQman), a tool that evaluates seven reference-free features of genome quality: Contiguity, Gene content, Completeness, Accuracy, Correctness, Coverage, and Telomerality. PAQman integrates multiple commonly used tools alongside custom scripts and requires users to provide only the genome assembly and its underlying long-read data, offering a streamlined and consistent framework for quality assessment across datasets and organisms.

Summary

PAQman is an ensemble-based tool for the comprehensive and reference-free evaluation of eukaryotic genome assemblies derived from long-read sequencing data. The simultaneous integration of seven quality features allows users to easily assess assembly quality using a standardized and reproducible framework.

Availability and implementation

PAQman is distributed as a conda environment and an Apptainer image. Source code and documentation are freely available through the GitHub repository at https://github.com/samtobam/paqman, and v1.1.0, the version used in this study, is archived on Zenodo at https://doi.org/10.5281/zenodo.16039705.
