XtractPAV: An Automated Pipeline for Identifying Presence–Absence Variations Across Multiple Genomes
Abstract
Motivation
Presence-absence variations (PAVs) significantly influence phenotypic diversity across and within species by modulating functional modules involved in stress responsiveness, adaptation, and developmental processes, and thereby contribute to genetic diversity at both inter- and intra-species levels. However, existing PAV detection tools are limited: they lack scalable workflows for multi-genome comparisons and frequently require manual integration. To address these challenges, we developed XtractPAV, an end-to-end pipeline that automates the extraction, annotation, and interactive visualization of PAVs across large-scale genomic datasets.
Results
XtractPAV was evaluated using assembled genomes of both eukaryotic and prokaryotic organisms, including Pyrus communis, Arabidopsis thaliana, Mus musculus, and Salmonella enterica, to assess its ability to detect genomic variations across diverse species. The performance of XtractPAV was benchmarked against other established pipelines, demonstrating superior precision and a more comprehensive extraction of PAV segments. Notably, our pipeline not only identified the known PAVs from the reference set but also revealed novel variations in genes associated with functions such as flowering-time regulation and disease resistance. Furthermore, the automated report generation feature of XtractPAV produces publication-ready summaries of PAV distributions and related metrics.
Availability
XtractPAV is freely accessible at https://github.com/SherazAhmadd/XtractPAV and on the XtractPAV webpage. The package includes all requisite files, a user manual, test data, and a license permitting non-commercial use.
Supplementary material
Supplementary data are accessible online at Bioinformatics.