BABAPPA: A Codeml-Centered Bash-Based Positive Selection Analysis Pipeline with GUI Integration

Abstract

Codon-based maximum likelihood tests of positive selection implemented in codeml (PAML) are fundamental in molecular evolution studies. However, they remain laborious to execute at scale because of the complex sequence processing, model management, and statistical post-analysis they require. Manually coordinating alignment, recombination detection, model fitting, and multiple-testing correction across large datasets is often inefficient and hard to reproduce. BABAPPA (Bash-Based Automated Parallel Positive-selection Pipeline) is introduced as a modular, automated, and cross-platform solution that executes complete codeml workflows in a reproducible and computationally efficient manner. It performs sequence quality control, codon-aware alignment, optional trimming and recombination detection, model execution, likelihood ratio testing, and false discovery rate correction. The pipeline supports full CPU parallelization and includes a graphical user interface for Windows users through WSL integration. Benchmarking across three representative Brassicaceae orthogroups demonstrated consistent runtime behavior, reproducible outputs across repeated runs, and efficient multi-thread scaling up to 690% CPU utilization, with all runs terminating successfully. Statistical testing confirmed the stability and predictability of the pipeline across datasets of differing complexity. BABAPPA is freely available under the MIT license at https://github.com/sinhakrishnendu/babappa.git. A Windows installer with GUI and WSL configuration is provided via Zenodo and FigShare.
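
The likelihood ratio testing and false discovery rate correction steps named in the abstract can be illustrated with a minimal sketch. This is not BABAPPA's own code: the function names, the log-likelihood values, and the choice of Benjamini-Hochberg as the FDR procedure are assumptions made for illustration, and the chi-square tail is implemented only for 1 or 2 degrees of freedom so that the example needs nothing beyond the Python standard library.

```python
import math

def lrt_pvalue(lnl_null, lnl_alt, df):
    """P-value of a likelihood ratio test between two nested models.

    Assumes the statistic 2 * (lnL_alt - lnL_null) follows a chi-square
    distribution with `df` degrees of freedom (closed forms for df 1 and 2).
    """
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    if df == 1:
        return math.erfc(math.sqrt(stat / 2.0))
    if df == 2:
        return math.exp(-stat / 2.0)
    raise ValueError("this sketch only handles df = 1 or 2")

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical (lnL_null, lnL_alt) pairs for three genes, each tested with df = 2.
fits = [(-1000.0, -997.0), (-2500.0, -2499.5), (-1800.0, -1790.0)]
raw = [lrt_pvalue(null, alt, df=2) for null, alt in fits]
for p, q in zip(raw, benjamini_hochberg(raw)):
    print(f"p = {p:.4g}, BH-adjusted = {q:.4g}")
```

In a production setting a statistics library would normally supply the chi-square tail for arbitrary degrees of freedom; the closed forms are used here only to keep the sketch dependency-free.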
