BABAPPA: A Bash-Based Automated Pipeline for Codeml-Mediated Selection Analysis Integrating PRANK and IQ-TREE2
Abstract
Detecting signatures of positive selection in protein-coding genes is fundamental to understanding adaptive evolution. However, conventional workflows built on codeml (PAML) require extensive manual configuration (e.g., editing control files) and run sequentially, which makes them impractical for large-scale analyses. To address these challenges, BABAPPA (Bash-Based Automated Parallel Positive Selection Analysis), a fully automated pipeline that streamlines every step of a codeml analysis, has been developed. BABAPPA integrates codon-aware sequence filtering, alignment, phylogenetic reconstruction, branch designation, model fitting, likelihood-ratio testing, and multiple-testing correction into a single reproducible workflow. BABAPPA was benchmarked on three Brassicaceae orthologous gene sets and produced positive-selection results identical (after FDR correction) to those of a standard codeml approach, while substantially reducing computation time. Mean wall-clock time dropped from 2,579 s (standard workflow) to 1,430 s with BABAPPA, a 44.6% reduction in runtime, which is equivalent to a 1.80-fold (80.3%) speedup (p = 8.227e-5). These gains are achieved without loss of sensitivity. The robust automation and efficiency of BABAPPA make it well suited for large-scale genomic surveys of adaptive evolution.
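The likelihood-ratio testing and multiple-testing correction steps named above can be illustrated with a minimal Python sketch. It is not BABAPPA's own code: the function names `lrt_pvalue` and `benjamini_hochberg` are hypothetical, the log-likelihoods in the demo are made-up values rather than results from the paper, and it assumes that the FDR correction is the Benjamini-Hochberg procedure and that the appropriate degrees of freedom (1 or 2 here) are known for the pair of codeml models being compared.

```python
import math


def lrt_pvalue(lnl_null, lnl_alt, df=1):
    """Likelihood-ratio test for two nested models.

    Returns (statistic, p-value), where statistic = 2 * (lnL_alt - lnL_null)
    is compared against a chi-square distribution with `df` degrees of
    freedom. Only df = 1 and df = 2 are handled, because both have simple
    closed-form survival functions that need no external library.
    """
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))  # clamp small negative noise
    if df == 1:
        return stat, math.erfc(math.sqrt(stat / 2.0))
    if df == 2:
        return stat, math.exp(-stat / 2.0)
    raise ValueError("this sketch supports df = 1 or df = 2 only")


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical (null lnL, alternative lnL) pairs, one per gene.
    fits = {
        "geneA": (-2510.4, -2503.1),
        "geneB": (-1875.9, -1875.2),
        "geneC": (-3022.0, -3017.6),
    }
    names = list(fits)
    raw = [lrt_pvalue(*fits[name], df=1)[1] for name in names]
    for name, p, q in zip(names, raw, benjamini_hochberg(raw)):
        print(f"{name}\tp={p:.4g}\tq={q:.4g}")
```

A gene would then be reported as a candidate for positive selection when its adjusted value falls below the chosen FDR threshold (commonly 0.05).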