SwarmGenomics: A Unified Pipeline for Individual-Based Whole-Genome Analyses
Abstract
Advances in sequencing technologies have made whole-genome data widely accessible, enabling research in population genetics, evolutionary biology, and conservation. However, analyzing whole-genome sequencing (WGS) data remains challenging, often requiring multiple specialized tools and substantial bioinformatics expertise. We present SwarmGenomics, a modular, user-friendly command-line pipeline for reference-based genome assembly and individual-based genetic analyses. The pipeline integrates seven modules: heterozygosity estimation, runs-of-homozygosity detection, Pairwise Sequentially Markovian Coalescent (PSMC) analysis, unmapped-read classification, repeat analysis, mitochondrial genome assembly, and nuclear mitochondrial DNA segment (NUMT) identification. Each module can be run independently or as part of a complete workflow. We demonstrate the pipeline's utility with a case study on the giant panda (Ailuropoda melanoleuca), revealing insights into genetic diversity, inbreeding history, historical population size changes, transposable element activity, and microbial contamination. SwarmGenomics lowers the entry barrier for genomic analysis of diploid, non-model species, serving as both a research tool and a teaching tool. The pipeline and documentation are available at https://github.com/AureKylmanen/Swarmgenomics.