tidygenclust: Clustering for Population Genetics in R
Abstract
Background
Population structure analysis is crucial for evolutionary research and medical genomics. Clustering methods, broadly categorised as model-based (e.g. ADMIXTURE) or non-model-based (e.g. SCOPE), differ in their methodology and computational efficiency. Recently, fastmixture, a model-based approach, has improved scalability and performance, while replicate alignment tools, such as Clumppling, extend previous methods by also aligning the modes across K values. However, all existing tools are standalone, generate numerous untracked text files, and offer limited plot customisability.
Results
We introduce an R package, tidygenclust, which brings the functionality of the original ADMIXTURE, fastmixture and Clumppling software into R, enabling a streamlined and integrated workflow. By integrating with tidypopgen, a package designed to handle large SNP datasets, these new tools maintain metadata, simplify data handling, and produce results as customisable ggplot2 objects for flexible visualisation.
Conclusions
The R package tidygenclust advances population genetic analysis by combining computational efficiency with reproducible workflows and user-friendly plotting. The source code and instructions are available at https://github.com/EvolEcolGroup/tidygenclust.