tidygenclust: Clustering for Population Genetics in R

Abstract

Background

Population structure analysis is crucial for evolutionary research and medical genomics. Clustering methods, broadly categorised as model-based (e.g. ADMIXTURE) or non-model-based (e.g. SCOPE), differ in their methodology and computational efficiency. Recently, fastmixture, a model-based approach, has improved scalability and performance, while replicate alignment tools such as Clumppling extend previous methods by also aligning modes across K values. However, all existing tools are standalone programs that generate numerous untracked text files and offer limited plot customisability.
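Replicate alignment is needed because clustering runs label their clusters arbitrarily ("label switching"): the same ancestry component can be column 1 in one run and column 2 in another. The snippet below is a minimal sketch of the idea for two replicates at the same K. It is written in Python rather than R and is not Clumppling's actual algorithm, which also aligns modes across different K values. It tries every column permutation of one ancestry-proportion (Q) matrix and keeps the permutation with the smallest squared difference from a reference. The function name `align_replicate` is hypothetical.

```python
from itertools import permutations


def align_replicate(reference, replicate):
    """Permute the cluster columns of `replicate` to best match `reference`.

    Both inputs are Q matrices given as lists of rows: one row per
    individual, one column per cluster, each row holding ancestry
    proportions. Returns the permuted matrix and the column order used.
    """
    k = len(reference[0])
    best_perm, best_cost = None, float("inf")
    # Brute force over all K! column orders; only practical for small K.
    for perm in permutations(range(k)):
        # Sum of squared differences between reference column j and
        # replicate column perm[j], over all individuals.
        cost = sum(
            (ref_row[j] - rep_row[perm[j]]) ** 2
            for ref_row, rep_row in zip(reference, replicate)
            for j in range(k)
        )
        if cost < best_cost:
            best_perm, best_cost = perm, cost
    # Reorder each replicate row so its columns follow the reference labels.
    aligned = [[row[best_perm[j]] for j in range(k)] for row in replicate]
    return aligned, best_perm


# Toy example: the replicate has the two cluster labels swapped.
reference = [[0.9, 0.1], [0.2, 0.8]]
replicate = [[0.1, 0.9], [0.8, 0.2]]
aligned, perm = align_replicate(reference, replicate)
print(perm)     # (1, 0)
print(aligned)  # [[0.9, 0.1], [0.2, 0.8]]
```

For larger K, the exhaustive search would usually be replaced by an assignment solver (e.g. the Hungarian algorithm) on the same cost matrix.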

Results

We introduce an R package, tidygenclust, which brings the functionality of the original ADMIXTURE, fastmixture and Clumppling software into R, enabling a streamlined, integrated workflow. By integrating with tidypopgen, a package designed to handle large SNP datasets, tidygenclust maintains metadata, simplifies data handling, and returns results as customisable ggplot2 objects for flexible visualisation.

Conclusions

The R package tidygenclust advances population genetic analysis by combining computational efficiency with reproducible workflows and user-friendly plotting. The source code and instructions are available at https://github.com/EvolEcolGroup/tidygenclust.