lcUMAPtSNE: Use of non-linear dimensionality reduction techniques with genotype likelihoods

Abstract

  • Understanding population structure is essential for conservation genetics, as it provides insights into population connectivity and supports the development of targeted strategies to preserve genetic diversity and adaptability.

  • T-distributed stochastic neighbor embedding (t-SNE) and uniform manifold approximation and projection (UMAP) have proven effective for revealing population genetic structure in humans and model organisms using hard-called genotypes, but their application to wild species using genotype likelihoods from low-coverage sequencing (a cost-saving measure) remains underexplored.

  • Here, we present a Jupyter Notebook-based workflow that facilitates the use of UMAP and t-SNE on genotype likelihood-derived principal components.

  • This workflow is demonstrated using medium- to low-coverage whole-genome sequencing data from the scimitar-horned oryx, a species that has been reintroduced into the wild and faces multiple conservation challenges.

  • Detailed guidance on hyperparameter tuning and practical implementation is also provided, making these methods easier to apply in wildlife genetics and potentially supporting biodiversity conservation.
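
The core idea of the workflow, running a non-linear embedding on principal components rather than on hard-called genotypes, can be illustrated with a minimal sketch. This is not the authors' notebook. It assumes that an individual-by-individual covariance matrix has already been estimated from genotype likelihoods (for example with a tool such as PCAngsd, which is an assumption here and is not named in the abstract), and it substitutes a synthetic matrix for real data. The function names, the number of retained components, and the perplexity value are hypothetical choices for illustration only.

```python
import numpy as np
from sklearn.manifold import TSNE


def top_pcs_from_covariance(cov, n_pcs):
    """Project samples onto the top n_pcs eigenvectors of a symmetric
    sample-by-sample covariance matrix (rows = individuals)."""
    eigvals, eigvecs = np.linalg.eigh(cov)        # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_pcs]     # indices of the largest ones
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale              # shape: (n_samples, n_pcs)


def embed_tsne(pcs, perplexity, seed=0):
    """Embed principal components in two dimensions with t-SNE."""
    model = TSNE(n_components=2, perplexity=perplexity,
                 init="pca", random_state=seed)
    return model.fit_transform(pcs)


# Synthetic stand-in for a genotype-likelihood-derived covariance matrix:
# 40 individuals in two groups of 20, NOT real sequencing data.
rng = np.random.default_rng(0)
signal = rng.normal(size=(40, 200))
signal[:20] += 1.0                                # shift the first group
signal -= signal.mean(axis=0)                     # centre each column
cov = signal @ signal.T / signal.shape[1]

pcs = top_pcs_from_covariance(cov, n_pcs=5)       # hypothetical choice of 5 PCs
embedding = embed_tsne(pcs, perplexity=10.0)      # perplexity must be < n_samples
```

With real data, `cov` would instead be loaded from the covariance estimate (for instance with `np.loadtxt`), and `embedding` could be plotted and coloured by sampling locality. UMAP can be applied to the same `pcs` array in an analogous way, for example via `umap.UMAP(n_neighbors=..., min_dist=...).fit_transform(pcs)` from the umap-learn package. The number of retained components and the perplexity or neighbourhood size are the hyperparameters that most affect the result, so they are worth varying rather than fixing as above.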
