Inference of admixture in dogs from whole genome sequences

Gregory Kislik
Garrett Moore
Liudmilla Rubbi
Veninka Nikki Supara
Grace Chen
Matteo Pellegrini

Curated by GigaByte

Editors Assessment:

In this new methodological work, the researchers investigate the genetic structure and admixture patterns among dog breeds through a comprehensive analysis of whole genome sequencing data. A reference population comprising 349 individuals across 65 breeds was established, from which breed-informative single nucleotide polymorphisms (SNPs) were derived. Admixture proportions were estimated with the SCOPE algorithm, which has previously been employed in many global ancestry studies, and the approach demonstrated strong accuracy even at low sequencing depths (<1x). After peer review prompted changes to the data processing, the work was suitably solid to yield some interesting findings. The results indicate that specific breeds, such as Catahoula Leopard Dogs and Greek Tracers, present unique challenges for admixture inference due to their genetic proximity to other breeds. Estimating Pit Bull Terrier ancestry/admixture also proved challenging, suggesting that several genotypes could be associated with the Pit Bull Terrier breed. The methods provide a robust framework for future assessments of canine genetic diversity and health implications in canid populations, and the processed reference population data are also available in the GitHub repository for reuse.

This evaluation refers to version 1 of the preprint


Abstract

Background: Understanding the genetic architecture of domestic dogs provides unique insights into the processes of domestication, breed formation, and the genetic basis of complex traits and diseases. Dog populations, characterized by their diverse morphologies and behaviors, also exhibit extensive evidence of historical and ongoing admixture. This widespread mixing, driven by both natural migration and selective breeding practices, has profoundly shaped the genomic landscape of modern dog breeds. Although global admixture has been extensively estimated in human population studies, where the number of subgroups is typically limited, it has been analyzed far less in canines, where there may be dozens of ancestral groups, or breeds.

Results: Here we present a procedure for estimating global admixture in dogs from whole genome sequence data using SCOPE. We created a reference population of 65 dog breeds that included 349 individuals, from which we determined breed-informative SNPs. We demonstrate that SCOPE can accurately infer breed composition in both simulated and real admixed samples, even at low sequencing depths. We also characterized the genetic similarity between our reference dog breeds and recovered previously reported relationships.

Conclusion: This approach allows us to identify the strength of the genetic signature of breeds and place error bounds on admixture estimates. It also provides evidence that admixture can be accurately inferred in subjects that may originate from multiple ancestral populations.
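For readers unfamiliar with global admixture estimation, the sketch below illustrates the underlying model under simplifying assumptions: hard genotype calls at unlinked SNPs, and reference breed allele frequencies held fixed. It uses the classic EM update for the admixture likelihood used by frappe and ADMIXTURE; it is not SCOPE's own estimator, and the function names, the simulated panel, and the simulated individual are hypothetical illustrations rather than the authors' data or code.

```python
import random


def estimate_admixture(genotypes, ref_freqs, n_iter=500, tol=1e-6):
    """EM estimate of one diploid individual's admixture proportions,
    holding the reference breeds' allele frequencies fixed.

    genotypes: alt-allele counts (0, 1 or 2), one per SNP
    ref_freqs: K lists; ref_freqs[k][j] is breed k's alt-allele frequency at SNP j
    """
    eps = 1e-6  # keep frequencies away from 0 and 1 to avoid division by zero
    F = [[min(max(f, eps), 1 - eps) for f in row] for row in ref_freqs]
    K, J = len(F), len(genotypes)
    q = [1.0 / K] * K  # start from equal proportions
    for _ in range(n_iter):
        counts = [0.0] * K  # expected allele copies inherited from each breed
        for j, g in enumerate(genotypes):
            p = sum(q[k] * F[k][j] for k in range(K))  # individual's alt frequency
            for k in range(K):
                counts[k] += g * q[k] * F[k][j] / p
                counts[k] += (2 - g) * q[k] * (1 - F[k][j]) / (1 - p)
        q_new = [c / (2 * J) for c in counts]
        converged = max(abs(a - b) for a, b in zip(q_new, q)) < tol
        q = q_new
        if converged:
            break
    return q


def simulate_admixed(q, ref_freqs, rng):
    """Draw a diploid genotype at each SNP under the admixture model:
    each allele copy is alt with probability sum_k q[k] * ref_freqs[k][j]."""
    genotypes = []
    for j in range(len(ref_freqs[0])):
        p = sum(qk * row[j] for qk, row in zip(q, ref_freqs))
        genotypes.append((rng.random() < p) + (rng.random() < p))
    return genotypes


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical reference panel: 3 breeds x 1,500 SNPs
    ref = [[rng.betavariate(0.5, 0.5) for _ in range(1500)] for _ in range(3)]
    true_q = [0.6, 0.4, 0.0]
    g = simulate_admixed(true_q, ref, rng)
    print([round(x, 2) for x in estimate_admixture(g, ref)])
```

Simulating an individual from known proportions and checking how closely the estimate recovers them mirrors, in miniature, the kind of simulated-admixture validation the abstract describes.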

GigaByte
Mar 25, 2026


Competing Interest Statement: Matteo Pellegrini is affiliated with ProsperK9, which developed a direct-to-consumer test for dog ancestry.

This paper is now published in GigaByte, with the paper and peer reviews shared under a CC-BY license:

https://doi.org/10.46471/gigabyte.173

Reviewer 1. Professor Tracy Smith

Is the code executable? Unable to test.

Is there a clearly-stated list of dependencies, and is the core functionality of the software documented to a satisfactory level?

No. I would like to see the full PLINK code as well as the R dependencies.

See review here: https://gigabyte-review.rivervalleytechnologies.com/api/download-documents?payload=dLQpgSxI41Ksf6QFmEn6UrYcoBRhxttwE1cPXu8tMOJByVSthbzG5HM9e=CR3Vljzdt8InzTYtFxfQeD7116f6ET5X03k=la7k1ex9dyGXUGWQUtgC1E7llsk2kR3mX6mYDp

Reviewer 2. Tatiana Feuerborn

The premise of the paper could be an interesting test of the use of the SCOPE software on dogs. I can appreciate the idea of the manuscript and the profile of the journal selected for the submission, but even for this journal, the way the testing of the method was carried out does not appear to align fully with the journal's intentions. Additionally, the dataset of dog breeds is too small to be informative. This is particularly true of the number of mixed dogs tested and of the down-sampling. Furthermore, the interpretation of the results does not acknowledge these limitations or the nuance of the geographical bias of the dataset.

Reviewer comments:

“Bergstrom et al. 2012”: wrong citation.

“Global ancestry, inferred by tools such as SCOPE and ADMIXTURE (Alexander et al. 2009), attempts to infer the proportions of an individual’s genome that belong to an ancestral breed or group.”: the citation for SCOPE is missing.

If studies such as Parker et al. 2017 have used 160 breeds, and the authors have mentioned the numerous subpopulations of dogs, why did the authors choose to use such a small number of breeds for their study?

Why were the top 2500 SNPs used? Why not 1000 or 10000, etc.? Testing how many are needed would be very informative.
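The question of panel size can be framed concretely. The sketch below assumes, purely for illustration, that SNPs are ranked by a simple F_ST-like score (between-breed variance of allele frequency divided by pbar(1 - pbar)); the preprint's actual selection criterion may differ, and `fst_like`, `rank_snps`, and the random panel are hypothetical.

```python
from statistics import fmean, pvariance


def fst_like(freqs):
    """Between-breed variance of one SNP's alt-allele frequency, scaled by
    pbar * (1 - pbar): 0 when all breeds share a frequency, 1 when every
    breed is fixed (0 or 1) and the breeds disagree."""
    pbar = fmean(freqs)
    denom = pbar * (1 - pbar)
    return pvariance(freqs, mu=pbar) / denom if denom > 0 else 0.0


def rank_snps(ref_freqs):
    """SNP indices ordered from most to least breed-informative.
    ref_freqs: K lists (one per breed) of per-SNP alt-allele frequencies."""
    n_snps = len(ref_freqs[0])
    scores = [fst_like([row[j] for row in ref_freqs]) for j in range(n_snps)]
    return sorted(range(n_snps), key=scores.__getitem__, reverse=True)


if __name__ == "__main__":
    import random
    rng = random.Random(0)
    # Hypothetical panel of 5 breeds x 20,000 SNPs with uniform random frequencies
    ref = [[rng.random() for _ in range(20_000)] for _ in range(5)]
    ranking = rank_snps(ref)  # score every SNP once
    for n in (1000, 2500, 10_000):
        panel = ranking[:n]  # a top-n panel is just a prefix of the ranking
        cutoff = fst_like([row[panel[-1]] for row in ref])
        print(n, len(panel), round(cutoff, 3))  # lowest score admitted
```

Because every top-N panel is a prefix of a single ranking, comparing N = 1000, 2500, and 10000 costs only one scoring pass; each panel would then be fed to the admixture estimator and its accuracy compared.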

Figure 1: I would recommend sorting the breeds by value so that the results can be interpreted more easily. “We also note that certain groups of breeds tend to group together. For example Samoyed, Basenji, and Husky samples are found near each other on the UMAP. This group has been shown to represent ancient breeds (Larson et al. 2012, Pickrell and Pritchard 2012, Wojcik and Powierza 2021).” There are other explanations for this pattern: almost all of the other breeds examined are of European origin, and there is very little representation of non-European ancestry within the small sample of dog breeds included in the study. “Despite this the more distant relationships between breeds differ from some of the previous studies, as these may be more difficult to define using our markers.” If this is the case and could be influencing the results, it would be relevant to mention which breeds these are.

Why are SNP chip data rather than whole genome sequencing data being used for the study? This should be clearly established; any explanation for this choice is completely absent. Is it because SCOPE can only handle a small number of sites? Is it because of the availability of the dataset? If so, note my previous concern about the small sample size, despite the public availability of much larger datasets. Is there another reason?

Figure 5: The size of the legend relative to the figure itself is very unbalanced, and I would recommend a clearer delineation of the breeds; it is unclear where one breed ends and the next begins. The figure is also too small to see the individuals with more than one bar colour. A continuous colour scheme is also probably the wrong choice for the plot: it makes the already difficult delineation of the breeds nearly impossible, and given that the breeds are sorted alphabetically, the colour ordering is purely incidental, which makes the continuous palette even more inappropriate.

Figure 7: Most of my comments on Figure 5 also apply to Figure 7. The colour choices make it very difficult to see how many segments are present in each bar. Also, an indicator of which simulated individuals were determined to be successful versus unsuccessful would be helpful. For example, the rightmost four bars look fairly unsuccessful to me, as they are all missing a component in the estimate that was present in the truth. Using three mixed dogs seems like a very small number of samples to test the accuracy of the tool on real datasets. Downsampling only one individual to test the impact of coverage is likely not representative of the impact of low coverage across all breed compositions; downsampling a larger number of individuals would be more informative about the accuracy of the results. In the discussion on page 11, many old citations are used to back up the interpretation of the close relationship of Siberian Huskies, Basenjis, and other non-European breeds. As I previously mentioned, no mention is made of the geographical bias of the dataset as a reason behind this.
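The reviewer's suggestions (downsample many individuals, and flag simulations that miss a true component) can be made concrete. The sketch below assumes per-site read counts are available and emulates lower coverage by binomial thinning, keeping each read independently; the total variation distance and the 5% presence threshold are illustrative choices, not criteria from the preprint, and all function names are hypothetical.

```python
import random


def downsample_reads(read_counts, original_depth, target_depth, rng):
    """Binomially thin per-site read counts, keeping each read independently
    with probability target_depth / original_depth."""
    keep = target_depth / original_depth
    if not 0 < keep <= 1:
        raise ValueError("target_depth must be in (0, original_depth]")
    return [sum(rng.random() < keep for _ in range(c)) for c in read_counts]


def admixture_error(estimated, truth):
    """Total variation distance between estimated and true breed proportions:
    0 for a perfect estimate, 1 when they share no breed."""
    return 0.5 * sum(abs(e - t) for e, t in zip(estimated, truth))


def missed_components(estimated, truth, min_prop=0.05):
    """Breeds present in the truth (>= min_prop) that the estimate drops
    below min_prop; min_prop is an arbitrary illustrative threshold."""
    return [k for k, (e, t) in enumerate(zip(estimated, truth))
            if t >= min_prop and e < min_prop]


if __name__ == "__main__":
    rng = random.Random(7)
    # Hypothetical ~30x per-site read depths for one individual
    reads = [rng.randint(20, 40) for _ in range(10)]
    print(downsample_reads(reads, 30, 0.5, rng))  # thinned toward ~0.5x
    est, truth = [0.5, 0.5, 0.0], [0.5, 0.25, 0.25]
    print(admixture_error(est, truth))    # 0.25
    print(missed_components(est, truth))  # [2]
```

Running the thinning over every mixed or simulated individual at several target depths, and reporting both the error and any missed components per individual, would give the per-sample success indicator the reviewer asks for.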

General comments: There are frequent issues with a lack of spaces (‘ ’) between parentheses and neighbouring words and punctuation. A different colour palette should be used for the figures; it is very difficult to determine the breed due to the poor colour choice used throughout the manuscript. Inconsistent citation styles are used, e.g. “Similarly, prior maximum likelihood estimation based techniques have suggested that Huskies and Samoyeds are both ancient breeds and related to Basenji (23, 29).”

Version published to 10.46471/gigabyte.173 (Mar 17, 2026)
Version published to 10.64898/2026.02.09.704954 on bioRxiv (Feb 10, 2026)
