Inference of admixture in dogs from whole genome sequences
Curation statements for this article:
Curated by GigaByte
Editor's Assessment:
In this new methodological work, researchers investigate the genetic structure and admixture patterns among dog breeds through a comprehensive analysis of whole genome sequencing data. A reference population comprising 349 individuals across 65 breeds was established, from which breed-informative single nucleotide polymorphisms (SNPs) were derived. The SCOPE algorithm, previously employed in many global ancestry studies, was used to estimate admixture proportions and demonstrated strong accuracy even at low sequencing depths (<1x). After peer review suggested changes to the data processing, the work was sufficiently solid to yield some interesting findings. Results indicate that specific breeds, such as Catahoula Leopard Dogs and Greek Tracers, present unique challenges in admixture inference due to their genetic proximity to other breeds. There were also challenges in estimating Pit Bull Terrier ancestry and admixture, suggesting that several genotypes could be associated with the Pit Bull Terrier breed. The methods provide a robust framework for future assessments of genetic diversity and health implications in canid populations. Processed reference population data are also available in the GitHub repository for reuse.
This evaluation refers to version 1 of the preprint
This article has been Reviewed by the following groups
Listed in
- Evaluated articles (GigaByte)
Abstract
Background: Understanding the genetic architecture of domestic dogs provides unique insights into the processes of domestication, breed formation, and the genetic basis of complex traits and diseases. Dog populations, characterized by their diverse morphologies and behaviors, also exhibit extensive evidence of historical and ongoing admixture. This widespread mixing, driven by both natural migration and selective breeding practices, has profoundly shaped the genomic landscape of modern dog breeds. Though global admixture has been extensively estimated in human population studies, where the number of subgroups is typically limited, there has been more limited analysis in canines, where there may be dozens of ancestral groups, or breeds.
Results: Here we present a procedure for estimating global admixture in dogs from whole genome sequence data using SCOPE. We created a reference population of 65 dog breeds that included 349 individuals, from which we determined breed-informative SNPs. We demonstrate that SCOPE can accurately infer breed composition in both simulated and real admixed samples, even at low sequencing depths. We also characterized the genetic similarity between our reference dog breeds and recovered previously reported relationships.
Conclusion: This approach allows us to identify the strength of the genetic signature of breeds and place error bounds on admixture estimates. It also provides evidence that admixture can be accurately inferred in subjects that may originate from multiple ancestral populations.
Article activity feed
Competing Interest Statement: Matteo Pellegrini is affiliated with ProsperK9, which developed a direct-to-consumer test for dog ancestry.
This paper is now published in GigaByte, with the paper and peer reviews shared under a CC-BY license:
https://doi.org/10.46471/gigabyte.173
Reviewer 1. Professor Tracy Smith
Is the code executable? Unable to test.
Is there a clearly-stated list of dependencies, and is the core functionality of the software documented to a satisfactory level?
No. I would also like to see the full PLINK code as well as the R dependencies.
Reviewer 2. Tatiana Feuerborn
The premise of the paper could be an interesting test of the use of the SCOPE software on dogs. I can appreciate the idea of the manuscript and the profile of the journal selected for the submission, but even so, the intentions of the journal do not appear to align fully with the way the testing of the method was carried out. Additionally, the dataset of dog breeds is insufficient to be informative. This is particularly true of the number of mixed dogs tested and the down-sampling. Furthermore, the interpretation of the results does not acknowledge these limitations or the nuance of the geographical bias of the dataset.
Reviewer comments:
“Bergstrom et al. 2012”: wrong citation.
“Global ancestry, inferred by tools such as SCOPE and ADMIXTURE (Alexander et al. 2009), attempts to infer the proportions of an individual’s genome that belong to an ancestral breed or group.” The citation for SCOPE is missing.
If studies such as Parker et al. 2017 have used 160 breeds and the authors have mentioned the numerous subpopulations of dogs, why did the authors choose to use such a small number of breeds for their study?
Why were the top 2500 SNPs used? Why not 1000 or 10000, etc? Testing the number that are needed would be very informative.
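One way to set up the comparison suggested here is to build nested marker panels of different sizes from a single ranking and rerun the admixture inference on each. The sketch below is only an illustration: the ranking criterion (variance of allele frequency across breeds), the function names, and the toy frequencies are assumptions and are not taken from the manuscript, whose own selection procedure may differ. The SCOPE runs themselves are not shown.

```python
def allele_frequency_variance(freqs):
    """Population variance of one SNP's allele frequency across breeds."""
    mean = sum(freqs) / len(freqs)
    return sum((f - mean) ** 2 for f in freqs) / len(freqs)


def rank_snps(breed_freqs):
    """Order SNP ids from most to least variable across breeds.

    breed_freqs maps a SNP id to a list of per-breed allele frequencies.
    The variance criterion is an assumed stand-in for "breed-informative".
    """
    return sorted(
        breed_freqs,
        key=lambda snp: allele_frequency_variance(breed_freqs[snp]),
        reverse=True,
    )


def nested_panels(breed_freqs, sizes):
    """Return {panel size: top-ranked SNP ids}, all cut from one ranking."""
    ranked = rank_snps(breed_freqs)
    return {size: ranked[:size] for size in sizes}


# Toy input with three SNPs and three breeds (hypothetical values).
toy_freqs = {
    "snp_a": [0.0, 1.0, 0.5],
    "snp_b": [0.5, 0.5, 0.5],
    "snp_c": [0.4, 0.6, 0.5],
}
panels = nested_panels(toy_freqs, sizes=[1, 2, 3])
for size, snps in panels.items():
    print(size, snps)
```

With real data the sizes would be values such as 1000, 2500, and 10000, and each panel would be passed to the same inference and accuracy evaluation so that the results are directly comparable.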
Figure 1: I would recommend sorting the breeds by value so that the results can be interpreted more easily.
“We also note that certain groups of breeds tend to group together. For example Samoyed, Basenji, and Husky samples are found near each other on the UMAP. This group has been shown to represent ancient breeds (Larson et al. 2012, Pickrell and Pritchard 2012, Wojcik and Powierza 2021).” There are other explanations for this pattern: almost all of the other breeds examined are of European origin, and there is very little representation of non-European ancestry within the small sample of dog breeds included in the study.
“Despite this the more distant relationships between breeds differ from some of the previous studies, as these may be more difficult to define using our markers.” If this is the case and could be influencing results, it would be relevant to mention which breeds these are.
Why is SNP chip data rather than whole genome sequencing being used for the study? This should be clearly established; any explanation for this choice is completely absent. Is it because SCOPE can only handle a small number of sites? Is it because of the availability of the dataset? If so, note my previous concern with the small sample size, despite the public availability of much larger datasets. Is there another reason?
Figure 5: The size of the legend relative to the figure itself is very unbalanced, and I would recommend making a clearer delineation of the breeds; it is unclear where one breed ends and the next begins. The size of the figure also makes it difficult to see the individuals with more than one bar colour. A continuous colour scheme is also probably the wrong choice for the plot, as it makes the already difficult delineation of the breeds nearly impossible. Given that the breeds are sorted alphabetically, I know the colour choice is purely incidental, which makes the continuous palette even more inappropriate.
Figure 7: Most of my comments on Figure 5 also apply to Figure 7. The colour choices make it very difficult to see how many segments are present in each bar. An indicator of which simulated individuals were determined to be successful versus unsuccessful would also be helpful. For example, the rightmost four bars look fairly unsuccessful to me, as they are all missing a component in the estimate that was present in the truth.
Using three mixed dogs seems like a very small number of samples to test the accuracy of the tool on real datasets. Downsampling only one individual to test the impact of coverage is likely not representative of the impact of low coverage across all breed compositions. Downsampling a larger number of individuals would be more informative about the accuracy of the results.
In the discussion on page 11, many old citations are used to back up the interpretation of the close relationship of Siberian Huskies, Basenjis, and other non-European breeds. No mention is made of the geographical bias of the dataset as a reason behind this, as I previously mentioned.
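The success indicator requested for Figure 7 could be computed per simulated individual by comparing the true and estimated breed proportions. The following is a minimal sketch under stated assumptions: the 5% reporting threshold, the function name, and the example breeds and proportions are all hypothetical and do not come from the manuscript.

```python
def compare_admixture(truth, estimate, min_fraction=0.05):
    """Compare true and estimated breed proportions for one individual.

    truth and estimate map breed name -> proportion (each summing to ~1).
    min_fraction is an assumed threshold below which a breed counts as absent.
    """
    breeds = set(truth) | set(estimate)
    # Half the L1 distance: the share of ancestry assigned to the wrong breed.
    misassigned = 0.5 * sum(
        abs(truth.get(b, 0.0) - estimate.get(b, 0.0)) for b in breeds
    )
    # Components present in the truth but absent from the estimate.
    missing = sorted(
        b for b, p in truth.items()
        if p >= min_fraction and estimate.get(b, 0.0) < min_fraction
    )
    # Components reported in the estimate but absent from the truth.
    spurious = sorted(
        b for b, p in estimate.items()
        if p >= min_fraction and truth.get(b, 0.0) < min_fraction
    )
    return {
        "misassigned": misassigned,
        "missing": missing,
        "spurious": spurious,
        "success": not missing and not spurious,
    }


# Hypothetical simulated individual with one component lost in the estimate.
truth = {"Beagle": 0.50, "Boxer": 0.25, "Poodle": 0.25}
estimate = {"Beagle": 0.55, "Boxer": 0.45}
result = compare_admixture(truth, estimate)
print(result)
```

Marking each bar with the resulting `success` flag, and reporting the same summary for every downsampled individual, would let readers see at a glance which estimates drop a true component.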
General comments:
There are frequent issues with missing spaces between parentheses and neighbouring words and punctuation.
A different colour palette should be used for the figures. It is very difficult to determine the breed due to the poor colour choice used throughout the manuscript.
Inconsistent citation styles are used, e.g. “Similarly, prior maximum likelihood estimation based techniques have suggested that Huskies and Samoyeds are both ancient breeds and related to Basenji (23, 29).”