A comprehensive water buffalo pangenome reveals extensive structural variation linked to population specific signatures of selection
This article has been Reviewed by the following groups
Listed in
- Evaluated articles (GigaScience)
Abstract
Water buffalo is a cornerstone livestock species in many low- and middle-income countries, yet major gaps persist in its genomic characterization—complicated by the divergent karyotypes of its two sub-species (swamp and river). Such genomic complexity makes water buffalo a particularly good candidate for the use of graph genomics, which can capture variation missed by linear reference approaches. However, the utility of this approach to improve water buffalo has been largely unexplored.
We present a comprehensive pangenome that integrates four newly generated, highly contiguous assemblies of Pakistani river buffalo with available assemblies from both sub-species. This doubles the number of accessible high-quality river buffalo genomes and provides the most contiguous assemblies for the sub-species to date. Using the pangenome to assay variation across 711 global samples, we uncovered extensive genomic diversity, including thousands of large structural variants absent from the reference genome, spanning over 140 Mb of additional sequence. We demonstrate the utility of these data by identifying putative functional indels and structural variants linked to selective sweeps in key genes involved in productivity and immune response across 26 populations.
This study represents one of the first successful applications of graph genomics in water buffalo and offers valuable insights into how integrating assemblies can transform analyses of water buffalo and other species with complex evolutionary histories. We anticipate that these assemblies, and the pangenome and putative functional structural variants we have released, will accelerate efforts to unlock water buffalo’s genetic potential, improving productivity and resilience in this economically important species.
Article activity feed
-
This work has been peer reviewed in GigaScience (see https://doi.org/10.1093/gigascience/giaf099), which carries out open, named peer-review. These reviews are published under a CC-BY 4.0 license and were as follows:
Reviewer 4: Wai Yee Low
Review of "A comprehensive water buffalo pangenome reveals extensive structural variation linked to population specific signatures of selection". This is impressive work at the frontier of buffalo genomics. I truly enjoyed reading it, and my questions and comments are aimed at improving it further. My detailed comments are below:
Line 30: I think it would be better to include the actual number of publicly available assemblies used to create the pangenome graph.
Line 71: There is now a swamp buffalo reference genome with annotation too (NCBI accession: PCC_UOA_SB_1v2). Consider citing the swamp buffalo reference (https://academic.oup.com/gigascience/article/doi/10.1093/gigascience/giae053/7753516) and rewriting the sentence to say that a pangenome can be used for both swamp and river buffalo, whereas a single linear reference from either subspecies is not good enough for read mapping.
Line 79: "highlighted"
Line 82: What do you mean by "higher quality"? The assemblies have been discussed in this review: https://www.frontiersin.org/journals/genetics/articles/10.3389/fgene.2021.629861/full
Line 105: Technically, the graph method for bovine species, which include water buffalo, is being investigated by the Bovine Pangenome Consortium (BPC). Nothing useful has been published on the buffalo graph yet, but consider citing the BPC since your paper overlaps with it (https://genomebiology.biomedcentral.com/articles/10.1186/s13059-023-02975-0).
Line 165: It would be good to add a bit more context on the PanGenie method here, as researchers in the buffalo community are not used to it. Additionally, it would be great if all code were made available on GitHub or as Supplementary Information.
Line 170: To produce a phased pangenome graph, don't you need all input assemblies to be phased? Are all input assemblies phased? UOA_WB_1 is locally phased, not phased throughout the genome.
Line 235: "a list of 403 unrelated individuals." What does this translate to in terms that geneticists can understand? Do you mean siblings have been removed? Or that individuals sharing the same grandparents were removed?
Line 246: Can you please explain how you got the coordinates to match between the GATK and PanGenie methods? You will need matching coordinates for the concordance analysis. As I understand it, the GATK calls were based on UOA_WB_1?
Line 254: Why these 3 chromosomes?
Line 257: If you had not filtered for relatedness, how would it affect the selective sweep work? I think including some context will help readers.
Line 259: Do you mean at least six samples per group? If yes, are 6 samples enough?
Line 261: Genotype quality less than 25 according to bcftools? Since you only used biallelic variants, please provide the breakdown between biallelic and multiallelic variants.
Line 281: "… we first PacBio HiFi sequenced one female" Please rewrite this.
Line 282: How common are these two breeds, in percentage terms?
Line 291: Is this already known? Perhaps cite the literature to show the agreement with previous studies.
Fig 1D: This is a bit too small to see, especially the SV distribution at the bottom. I can hardly see the median.
Line 310: Why did you choose UOA_WB_1 as the reference?
Line 311: Do the ~32.8 million variants include SNPs as well?
Fig 2: This is probably a panel of a figure but should not be the entire figure. The size of the circle indicates sample size, but there should be a legend on the plot stating the sizes, right? Should a darker colour, instead of white, be used to highlight the countries with samples? Maybe this could be a Supplementary figure too.
Line 356: Should S Figures 4 and 5 be main figures? You will need to explain the sample-country abbreviations in the legend of S Figure 5.
Line 360: "To enable reuse we have made this dataset available …" Should the dataset be made available to reviewers?
Line 368: "76% of SNVs were called by both callers" 76% seems low. Also, called does not mean concordant. What is the concordance among the SNVs called by both? Did the pangenome approach call most of the variants found by GATK? If not, what might be the reasons?
Fig 3B: It is not immediately clear what the difference is between non-repetitive and repetitive regions. The overlapping text on the x-axes makes it hard to read.
Line 390: "Analyses such as the study of selective sweeps or genome-wide association studies where low frequency variants are often filtered out will benefit less from the advantages of GATK, particularly given its longer run time." From here on, this paragraph is Discussion, not Results.
Line 418: Why human? Could you use cattle?
Line 427: I tried the browser and am not sure what I can learn from it. It would be helpful to have a README with some examples of what can be explored.
Line 450: How large must a variant be before you consider it a larger variant? Does this ability to study larger variants still hold despite using only ~10 assemblies in the graph? Will the use of short reads for the selective sweep study still benefit from being able to incorporate these larger variants? As I understand it, the larger variants were found only from the graph, not from the short reads. As such, the selective sweeps may not be associated with any larger variants?
Line 470: Should Fig S8 be a main figure?
Line 513: Instead of a UniProt link, consider including this as Supplementary information or text. The information behind the link may change in the future.
Line 551: However, without scaffolding, the assemblies of Pakistani river buffalo may not be good enough to function as reference genomes for river buffalo?
Line 552: When considering new bases, did you do this for each assembly independently, or were the new bases discovered cumulatively?
Line 581: Some of my questions at Line 450 can be discussed here.
Line 586: Consider discussing the limitations of the small number of assemblies used to create the graph. Many SVs are likely still missing, and we are still unable to properly assess the allele frequency of these larger SVs. Additionally, while some SVs may not be considered large in this work, that does not mean they have no impact.
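The distinction drawn in the Line 368 comment, between a site being called by both callers and the two callers agreeing on the genotype at that site, can be illustrated with a minimal Python sketch. The function name, the site keys and the toy genotypes are all hypothetical. How the manuscript defines its 76% figure (for example, relative to the union of sites or to one caller's sites) is not stated in this review, so the union is assumed here.

```python
def overlap_and_concordance(calls_a, calls_b):
    """Compare two call sets for one sample.

    Each argument maps a site key (chrom, pos, ref, alt) to a genotype
    given as a tuple of allele indices, e.g. (0, 1).
    Returns (site_overlap, genotype_concordance).
    """
    shared = set(calls_a) & set(calls_b)
    union = set(calls_a) | set(calls_b)
    # Fraction of all sites reported by both callers (assumed definition).
    site_overlap = len(shared) / len(union) if union else 0.0
    # Among shared sites, fraction with the same unphased genotype.
    agree = sum(1 for s in shared if sorted(calls_a[s]) == sorted(calls_b[s]))
    concordance = agree / len(shared) if shared else 0.0
    return site_overlap, concordance


# Toy, made-up calls for illustration only.
gatk = {
    ("1", 100, "A", "G"): (0, 1),
    ("1", 250, "C", "T"): (1, 1),
    ("1", 400, "G", "A"): (0, 1),
    ("2", 50, "T", "C"): (0, 1),
}
pangenie = {
    ("1", 100, "A", "G"): (1, 0),  # same genotype, different allele order
    ("1", 250, "C", "T"): (0, 1),  # called by both, genotypes disagree
    ("1", 400, "G", "A"): (0, 1),
    ("2", 900, "G", "T"): (1, 1),
}

site_overlap, concordance = overlap_and_concordance(gatk, pangenie)
print(f"site overlap: {site_overlap:.2f}, genotype concordance: {concordance:.2f}")
```

In this toy case three of the five sites are shared, but only two of those three carry the same genotype, so the two measures diverge.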
-
Reviewer 3: Laura Caquelin
Summary of the Study
This study used graph genomics to better characterize water buffalo genomes. By building a pangenome from new and existing assemblies, the authors analyzed 711 samples. These samples revealed structural variation. These results highlight the value of graph genomics. This method
Scope of reproducibility
According to our assessment, the primary objective is: to identify genomic variants within selective sweep regions in the water buffalo genome.
- Outcome: Enrichment of high-impact structural variants (SVs), insertions/deletions (indels) and single nucleotide variants (SNVs) in selective sweep regions.
- Analysis method outcome: Variants were compared between selective sweep regions and genome-wide. Fisher's exact test was used to assess enrichment of functional variants.
- Main result: "Prior to annotation, multiallelic variants were normalized by splitting them into separate biallelic entries, resulting in 6,159,686 indels, 28,669,966 SNVs, and 160,921 SVs entries. Within putative selective sweep regions we identified 208,862 indels, 997,500 SNVs and 6,748 SVs. Notably an enrichment of HIGH impact SVs, indels and SNVs were observed within selective sweep regions (Figure 5A, Supplementary Table S6), with 50-80% more variants in these areas having a HIGH impact compared to genome-wide. Among the high impact variants in selective sweep regions only 20% were SNVs, with the remainder being SVs and indels, suggesting high impact larger variants may underlie putative selective sweeps." (Lines 453 to 461)
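The enrichment test named above can be sketched in Python using only the standard library. This is an illustration, not the authors' or the reviewer's code: the function name is hypothetical, the variant counts are made up, and the 2x2 layout (HIGH versus other impact, inside versus outside sweep regions) is an assumption about how the comparison was set up. The two-sided p-value follows the common convention of summing the probabilities of all tables that are no more likely than the observed one.

```python
from math import exp, lgamma


def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def log_p(x):
        # Hypergeometric log-probability of seeing x in the top-left cell.
        return log_comb(row1, x) + log_comb(row2, col1 - x) - log_comb(n, col1)

    lo, hi = max(0, col1 - row2), min(row1, col1)
    log_obs = log_p(a)
    # Sum every table at most as probable as the observed one
    # (small tolerance absorbs floating-point noise).
    total = sum(exp(log_p(x)) for x in range(lo, hi + 1)
                if log_p(x) <= log_obs + 1e-7)
    return min(1.0, total)


# Hypothetical counts, NOT taken from the manuscript:
high_in_sweep, other_in_sweep = 40, 6_960
high_outside, other_outside = 400, 152_600

p_value = fisher_exact_two_sided(high_in_sweep, other_in_sweep,
                                 high_outside, other_outside)
print(f"two-sided p = {p_value:.3g}")
```

Working in log space with `lgamma` keeps the binomial coefficients from overflowing when the counts run into the hundreds of thousands, as they do for variant tables of this size.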
- Availability of Materials
a. Data
- Data availability: Open
- Data completeness: Complete, all data necessary to reproduce main results are available
- Access Method: Supplementary files
- Repository: -
- Data quality: Structured
b. Code
- Code availability: Shared for the review after request
- Programming Language(s): R
- Repository link: -
- License: -
- Repository status: -
- Documentation: No documentation
- Computational environment of reproduction analysis
- Operating system for reproduction: MacOS 14.7.4
- Programming Language(s): R
- Code implementation approach: Creating script according to the methodology description/Using shared code
- Version environment for reproduction: R version 4.4.1/RStudio 2024.09.0
- Results
5.1 Original study results
Results 1: Results are presented in Figure 5A.
5.2 Steps for reproduction -> Reproduce the results
The code was not shared initially, but as the data were provided and the test was a Fisher's exact test, I wrote code to reproduce the p-values.
Issue 1: P-values for the SNV variant type and for the « Modifier » impact class were not provided. -- Resolved: The authors provided an updated Supplementary Table S6 with exact numerical p-values for each variant type and each impact class. The code "variantEnrichAtPeaks.R", which generates Figure 5A and Supplementary Table S6, was also shared. New version of Supplementary Table S6: (see screenshot)
The comparison between the reproduced results and the original results was then performed using the shared code. (Notably, the R script I had written produced the same p-value as the one presented in Figure 5A.)
- Issue 2: In the script "variantEnrichAtPeaks.R", only the figures were generated, not the new Supplementary Table S6 with the numerical p-values. -- Resolved: Some lines of code were added to the function "makePlot" to generate this table in addition to the figure.
Lines 159 to 178 of the script "variantEnrichAtPeaks_RCC."
- Supplementary table S6 (add)
summary_table <- df %>%
  mutate(
    Type = variantType,
    Genome_Wide_Prop = Genome_wide / sum(Genome_wide),
    Selective_Sweep_peaks_Prop = Sweep / sum(Sweep),
    Ratio_of_proportions = Selective_Sweep_peaks_Prop / Genome_Wide_Prop
  ) %>%
  left_join(pval_df, by = "Impact") %>%
  # Column names containing spaces must be back-quoted in R
  select(
    Impact, Type,
    Genome_Wide = Genome_wide,
    `Selective_Sweep peaks` = Sweep,
    `Genome_Wide Prop` = Genome_Wide_Prop,
    `Selective_Sweep peaks Prop` = Selective_Sweep_peaks_Prop,
    `Ratio of proportions` = Ratio_of_proportions,
    `Fishers exact P` = p_value
  )
return(list(plot = p, summary_table = summary_table))
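For readers who do not use R, the quantities computed in this excerpt (the proportion of each impact class genome-wide and within sweep peaks, and their ratio) can be sketched in plain Python. The function name and all counts below are hypothetical and chosen only to make the arithmetic easy to follow. They are not values from Supplementary Table S6.

```python
def summarise_by_impact(genome_wide, sweep):
    """Return one row per impact class with proportions and their ratio.

    Both arguments map an impact class (e.g. "HIGH") to a variant count.
    """
    gw_total = sum(genome_wide.values())
    sw_total = sum(sweep.values())
    rows = []
    for impact, gw_count in genome_wide.items():
        gw_prop = gw_count / gw_total
        sw_prop = sweep[impact] / sw_total
        rows.append({
            "Impact": impact,
            "Genome_Wide": gw_count,
            "Selective_Sweep peaks": sweep[impact],
            "Genome_Wide Prop": gw_prop,
            "Selective_Sweep peaks Prop": sw_prop,
            "Ratio of proportions": sw_prop / gw_prop,
        })
    return rows


# Made-up counts for a single variant type.
genome_wide = {"HIGH": 400, "MODERATE": 600, "LOW": 1000, "MODIFIER": 8000}
sweep = {"HIGH": 6, "MODERATE": 6, "LOW": 8, "MODIFIER": 80}

table = summarise_by_impact(genome_wide, sweep)
for row in table:
    print(row["Impact"], round(row["Ratio of proportions"], 2))
```

Under this reading, a ratio of 1.5 for the HIGH class would correspond to "50% more" HIGH-impact variants in sweep regions than genome-wide, which is one plausible interpretation of the 50-80% figure quoted from the manuscript.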
5.3 Statistical comparison Original vs Reproduced results
Results: Figure and table S6 were reproduced for each variant type and impact: -- SVs type: (see screenshot) -- Indels type: (see screenshot) -- And SNVs type: (see screenshot)
Comments: The shared code was used to compute the p-values and generate the figures. Minor numerical discrepancies were observed for some p-values, likely due to rounding. The p-values in the original Excel file appear to be stored with fewer decimal places than those computed in R. This difference is negligible and does not indicate a reproducibility issue.
Errors detected: No error detected.
Statistical Consistency: The results were successfully reproduced with the shared code.
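The rounding explanation given in the comments can be checked mechanically. The sketch below, in Python, rounds a recomputed p-value to the number of significant digits assumed to be stored in the spreadsheet and compares it with the reported value. The function name, the three-digit precision and the example values are all hypothetical.

```python
from math import isclose


def matches_after_rounding(reported, recomputed, sig_digits=3):
    """True if `recomputed`, rounded to `sig_digits` significant digits,
    equals `reported` (up to floating-point tolerance)."""
    rounded = float(f"{recomputed:.{sig_digits - 1}e}")
    return isclose(reported, rounded, rel_tol=1e-12)


# Hypothetical values: a stored p-value and a full-precision recomputation.
print(matches_after_rounding(0.0350, 0.034965034965))  # differs only by rounding
print(matches_after_rounding(0.0351, 0.034965034965))  # a genuine mismatch
```

A check of this kind separates discrepancies that vanish at the stored precision from ones that would point to a real difference in the analysis.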
- Conclusion
- Summary of the computational reproducibility review
The Fisher's exact tests for enrichment across variant and impact categories, presented in Figure 5A of the manuscript, were successfully reproduced using the data in Supplementary Table S6 and the shared code. Results were consistent with the original, with only negligible rounding differences in p-values.
- Recommendations for authors
We were able to reproduce the study with the data and information provided in the Figure 5A description. To further improve transparency and ensure full reproducibility of your manuscript, the following recommendations are suggested:
-- Make the code needed to reproduce all analyses in the paper openly available so that anyone can reproduce the results. Ideally, provide a README or requirements.txt file describing how to run the analysis, including software versions, packages, and dependencies.
-- Include statistical outputs, such as exact p-values, in supplementary materials when possible. This ensures clarity and eases verification.
Ideally, provide metadata: for the datasets used or generated by the scripts, it would be helpful to include accompanying metadata files that explain:
--- The definition of each variable name.
--- The origin of each dataset (raw, processed, etc).
--- Any preprocessing steps applied before analysis.
-
Reviewer 2: Yi Zhang
This manuscript presents the first high-quality, haplotype-resolved genome assemblies for two representative Pakistani river buffalo breeds (Nili Ravi and Azikheli), integrating them with existing assemblies to construct a water buffalo pangenome. The study leverages graph genomics to characterize structural variation (SV), identifying >140 Mb of non-reference sequence and 111,352 SVs. By genotyping 711 global samples against this pangenome, the authors uncover population-specific selective sweeps linked to productivity, immunity, and adaptation traits, revealing potentially functional SVs, though these findings are limited by the absence of validation evidence and cross-study comparisons. The work highlights graph genomics as a transformative tool for integrative analyses of evolutionarily related species in an unbiased way and provides resources to accelerate buffalo breeding.
General Comments
- The study's methodology is rigorous, combining long-read assembly, graph-based genotyping (PanGenie), and population-level sweep scans. Nevertheless, the manuscript would benefit from discussion of graph limitations, such as bias against rare variants (Fig. 3B) and challenges in graph construction for species with karyotypic divergence.
- The selection signature analyses were done across a number of population groups but the paper only showcases a limited selection of results. To strengthen the manuscript, the authors could concentrate on a consistent set of populations. This would enable a more in-depth examination of selective signals common across buffalo population groups or unique selective signals specific to certain groups.
- It could be informative to conduct comparative analyses of selection signatures using variant datasets from PanGenie and GATK. This could reveal whether the pangenome approach might uncover important structural variants within selection signals that GATK fails to identify.
Specific Comments
- In Figure 1D and the main text, the rationale behind dividing the SVs into 40 sets is not clearly presented. If the interpretation is correct, the y-axis label of the bar graph should denote the number of SVs rather than size. Moreover, the main title "SVs Size Distribution" at the top seems more relevant to the box plots at the bottom.
- Lines 325 - 326 state that the newly assembled pangenome graph exhibits a substantial increase in genome size compared to the existing reference genome. It is recommended that the authors describe the distribution of the 147,865,364 bp across the entire genome. Are they found more prevalent in specific regions of certain chromosomes?
- In lines 410 - 412, there may be an issue with the citation of Table S2. The table contains 402 individuals, whereas the text mentions 282.
- Figure 3 shows that, when using 30x samples in the variant calling comparison between PanGenie and GATK, there are still a large number of SNVs detectable only by GATK. A more in-depth technical discussion of these differences would greatly enhance the reader's comprehension of these findings and of the relative performance of the two methods.
- To provide a more intuitive understanding of how SV can influence gene function and contribute to the traits, the authors could include a figure that displays an example gene structure along with the SV of interest within a selection signal peak.
-
Reviewer 1: Paul Stothard
This well-written manuscript describes the generation of new genome assemblies for water buffalo and the construction of a pangenome graph that is used for variant calling and downstream analyses. The work is clearly described and the methods are appropriate given the goals of the study. The results are interesting and timely, and realistic limitations are stated. The manuscript should be of high interest to the water buffalo research community and to those interested in applying pangenome graphs to variant calling.
I have minor comments that I believe should be addressed prior to publication.
Minor comments:
In the NCBI genomes database, the water buffalo assembly NDDB_SH_1 is listed as the current reference genome, not UOA_WB_1 as suggested in the manuscript. Perhaps the reference genome was recently reassigned?
Lines 64-69: Lack of clarity regarding relationships among water buffalo populations:
- Wording suggests single domestication event accounts for all domestic water buffalo. But, the river and swamp buffalo diverged prior to the domestication date. This is a contradiction. Clarify by mentioning that there were at least two independent domestication events (one for river buffalo and one for swamp buffalo).
- Taxonomic terminology is inherently ambiguous for a few reasons, including:
- The Bubalus arnee species comprises both wild river buffalo and wild swamp buffalo, which have not been assigned subspecies names.
- Domestic water buffalo (including river and swamp buffalo) are assigned their own species name: Bubalus bubalis, despite being biologically the same species as Bubalus arnee.
- Unlike their wild source populations, domesticated river buffalo and domesticated swamp buffalo are assigned their own subspecies names, Bubalus bubalis bubalis and Bubalus bubalis carabanensis, respectively.
- To address ambiguity regarding taxonomy and phylogeny of the buffalo populations, mention the full subspecies names (Bubalus bubalis bubalis, and Bubalus bubalis carabanensis).
Line 82: "Although eight higher quality": higher quality than what?
Line 177: Undefined acronym: "PAF".
Line 216: "each unique biosamples": should be "each unique biosample".
Line 272: Which SnpEff database was used for variant annotation?
Line 286-287: Based on Table 1, the difference between the largest and the smallest water buffalo genome is 360 megabase pairs. That exceeds the length of the largest chromosome by almost 2-fold, and is 14% of the total length of the UOA_WB_1 reference assembly. This is a very large difference to observe between members of the same species. Considering that segmental duplications are often not accurately represented in genome assemblies, there is a strong possibility that some of the variants identified between these new high-quality assemblies and the other assemblies are simply assembly artefacts (failure of recently duplicated segments to be distinguished, etc.). At the very least, this should be addressed in the Discussion.
Line 360-361: Elaborate slightly on what is in the dataset being shared.
Line 420-421: Clarify which of these are human vs animal traits.
Figure 1 A legend: The dots seem to all be the same size, which suggests that this is a scatter plot, not a bubble plot.
Figure 1 C: "across the graph genome" sounds spatial; perhaps "proportion of variant types in the graph genome" would be clearer.
Figure 1 D: It would be helpful to have the rows sorted to match the order in B.
Figure 1 D: The low bars (i.e. small number of shared sites) are not easy to interpret. Perhaps the y-axis could be transformed to log scale or the number of variants could be added to the bars.
