Haplotype-resolved reference genomes of the sea turtle clade unveil ultra-syntenic genomes with hotspots of divergence
Abstract
Background
Reference genomes for the entire sea turtle clade have the potential to reveal the genetic basis of traits driving the ecological and phenotypic diversity in these ancient and iconic marine species. Furthermore, these genomic resources can support conservation efforts and deepen our understanding of their unique evolution.
Results
We present haplotype-resolved, chromosome-level reference genomes and high-quality gene annotations for five sea turtle species. This completes the catalog of reference genomes of the entire sea turtle clade when combined with our previously published reference genomes. Our analysis reveals remarkable genome synteny and collinearity across all species, despite the clade’s origin dating back more than 60 million years. Regions of high interspecific genetic distance and intraspecific genetic diversity are consistently clustered in genomic hotspots, which are enriched with genes coding for immune response proteins, olfactory receptors, zinc fingers, and G-protein-coupled receptors. These hotspot regions may offer insights into the genetic mechanisms driving phenotypic divergence among species, and represent areas of significant adaptive potential. Ancient demographic analysis revealed a synchronous population expansion among sea turtle species during the Pleistocene, with varying magnitudes of demographic change, likely shaped by their diverse ecological adaptations and biogeographic contexts.
Conclusions
Our work provides genomic resources for exploring genetic diversity, evolutionary adaptations, and demographic histories of sea turtles. We outline genomic regions with increased diversity, linked to immune response, sensory evolution, and adaptation to varying environments, that have historically been subject to strong diversifying selection and will likely underpin sea turtles’ responses to future environmental change. These reference genomes can assist conservation by providing insights into the demographic and evolutionary processes that sustain and threaten these iconic species.
Article activity feed
This work has been peer reviewed in GigaScience (see https://doi.org/10.1093/gigascience/giaf105), which carries out open, named peer-review. These reviews are published under a CC-BY 4.0 license and were as follows:
Reviewer 3: Xiaoli Liu
(1) It is recommended to add keywords such as "conservation genomics" or "adaptive evolution" to better align with the content.
(2) In the background section, after discussing the current status of sea turtles and existing genomic research, the study's content is introduced directly without adequately explaining why it is necessary to sequence the genomes of the remaining five species of sea turtles on top of the existing partial genomic data. The introduction of the research objectives appears somewhat abrupt.
(3) Last line of page four, "Previous analyses in particular of the……within this ancient clade [34,38]": When introducing the broad context of genomics and biodiversity conservation, it is important to provide detailed explanations for key concepts such as 'genomic synteny' and 'colinearity'. Although these concepts are covered later in the analysis of the turtle genome, providing initial elaboration can help readers better understand subsequent content.
(4) Page 6, Section 2.2: The range of this quality value, 38.7, is incorrect. Please verify carefully.
(5) Result 3.1: High conservation at the chromosomal level is supported, but repetitive sequences must be excluded from synteny analysis.
(6) Section 3.4, second paragraph: The reliability of PSMC in low-diversity species, such as N. depressus, may be limited; it is recommended to validate findings with other methods, such as MSMC2.
(7) It is recommended to include a detailed description of sample selection in the methods section, covering aspects such as geographic distribution, population size, and sample collection methods, to demonstrate the representativeness and reliability of the selected samples.
Reviewer 2: Brendan Reid
The authors of this work provide a fantastic addition to the genomic resources currently available for marine turtles with five new, apparently high-quality reference genomes. These new resources enable a number of interesting cross-species analyses in this group, including phylogenetic reconstruction, inference of demographic history, and identification of hotspots of diversity and divergence. I thought this paper was quite clearly written and easy to read overall, and I have one major and a few more minor comments/suggestions.
Major comment: there is an extensive literature on hybridization among marine turtle lineages (see Vilaca et al. 2021, https://doi.org/10.1111/mec.16113, for a recent genomic example), with lots of evidence for ancient gene flow after initial lineage divergence as well as recent hybridization. The authors do not really mention this phenomenon at all, and since I think it has a lot of bearing on all of the results it would make sense to re-think your findings in light of the fact that some level of gene flow has occurred. Would extensive synteny/lack of genomic rearrangements potentially enable hybridization? Is overall low divergence among lineages potentially a function of gene flow? Are regions of high divergence the result of selection (as you suggest), or could these regions potentially be resistant to gene flow? I believe that IQ-TREE assumes a strictly bifurcating tree, and gene flow can influence PSMC inferences (see Mazet et al. 2016, https://doi.org/10.1038/hdy.2015.104) - how would gene flow among lineages affect your inference of divergence dates and demographic histories?
Minor comments [note - line numbers would have been helpful for providing comments on specific items! I will refer to the lower-left page numbers and paragraph instead]:
page 3, paragraph 2: Some of the applications you refer to here don't seem terribly germane to the relevance of "genomic resources" in management and conservation per se, and several are just methods using some kind of genetic data ... e.g., "abundance"/close-kin mark recapture doesn't require full genomes (and the reference you cite used microsat data), and the "community"/eDNA applications don't generally rely on genomes but instead on databases of a few (usually mitochondrial) genes. Either include methods that truly benefit from the development of high-quality reference genomes or broaden this to something like "growth in molecular ecology techniques".
page 4, paragraph 2: last sentence is a bit of a run-on, could break this up a bit.
page 10, paragraph 3: for me, the ROH methods need some additional explanation and interpretation. The more detailed methods indicate that the ROH were identified on the basis of lower-than-average heterozygosity rather than true homozygosity - I can understand why this might have been done (since the baseline level of heterozygosity varies across species) but it still seems a bit arbitrary and could risk mistaking stretches with simply low variation for IBD tracts. I wonder if a ROH-detection method like ROHan that explicitly incorporates baseline genomic heterozygosity into its model would be more appropriate for comparing results across species and could give different results. I also question a bit the interpretation of these low-diversity tracts as evidence of inbreeding per se. The authors do not comment much on the length distributions of these ROH - given that many of them are quite short I would expect that if there was mating between close kin it probably happened far back in the past and the IBD tracts have been broken up by recombination.
page 11, paragraph 2: for PSMC analyses it is important to note the method assumes that differences in coalescence time/Ne across the genome result from demography alone. If portions of the genome are under balancing/diversifying selection (such as the areas of high diversity that you detect in this study), the local Ne inferred for these regions would be expected to be larger than the rest of the genome, which could lead to the spurious detection of population expansion or contraction (more likely a contraction for balancing selection). See Boitard et al. 2022 (https://doi.org/10.1093/genetics/iyac008) for a more detailed treatment. I would try excluding the regions putatively under diversifying selection and re-running PSMC to see if your inferences change.
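One way to prepare the re-analysis suggested here is to drop genomic windows that overlap putative hotspots before demographic inference. The base-R sketch below only illustrates the interval filtering. The coordinates, the column names, and the helper `overlaps_any` are hypothetical, and turning the retained regions into PSMC input is not shown.

```r
# Hypothetical genomic windows (1-based, closed intervals); NOT study data.
windows <- data.frame(
  chrom = c("chr13", "chr13", "chr13", "chr14"),
  start = c(1, 100001, 200001, 1),
  end   = c(100000, 200000, 300000, 100000),
  stringsAsFactors = FALSE
)

# Hypothetical hotspot intervals to exclude; NOT study data.
hotspots <- data.frame(
  chrom = c("chr13", "chr14"),
  start = c(150000, 500000),
  end   = c(220000, 600000),
  stringsAsFactors = FALSE
)

# Returns TRUE for each window that overlaps at least one hotspot
# on the same chromosome.
overlaps_any <- function(win, hot) {
  vapply(seq_len(nrow(win)), function(i) {
    same_chrom <- hot$chrom == win$chrom[i]
    any(same_chrom & hot$start <= win$end[i] & hot$end >= win$start[i])
  }, logical(1))
}

# Keep only the windows that do not touch any hotspot.
retained_windows <- windows[!overlaps_any(windows, hotspots), ]
print(retained_windows)
```

Comparing the demographic curves inferred with and without the excluded windows would then show how sensitive the result is to these regions.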
Reviewer 1: Laura Caquelin
Summary of the Study
The authors aimed to create high-quality reference genomes for five sea turtle species to better understand their genetic diversity, evolutionary adaptations, and ecological traits. They used haplotype-resolved, chromosome-level reference genomes and gene annotations to reveal conserved genome structures, genetic hotspots linked to immune response and sensory evolution, and patterns of demographic expansion. Their findings highlight areas of genetic diversity critical for adaptation and conservation efforts.
Scope of reproducibility
According to our assessment the primary objective is: Investigation of multi-copy gene family enrichment in genomic hotspots of sea turtles.
- Outcome: Significant enrichment of "MHC", "Immunology-related", "G-Protein Coupled Receptor" (GPCR), "Olfactory Receptor" or "Zinc-Finger" in genomic hotspots with high genetic divergence, diversity, and gene density.
- Analysis method outcome: Fisher's exact test followed by Benjamini-Hochberg correction
- Main result: "Following functional annotation of the genes found in these hotspots, we found enrichment for multi-copy gene families coding for proteins with functions in immune response, olfactory receptors (ORs), zinc fingers, and G-protein-coupled receptors (GPCRs) (Fig 4c, Tables S6 & S7). This included enrichment of immunology-related genes, GPCRs, ORs, and Zinc-finger genes in chromosome 13 (adjusted p < 10^-42, 10^-47, 10^-79, 0.01, respectively), MHC genes, Immunology-related genes, GPCRs, ORs, and Zinc-finger genes in chromosome 14 (adjusted p < 10^-24, 10^-6, 10^-2, 10^-10, 10^-52, respectively) and Immunology-related genes and GPCRs in chromosome 24 (adjusted p < 10^-3 and 10^-3, respectively)." (page 10).
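The analysis method named above (Fisher's exact test followed by Benjamini-Hochberg correction) can be sketched in base R as follows. This is a minimal illustration, not the authors' `enrichment_test.R`: the counts are made-up placeholders, the helper name `enrichment_p` is hypothetical, and the choice of a one-sided test against genes outside the hotspot is an assumption.

```r
# Minimal sketch: one-sided Fisher's exact test for over-representation of a
# gene family inside a hotspot, followed by Benjamini-Hochberg adjustment.
# All counts below are illustrative placeholders, NOT data from the study.

# Hypothetical helper. Arguments are gene counts:
#   hot_in_family / hot_total : family members / all genes inside the hotspot
#   bg_in_family  / bg_total  : family members / all genes outside the hotspot
enrichment_p <- function(hot_in_family, hot_total, bg_in_family, bg_total) {
  contingency <- matrix(
    c(hot_in_family, hot_total - hot_in_family,
      bg_in_family,  bg_total  - bg_in_family),
    nrow = 2,
    dimnames = list(c("in_family", "not_in_family"),
                    c("hotspot", "background"))
  )
  # alternative = "greater" tests for an odds ratio above 1 (enrichment).
  fisher.test(contingency, alternative = "greater")$p.value
}

# One raw p-value per gene family for a single (made-up) hotspot.
raw_p <- c(
  GPCR      = enrichment_p(30, 200, 300, 18000),
  Olfactory = enrichment_p(40, 200, 250, 18000),
  Zinc      = enrichment_p(3,  200, 280, 18000)
)

# Benjamini-Hochberg correction across all tests performed.
adj_p <- p.adjust(raw_p, method = "BH")
print(signif(adj_p, 3))
```

The correction should be applied once to the full set of tests (all families and all chromosomes), since adjusting subsets separately changes the resulting values.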
- Availability of Materials
a. Data
- Data availability: Open
- Data completeness: Complete
- Access Method: Repository
- Repository: https://git.imp.fu-berlin.de/begendiv/sea_turtlegenomes
- Data quality: The data files have been shared and appear sufficient for running the analyses. However, no metadata is provided to describe the content, structure, or origin of the files, which limits interpretability and reusability.
b. Code
- Code availability: Open
- Programming Language(s): R (for the enrichment test)
- Repository link: https://git.imp.fu-berlin.de/begendiv/sea_turtlegenomes
- License: MIT license
- Repository status: Public
- Documentation: Short README that describes only the layout of the directory.
- Computational environment of reproduction analysis
- Operating system for reproduction: MacOS 14.7.4
- Programming Language(s): R
- Code implementation approach: Using shared code
- Version environment for reproduction: R version 4.4.1/RStudio 2024.09.0
- Results
5.1 Original study results
Results 1: The main results are presented in Figure 4, and the numerical p-values are available in Supplementary Tables S6 and S7.
5.3 Steps for reproduction -> Run the code "enrichment_test.R" shared on Git
- Issue 1: Files needed to run the code were not shared in the Git repository: "GCF_009764565.3_rDerCor1.pri.v4_genomic.longest.aa.tsv", "hotspots_chr13.longest.aa.tsv", "hotspots_chr14.longest.aa.tsv", "hotspots_chr24.longest.aa.tsv". -- Resolved: These analysis data were initially shared neither on the internal GigaScience FTP server nor in the Git repository. On request, the authors uploaded all the files to the Git repository.
5.4 Statistical comparison Original vs Reproduced results
Results: Tables S6 and S7 were reproduced: -- Supplementary table S6: see screenshot from R console -- Supplementary table S7: see screenshot from R console
Comments: The original R code "enrichment_test.R" simply stored the resulting p-values in R objects without assembling them into tables. To simplify the comparison and reduce the risk of transcription errors, we added code that builds the final tables directly.
------------------ Start of R code ------------------
# Creating final tables

# Corresponding to supplementary table S6
table_S6 <- data.frame(
  enrichment = c("MHC", "Immunology", "GPCR", "Olfactory", "Zinc-finger"),
  Chr13 = c(p_mhc13, p_immune13, p_gpcr13, p_or13, p_zinc13),
  Chr14 = c(p_mhc14, p_immune14, p_gpcr14, p_or14, p_zinc14),
  Chr24 = c(p_mhc24, p_immune24, p_gpcr24, p_or24, p_zinc24))

# Corresponding to supplementary table S7
# Create a vector of names for rows and columns
# (! warning the pvalues in fdrs are not in the same order as the table S7)
enrichment <- c("MHC", "Olfactory", "GPCR", "Immunology", "Zinc-finger")
chromosomes <- c("Chr13", "Chr14", "Chr24")

# Reorganizing fdrs in a matrix
table_S7 <- matrix(fdrs, nrow = length(enrichment), byrow = TRUE)
rownames(table_S7) <- enrichment
colnames(table_S7) <- chromosomes

# Organizing rows as the original table S7
library(dplyr)
table_S7 <- as.data.frame(table_S7) # Convert matrix to data frame
table_S7 <- table_S7 %>%
  slice(match(c("MHC", "Immunology", "GPCR", "Olfactory", "Zinc-finger"), enrichment))
------------------- End of R code -------------------
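Because the order of the values in `fdrs` differs from the row order of Table S7, reordering by row name is less error-prone than positional indexing. The base-R sketch below illustrates the idea with placeholder numbers (not the study's p-values). It assumes, as the code above does, that the vector is laid out family by family with one value per chromosome; `fdrs_demo`, `tab`, and `target_order` are hypothetical names.

```r
# Placeholder adjusted p-values, NOT the study's results. Assumed layout:
# three chromosome values per family, families in the order given below.
enrichment  <- c("MHC", "Olfactory", "GPCR", "Immunology", "Zinc-finger")
chromosomes <- c("Chr13", "Chr14", "Chr24")
fdrs_demo   <- (1:15) / 100   # 15 values = 5 families x 3 chromosomes

# byrow = TRUE fills the matrix one family (row) at a time.
tab <- matrix(fdrs_demo, nrow = length(enrichment), byrow = TRUE,
              dimnames = list(enrichment, chromosomes))

# Reorder rows by name so they follow the layout of the published table.
target_order <- c("MHC", "Immunology", "GPCR", "Olfactory", "Zinc-finger")
tab <- tab[target_order, , drop = FALSE]

# Guard against silent mislabelling after the reorder.
stopifnot(identical(rownames(tab), target_order))
print(tab)
```

Indexing by name fails loudly when a label is misspelled or missing, whereas a positional reorder would return a mislabelled table without warning.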
Errors detected: The statement "MHC genes, Immunology-related genes, GPCRs, ORs, and Zinc-finger genes in chromosome 14 (adjusted p < 10^-24, 10^-6, 10^-2, 10^-10, 10^-52, respectively)" (page 10) appears to contain an error. The p-value for Olfactory Receptors (5.583367e-10) is greater than 10^-10, so the stated bound p < 10^-10 does not hold. The threshold for Olfactory Receptors should be revised to 10^-9.
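The check behind this correction can be written out explicitly. The sketch below uses only the p-value quoted above; the variable `p_adj` and the helper `tightest_power_of_ten` are hypothetical names introduced for illustration.

```r
# Adjusted p-value for Olfactory Receptors on chromosome 14, as quoted above.
p_adj <- 5.583367e-10

# Hypothetical helper: exponent k of the smallest power of ten that is still
# strictly greater than p, i.e. the tightest valid statement "p < 10^k".
tightest_power_of_ten <- function(p) {
  floor(log10(p)) + 1
}

print(p_adj < 1e-10)                  # FALSE: the reported bound does not hold
print(p_adj < 1e-9)                   # TRUE: the corrected bound holds
print(tightest_power_of_ten(p_adj))   # -9
```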
Statistical Consistency: The p-values are consistent (see screenshot from R console).
- Conclusion
Summary of the computational reproducibility review
The inferential statistics for the objective "Investigation of multi-copy gene family enrichment in genomic hotspots of sea turtles" were successfully reproduced using the original analysis code provided by the authors. The input data needed to run the code were initially unavailable but were subsequently shared through the Git repository. An inconsistency was noted in the text of the manuscript reporting a threshold for Olfactory Receptors, where the stated 10^-10 should be revised to 10^-9 based on the observed p-value (5.583367e-10).
Recommendations for authors
While the original analysis code was successfully used to reproduce the results, we recommend improving the documentation to enhance clarity and reproducibility. In particular:
-- Code annotation: The scripts would benefit from more detailed comments within the code to clarify the logic of each step. This would greatly help users follow the analyses more easily and understand the purpose of specific commands or operations.
-- README file: The current README provides only a general overview. We suggest expanding it to include:
--- A brief description of each script or analysis pipeline.
--- An indication of which figure, table, or result in the manuscript each script corresponds to.
--- Clear instructions on how to execute the analyses in the correct order, if applicable.
-- Metadata: For the datasets used or generated by the scripts, it would be helpful to include accompanying metadata files that explain:
--- The definition of each variable name.
--- The origin of each dataset (raw, processed, etc).
--- Any preprocessing steps applied before analysis.
-- Data availability: At this stage, we have only verified the reproducibility of one part of the study. To facilitate full reproducibility of the entire study, we recommend sharing all necessary data files required to run every script present in the repository.
These improvements would make the repository significantly more user-friendly and would strengthen the reproducibility of the study.
