Chromosome-level genome assembly and methylome profile yield insights for the conservation of endangered loggerhead sea turtles

Abstract

Background

Characterizing genetic and epigenetic diversity is crucial for assessing the adaptive potential of threatened populations and species in the face of climate change. Sea turtles are particularly vulnerable due to their temperature-dependent sex determination (TSD) system, which heightens the risk of extreme sex ratio bias and extinction under future climate scenarios. High-quality genomic and epigenomic resources will therefore support conservation efforts for these endangered flagship species with such plastic traits.

Findings

We generated a chromosome-level genome assembly for the loggerhead sea turtle (Caretta caretta) from the globally important Cabo Verde rookery. Using Oxford Nanopore Technology (ONT) and Illumina reads followed by homology-guided scaffolding to the same species, we achieved a contiguous (N50: 129.7 Mbp) and complete (BUSCO: 97.1%) assembly, with 98.9% of the genome scaffolded into 28 chromosomes and 33,887 annotated genes. We also extracted the blood methylome profile from our ONT reads, which was confirmed to be representative of the reference population via whole-genome bisulfite sequencing of 10 additional loggerheads from the same population. Applying our novel resources, we revealed high conservation of synteny between sea turtle species, reconstructed population size fluctuations in line with major climatic events, and identified microchromosomes as key regions for monitoring genetic diversity and epigenetic flexibility. Isolating 199 TSD-linked genes, we further built a large network of functional protein associations and blood-based methylation patterns.

Conclusions

We present a high-quality loggerhead sea turtle genome and methylome from the globally significant East Atlantic population. By leveraging ONT sequencing, we generate genomic and epigenomic resources simultaneously and showcase the potential of this approach for driving molecular insights for conservation of endangered sea turtles.
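
For readers unfamiliar with the contiguity metric quoted in the Findings, N50 is the sequence length at which sequences of that length or longer account for at least half of the total assembly length. The sketch below illustrates the definition only; the scaffold lengths are made up for the example and are not taken from the loggerhead assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    half_total = sum(lengths) / 2
    running = 0
    # Walk from the longest sequence down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length


# Hypothetical scaffold lengths in Mbp (illustrative only).
toy_scaffolds = [150, 130, 90, 40, 20, 10]
print(n50(toy_scaffolds))  # 130
```

In this toy case the two longest scaffolds (150 + 130 = 280) already exceed half of the 440 Mbp total, so the N50 is 130.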

Article activity feed

  1. Background: Characterising genetic and epigenetic diversity is crucial for assessing the adaptive potential of populations and species. Slow-reproducing and already threatened species, including endangered sea turtles, are particularly at risk. Those species with temperature-dependent sex determination (TSD) have heightened climate vulnerability, with sea turtle populations facing feminisation and extinction under future climate change. High-quality genomic and epigenomic resources will therefore support conservation efforts for these flagship species with such plastic traits.

    Findings: We generated a chromosome-level genome assembly for the loggerhead sea turtle (Caretta caretta) from the globally important Cabo Verde rookery. Using Oxford Nanopore Technology (ONT) and Illumina reads followed by homology-guided scaffolding, we achieved a contiguous (N50: 129.7 Mbp) and complete (BUSCO: 97.1%) assembly, with 98.9% of the genome scaffolded into 28 chromosomes and 29,883 annotated genes. We then extracted the ONT-derived methylome and validated it via whole genome bisulfite sequencing of ten loggerheads from the same population. Applying our novel resources, we reconstructed population size fluctuations and matched them with major climatic events and niche availability. We identified microchromosomes as key regions for monitoring genetic diversity and epigenetic flexibility. Isolating 191 TSD-linked genes, we further built the largest network of functional associations and methylation patterns for sea turtles to date.

    Conclusions: We present a high-quality loggerhead sea turtle genome and methylome from the globally significant East Atlantic population. By leveraging ONT sequencing to create genomic and epigenomic resources simultaneously, we showcase this dual strategy for driving conservation insights into endangered sea turtles.

    This work has been peer reviewed in GigaScience (https://doi.org/10.1093/gigascience/giaf054), which carries out open, named peer-review. These reviews are published under a CC-BY 4.0 license and were as follows:

    Reviewer: F Gözde Çilingir

    In this study, the authors generated a high-quality chromosome-level genome assembly and methylome for the loggerhead sea turtle (Caretta caretta) using a combination of Oxford Nanopore Technology (ONT) and Illumina sequencing. They also examined population size fluctuations, identified microchromosomes as key areas for monitoring genetic diversity and epigenetic flexibility, and focused on genes linked to temperature-dependent sex determination (TSD), with additional datasets from 10 individuals using whole-genome bisulfite sequencing (WGBS).

    The study consists of three key parts: 1) genome sequencing and assembly, 2) benchmarking ONT methylation calls with WGBS, and 3) epigenetic patterning of TSD-linked genes, which was contextualized for future studies. The first part certainly includes relatively novel genomic resources that will provide valuable tools for conservation and population genomics. It's encouraging to see the use of DNA modification detection via ONT, with a comprehensive analysis of 5mC and 5hmC methylomes alongside genomes, especially for chelonians, a group that is underrepresented among available vertebrate genomes. Benchmarking ONT methylation calls with WGBS is also relevant for the field (though some clarifications on the experimental design are necessary). However, I have several concerns regarding the biological rationale of certain study design choices and the conclusions drawn by the authors regarding the TSD-linked genes' methylation patterns.

    Overall, this study provides valuable genomic resources for loggerhead sea turtles. However, some of the biological assumptions and study design choices regarding the methylation patterning require further clarification and a more robust discussion to ensure that the conclusions drawn can be supported by the data produced.

    Detailed comments to the authors

    ABSTRACT

    The abstract states: "Isolating 191 TSD-linked genes, we further built the largest network of functional associations and methylation patterns for sea turtles to date."

    Throughout the manuscript, this number changes. Please double-check and ensure consistency in the number of TSD-linked genes reported.

    BACKGROUND

    I suggest using the phrase "a skew toward female-biased sex ratios" instead of "feminisation" throughout the text for a clearer and more neutral description of the biological phenomenon. For example, the third sentence of the second paragraph could be revised as:

    "As multiple theoretical studies have predicted a significant skew toward female-biased sex ratios and subsequent population collapse by 2100 in response to future climate scenarios."

    METHODS

    Page 5, DNA extraction, sequencing, and quality control

    • First paragraph: ONT kit chemistry numbers and flow cell types can be confusing for readers. Could you also clarify that the SQK-LSK109 kit used is associated with R9.4.1 flow cells, indicating the sequencing error profile of the technology?

    Regarding the Phred score >Q8 cutoff: Q8 corresponds to a sequencing error rate of ~15-16%. Could you clarify the reasoning behind choosing this cutoff? Citing similar studies that have used this threshold would add support to your decision.

    Page 8: I couldn't find the de novo assembled transcriptomes in the ENA or GigaDB repositories. Are these data publicly available? If so, it would be beneficial to provide the location.

    Page 9, ONT methylation call and validation with WGBS:

    There's a discrepancy between the retained CpGs: you mention "26,449,075 CpGs" in one place and later report different numbers in the results section. Please clarify these numbers and ensure consistency.

    It would be helpful to include a table summarizing key metrics of the ONT methylation call, such as mean/median CpG site coverage, similar to Table S3.

    Page 9, second paragraph: You mention "Ten nesting loggerheads." Please specify that these are ten adult loggerhead females for clarity. Additionally, correct the table references: Table S3 should be Table S2, Table S4 should be Table S3, etc.

    RESULTS AND DISCUSSION

    Genome Assembly

    Figure 1B: While Table 1 effectively illustrates the differences in contiguity levels, Figure 1B doesn't add much due to the difficulty in distinguishing closely aligned lines. If you retain the figure, I suggest using more contrastive colors to improve readability.

    Genome Annotation: I agree that the lack of a pre-determined training parameter set for chelonians within the BRAKER pipeline leads to relatively incomplete gene model predictions. However, lifting over gene models from other sea turtle genomes and combining them with predictions (again using TSEBRA) would likely improve the overall completeness of the annotations.

    Methylation Call and Validation

    You state, "To verify our ONT methylation call, we compared calls with ten loggerhead methylomes re-sequenced via WGBS." Does this mean you generated an ONT methylome from a single individual and compared it to the average methylation levels from ten different individuals obtained with WGBS? If so, this may not be an ideal benchmarking strategy. Generating both ONT and WGBS data for all individuals would provide a more robust comparison. Clarifying this design would help the reader understand the validation process better. Additionally, consider citing relevant benchmarking studies.

    In the last paragraph of this section, you highlight ONT as a robust alternative to WGBS but then use WGBS for the TSD-linked gene analysis. This appears somewhat contradictory. It might be useful to explain why WGBS was favored in this part of the analysis.

    Genome Properties: Figures 3C-F were difficult for me to read (low resolution), and they don't seem directly related to Figures 3A and 3B. I suggest separating these figure groups for better clarity. Additionally, it would be helpful to report or visualize the repeat content of both micro and macro chromosomes. Long-read sequencing assemblies are particularly effective at resolving repeat-rich regions, and microchromosomes are often repeat-rich. Highlighting this aspect would demonstrate the added value of long-read sequencing for assembling reference genomes of organisms like sea turtles.

    TSD-linked genes: methylation patterns

    Testing methylation differences between TSD-linked and non-TSD-linked genes focusing on specific regulatory regions is potentially informative, but the biological rationale for expecting consistent differences between these two groups is unclear. TSD-linked genes are involved in dynamic, environmentally responsive processes, whereas non-TSD-linked single-copy orthologues (as used in the study) typically represent essential, evolutionarily conserved functions with more stable methylation patterns. The use of single-copy orthologues as a control set is problematic because these genes could serve fundamentally different roles. A more relevant comparison would be between TSD-linked genes and other genes involved in similarly dynamic, environmentally responsive pathways.

    Additionally, all methylation data come from adult female blood (N=10, all from the same beach), which may not be the most appropriate approach for studying TSD, a process that primarily occurs during embryonic development, when temperature cues influence sex determination. Methylation patterns in adults may no longer reflect the active regulatory processes that control TSD during embryogenesis. In other words, adult methylation patterns could be influenced by factors such as reproductive status or aging, and may not reflect the regulation of TSD-linked genes during key developmental stages. These limitations/points should be addressed.

    CONCLUSIONS

    The manuscript would benefit from a discussion of how biological context (such as developmental stage) affects the interpretation of methylation patterns in this study.

    It is also worth mentioning that both ONT and WGBS require substantial amounts of input DNA, and blood samples from reptiles are ideal because of their nucleated red blood cells; this could be acknowledged as a practical advantage somewhere in the text.

    SUPPLEMENTARY INFO

    Could you explain what "DMS" refers to in Text S3? This term isn't defined in the manuscript.

    There are two Figure S7; please change the last one to Figure S8.

    SUPPORTING DATA

    The FTP server data look good, but I couldn't find the de novo transcriptomes. Some files have long, confusing names; adding a README file in each directory would help clarify the contents.

    Important note: It would be helpful to include line numbers in the manuscript to facilitate direct and effective feedback.
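
The reviewer's remark that Q8 corresponds to an error rate of roughly 15-16% follows from the standard Phred definition, Q = -10 · log10(P), where P is the probability that a base call is wrong. A minimal Python sketch of the conversion is given below; the function names are illustrative and do not come from the manuscript or any particular toolkit.

```python
import math


def phred_to_error(q):
    """Convert a Phred quality score to the expected per-base error probability."""
    return 10 ** (-q / 10)


def error_to_phred(p):
    """Convert a per-base error probability back to a Phred quality score."""
    return -10 * math.log10(p)


# Compare the Q8 cutoff discussed in the review with two common reference points.
for q in (8, 10, 20):
    print(f"Q{q}: {phred_to_error(q):.1%} expected error rate")
```

For Q8 this gives about 15.8%, consistent with the range quoted in the review, compared with 10% at Q10 and 1% at Q20.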
  2. Reviewer: Victor Quesada

    This work offers an improved version of the reference genome for the loggerhead sea turtle. The authors have also analyzed the methylation patterns of blood obtained from different individuals and with two methods. The resulting data set includes gene annotations, methylation levels and the specific analysis of methylation levels of genes involved in temperature-dependent sex determination (TSD). While the improvements offered by this work seem modest, I think that the data sets may provide important resources for future works.

    - In my opinion, the use of a previous version of the same genome in the assembly process should be noted in the abstract. It would be enough to write "... followed by homology-guided scaffolding to GSC_CCare_1.0...".
    - If possible, the authors should clarify the taxonomic relationship between the reference individual in this work and the reference individual for the previous version of the genome (ref. 26). Is it the same NCBI taxid?
    - There is a mention of "lateral terminal repeats" in the "Genome annotation" section (page 7). I think it is a typo and it should read "long terminal repeats".
    - In the same section, on page 9, reference 73 refers to StringTie, not gffread. In addition, it is not clear how "in-frame stop codons were removed". A simple way to unambiguously explain this would be to provide the options that were used, as with other programs.
    - I would revise the use of "coverage" versus "depth". For instance, the expression "...a coverage of 9.2(...)X" would be more precise as "...a sequencing depth of 9.2(...)X". Coverage should be a fraction or a percentage. However, this is only a piece of advice, as there is no strong consensus at the moment.
    - The interpretation of methylation patterns is always difficult. In my opinion, the manuscript should discuss several limitations of the results:
      - First, using blood as the starting tissue is convenient but not ideal, as many methylation patterns are tissue-specific. The authors may want to add a reference to preliminary evidence that some methylation changes in blood cells are related to TSD (Bock et al., Mol Ecol. 2022; 31:5487-5505).
      - Second, the work examines broad patterns of methylation (all promoters, all coding sequences,...). While this may be interesting for descriptive purposes, it may also drown significant signals. The manuscript should mention this limitation.
      - Figure 2B shows methylation per gene. If the aim is to compare both kinds of sequencing, there should be at least one comparison of methylation per CpG, which might even be categorical or downsampled.
    - The origin of the duplication of EP300 seems outside the scope of the manuscript. Nevertheless, given that the question is posed, the authors may want to perform a simple phylogenetic analysis of the sequences. Even the basic analysis of the annotated copies plus an outgroup is likely to give a robust answer to this question.
    - For the benefit of non-specialists, the manuscript might include a brief mention of how microchromosomes allow a larger number of combinations of variants without chromosome recombination.
    - Some expressions may be edited for clarity and precision. Examples are "which should be verified whether they are true" (page 1) and "microchromosomes have greater methylation potential and realised levels...".
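
The distinction the reviewer draws between "depth" and "coverage" can be made concrete with a small sketch: depth is the average number of reads over each position (reported as a multiple such as 9.2X), while coverage in the stricter sense is the fraction of positions seen at least once. The per-base depth values below are hypothetical and the function names are illustrative only.

```python
def mean_depth(per_base_depth):
    """Average number of reads covering each position (reported as e.g. 7.4X)."""
    return sum(per_base_depth) / len(per_base_depth)


def breadth_of_coverage(per_base_depth, min_depth=1):
    """Fraction of positions covered by at least `min_depth` reads."""
    covered = sum(1 for depth in per_base_depth if depth >= min_depth)
    return covered / len(per_base_depth)


# Hypothetical per-base read depths for a 10 bp region (illustrative only).
toy_depths = [0, 0, 12, 10, 9, 11, 8, 0, 14, 10]

print(f"mean depth: {mean_depth(toy_depths):.1f}X")
print(f"breadth of coverage: {breadth_of_coverage(toy_depths):.0%}")
```

In this toy region the mean depth is 7.4X while only 70% of positions are covered, which shows why the two quantities are worth reporting under separate names.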
  3. Reviewer: Zhongduo Wang

    The study presents high-quality genomic and methylomic data for loggerhead sea turtles, serving as a significant resource for further genomic and epigenomic research on this species. Notably, this is the first methylome derived from a sea turtle using ONT technology, offering a new, reliable method for studying the epigenetic characteristics of non-model organisms. Moreover, by integrating genomic and methylomic data, the authors analyze the functionality and methylation patterns of TSD-related genes, contributing fresh perspectives to the molecular mechanisms underlying TSD. While the study offers valuable data, there are several areas that could be enhanced.

    1) Lack of Reference to Hawksbill Turtle Genome: The manuscript does not discuss any information regarding the hawksbill turtle genome. Given that the hawksbill genome study also published a comparative analysis of the loggerhead's genomic data, I recommend that the authors include relevant information or clarify why hawksbill data was not considered.

    2) Further Optimization of Genome Annotation: The authors acknowledge that the completeness of the genome annotation requires enhancement and mention future improvements such as species-specific parameter adjustments and manual curation. While it is understandable that time and resource constraints may have limited these optimizations prior to submission, it would be beneficial for the authors to clarify the reasons for this and outline a timeline for future enhancements.

    3) Information on Individual Variability in WGBS Results: The manuscript lacks specific information on inter-individual variability among the ten individuals in the WGBS data. I suggest that the authors consider adding this analysis or provide justification for its absence. If significant variability exists among individuals, averaging the methylomic data could obscure important biological information.

    4) Clarification on Statistical Tests and Data Processing: The manuscript employs several statistical tests such as t-tests, F-tests, and chi-squared tests. However, the methods section lacks detailed information on how the data was processed for these analyses. I recommend that the authors provide a more thorough explanation of the data preparation steps, assumptions checked, and justification for the choice of tests.

    In summary, this manuscript makes a significant contribution to the study of loggerhead turtle genomics and methylomics. Addressing the aforementioned points could further enhance the quality and impact of the work.
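
The concern in point 3, that averaging across individuals can hide meaningful variability, can be illustrated with a small sketch. The site names and methylation fractions below are invented for the example and are not data from the study; the approach shown (per-site mean alongside a sample standard deviation) is one simple option among many, not necessarily what the authors would use.

```python
from statistics import mean, stdev

# Hypothetical methylation fractions per CpG site across four individuals
# (illustrative values only, not data from the study).
toy_methylation = {
    "site_A": [0.80, 0.82, 0.79, 0.81],  # consistent across individuals
    "site_B": [0.10, 0.95, 0.15, 0.90],  # strongly divergent individuals
}


def summarise_site(values):
    """Return the mean and sample standard deviation for one CpG site."""
    return mean(values), stdev(values)


for site, values in toy_methylation.items():
    site_mean, site_sd = summarise_site(values)
    print(f"{site}: mean={site_mean:.3f}, sd={site_sd:.3f}")
```

Here site_B has an intermediate mean even though no individual is intermediately methylated, so reporting a dispersion measure next to the mean makes this kind of pattern visible.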