Chromosome-level reference genome for the medically important Arabian horned viper (Cerastes gasperettii)

Gabriel Mochales-Riaño
Samuel R Hirst
Adrián Talavera
Bernat Burriel-Carranza
Viviana Pagone
Maria Estarellas
Theo Busschau
Stéphane Boissinot
Michael P Hogan
Jordi Tena-Garcés
Davinia Pla
Juan J Calvete
Johannes Els
Mark J Margres
Salvador Carranza

Abstract

Venoms have traditionally been studied from a proteomic and/or transcriptomic perspective, often overlooking the true genetic complexity underlying venom production. The recent surge in genome-based venom research (sometimes called “venomics”) has proven to be instrumental in deepening our understanding of venom evolution at the molecular level, particularly through the identification and mapping of toxin-coding loci across the broader chromosomal architecture. Although venomous snakes are a model system in venom research, the number of high-quality reference genomes in the group remains limited. In this study, we present a chromosome-resolution reference genome for the Arabian horned viper Cerastes gasperettii (NCBI: txid110202), a venomous snake native to the Arabian Peninsula. Our highly contiguous genome (genome size: 1.63 Gbp; contig N50: 45.6 Mbp; BUSCO: 92.8%) allowed us to explore macrochromosomal rearrangements within the Viperidae family, as well as across squamates. We identified the main highly expressed toxin genes within the venom glands comprising the venom's core, in line with our proteomic results. We also compared microsyntenic changes in the main toxin gene clusters with those of other venomous snake species, highlighting the pivotal role of gene duplication and loss in the emergence and diversification of snake venom metalloproteinases and snake venom serine proteases for C. gasperettii. Using Illumina short-read sequencing data, we reconstructed the demographic history and genome-wide heterozygosity of the species, revealing how historical aridity likely drove population expansions. Finally, this study highlights the importance of using long-read sequencing as well as chromosome-level reference genomes to disentangle the origin and diversification of toxin gene families in venomous snake species.

GigaScience
Jul 8, 2025

Venoms have traditionally been studied from a proteomic and/or transcriptomic perspective, often overlooking the true genetic complexity underlying venom production. The recent surge in genome-based venom research (sometimes called “venomics”) has proven to be instrumental in deepening our molecular understanding of venom evolution, particularly through the identification and mapping of toxin-coding loci across the broader chromosomal architecture. Although venomous snakes are a model system in venom research, the number of high-quality reference genomes in the group remains limited. In this study, we present a chromosome-resolution reference genome for the Arabian horned viper (Cerastes gasperettii), a venomous snake native to the Arabian Peninsula. Our highly-contiguous genome allowed us to explore macrochromosomal rearrangements within the Viperidae family, as well as across squamates. We identified the main highly-expressed toxin genes compousing the venom’s core, in line with our proteomic results. We also compared microsyntenic changes in the main toxin gene clusters with those of other venomous snake species, highlighting the pivotal role of gene duplication and loss in the emergence and diversification of Snake Venom Metalloproteinases (SVMPs) and Snake Venom Serine Proteases (SVSPs) for Cerastes gasperettii. Using Illumina short-read sequencing data, we reconstructed the demographic history and genome-wide diversity of the species, revealing how historical aridity likely drove population expansions. Finally, this study highlights the importance of using long-read sequencing as well as chromosome-level reference genomes to disentangle the origin and diversification of toxin gene families in venomous species.

This work has been peer reviewed in GigaScience (see https://doi.org/10.1093/gigascience/giaf030 ), which carries out open, named peer-review. These reviews are published under a CC-BY 4.0 license and were as follows:

Reviewer Hardip Patel

Dear Authors, thank you for compiling this resource and the manuscript. I apologise for the delay in my review. I have read the manuscript with great interest. I have some major concerns that need to be addressed and a lot of minor concerns. Without line numbers, it was difficult to provide comments. I have chosen to write the part of the sentence that my comment refers to for you to consider for improvements.

Major concerns:

Abstract can include quantitative values for some key results such as the genome size, contiguity (e.g. N50, L90) and quality metrics (e.g. BUSCO) of the genome assembly among other result claims listed in the abstract.
Venom as the keyword can perhaps be described/defined. Authors interchangeably use "venom", "toxin", "venom toxin", genes coding venom proteins. I strongly suggest the use of consistent terminologies that are well defined in the manuscript.
Methods need elaborate descriptions about reagents, procedures including for library preparations, sequencing machines, library kits and versions, etc. These are relevant for downstream analyses.
For all software, list parameters used, even if default, then explicitly state that "default parameters were used".
For all software, list version numbers used for analyses.
Authors are urged to change "macrosynteny" and "microsynteny" terms to chromosome level and local synteny analyses. This is to avoid confusion related to macro/microchromosomes.
"Genomic diversity" analyses use cross-species alignments and variant calling using software and methods developed for same species data. This can introduce significant bias in downstream interpretation and use of the variant data (heterozygosity measure may be). I suggest removal of this section because of lack of accuracy.
Discussion of new discovery is largely lacking. I would appreciate if authors contextualized their results with other discoveries in the field.
Section headings in Results and Discussions can be changed to reflect main findings instead of "transcriptomics" or "genomic diversity".
One of the main findings is about SVMP gene family expansion. However, due to the lack of evidence about assembly accuracy in the region, accurate annotation of copies, and the effect of studying "primary assembly" instead of "haplotype assembly" at this region, I am not convinced of claims made in the paper. Appropriate justification is required for this section.
The nomenclature of SVMP genes is confusing. For example, in Figure 4A, they are all labelled as SVMPs with different colours, but then they are labelled as MDCs and MADs in Figure 4b and Supp Figure 6. Please label each gene in each species with consistent names that can reflect orthologous relationship. This is hard to discern, especially without appropriate species labels in Supp Figure 6.
Provide MSA files and trees used to infer evolutionary history. In the absence of the sequence alignments, and raw tree file, I am unable to evaluate this section of the manuscript. Please provide all required details for reviewers and readers.
??: It is not clear what authors mean by the word, term, phrase. Please correct them to convey accurate meaning using established and accepted scientific terminologies and English conventions.

Minor concerns:

Abstract:

"compousing" ??
"highly expressed toxin genes": in what tissues?
"genome-wide diversity" ??
"toxin gene families in venomous species" -> "toxin gene families in venomous snake species"

Background:

"Such advances in sequencing technologies": remove "Such"
"depending on their type, interactions, and the organism": interactions with what?
"proteomic (and transcriptomic) approaches": remove parenthesis
"to new therapies for human illnesses including but not": since the title contains "medically important", it would be great to include some specific examples here from the literature.
"However, venomous snakes are one": remove "However"
"therefore, the fundamental model system": change "fundamental" to "useful"
"of medical importance by the World Health Organization (WHO) due to their": provide citation
"Within venomous snakes, the most medically": restructure the sentence for brevity and clarity.
"cytotoxic effects (among others)": remove "(among others)"
"conducted using a proteomic approach": clarify what proteomic approach mean here.
"Hirst et al., (in review);": remove this citation
"within the Viperidae family posses an available reference": change the word "posses" to something meaningful
"Moreover, employing several -omics techniques": be specific about techniques
"We deciphered numerous genomic attributes": be specific

Methods:

Describe how blood was extracted from animals with all details including animal handling techniques, body part etc.
"was stored in RNAlater until RNA extraction": source for RNAlater
"We extracted gDNA from the blood of a female individual": provide additional details such as the quantity of blood used, thawing process, qty of reagents, especially elution buffer etc. Manufacturer protocols may be suited best for mammalian blood (humans, mice) without nucleus in RBCs unlike snakes.
"Then, we sequenced a total of two 8M SMRT HiFi cells, aiming for a ∼30x of coverage, at the University of Leiden": provide details of library preparation, sequencing machine etc.
"(including venom glands, tongue, liver and pancreas, among others": Either list all or refer to the table.
"RNA libraries were prepared with the VAHTS": Was the library and sequencing strand specific? Provide complete details on these processes.
"8M SMRT HiFi cell containing two Iso-seq HiFi libraries": use correct names of these and also include sequencing machine details.
"Quality control on HiFi and Illumina reads was assessed using FastQC": correct the phrasing of this sentence
"To make an initial exploration of the genome, …..we generated a k-mer profile with Meryl": Explicitly state the purpose of this analysis.
"Manual curation was performed with Pretext": cite Pretext properly. Explain decisions of this manual curation. i.e. what evidence was used to join or break contigs.
"Then, we ran three iterative rounds of RepeatMasker to annotate the known and unknown elements identified by RepeatModeler and soft-masked the genome for simple repeats": break this sentence into two and explain reasons for running RepeatMasker three times.
"We used GeMoMa v.1.9": Include all details about the annotations. This sentence is not sufficient for reproducibility. Were the RNAseq data assembled or provided as raw files to GeMoMa. How were they mapped to the genome assembly?
"published: Anolis carolinensis from Alföldi": Remove the word "from" here as citation is sufficient. Provide details of assembly versions, annotation version, database of annotations etc.
"Crotalus ruber from Hirst et al., (in review)": remove this citation or list it as personal communication
"We previously quality checked and removed the adapters of the RNA-seq data": remove "previously" and provide details on how adapters were removed from RNAseq data
"also removed the adapters for the Iso-seq data": Explain how this was performed.
"We blast our ..": Change all occurrence of "blast" to "BLAST" and specify parameters, if it was BLASTN or BLASTP or something else. This is not clear at all.
"we performed additional annotation steps for venom genes.": Details are not complete for reproducibility. State explicitly what decisions were made and how gene structure was determined. This is the main part of the paper and does require accurate details.
"Whole-genome synteny was explored between": synteny by definition refers to being on the same string/chromosome. Therefore whole-genome synteny as a term doesn't make sense given that genome is divided into chromosomes. Revise it to say "chromosomal synteny"
"chromosomes assembled in the reverse complement, which were corrected using SAMtools faidx": samtools faidx cannot do this. Explain how this was done.
"After adapter trimming and quality control, we mapped our RNA-seq reads": how were adapters trimmed and QC implemented.
"Gene counts per gene": change gene counts to read counts
"Differential expression analyses were carried out": requires additional details such as filters applied for the count, groups compared, statistical model, multiple testing correction methods.
"characterize the venom arsenal of Cerastes gasperettii": change the arsenal word.
"Fragmentation spectra were matched against a customized database including the bony vertebrates taxonomy dataset of the NCBI non-redundant database": revise for accuracy
"Unmatched MS/MS spectra were de novo sequenced": spectra were sequenced how??
"we used blast, incorporating both toxin and non-toxin paralogs": change blast to BLAST and provide additional details about the tool used
"Then, we aligned those regions using Mafft (Katoh": provide coordinates of these regions for future research in each assembly
"history for the main groups of toxins (i.e.,": parenthesis is not closed. Close it or remove it.
"we also included other non-toxin paralogous genes from nontoxic species (for details about this see Supplementary Information": where do I look into the supplementary information? Be very clear. Provide coordinates of regions that were compared.
"When needed, we translated CDS": when was this needed? Explain.
"built a phylogeny for each of the toxin groups using Phyml": I presume that this is done with translated CDS sequences in toxin genomic regions. Please clarify.
"Heterozygous positions were obtained from bam files with Samtools v1.9": provide details as to how this was done. Samtools doesn't have features to operate at a site level and therefore I am confused.
"Filtered reads were mapped against the new reference genome of Cerastes gasperettii using the bwa mem algorithm": bwa mem is designed for same species comparisons. Here you have used it for cross-species. Provide justification and perhaps biases it may have introduced for distantly related species.
"SNP calling was carried out …": This is not appropriate as models assume same species data. You have used cross-species alignments, which can be highly biased.

Results and Discussion:

"PacBio HiFi (~40x), Hi-C (~60x) and Illumina data (~78x)": change to number of base pairs. 40x for a genome of 2GB is 80GB data and for genome of 1GB size, it is 40GB data. Before sequencing and assembly, the genome size cannot be known.
"After manual curation, we enhanced the scaffolding parameters of our genome": what was done as manual curation. Please specify.
"∼228 times more contiguous than the Anolis sagrei genome": how is 228 more measured. How is this useful as a metric without the known ground truth. Assemblies can and do have errors.
"27,158 different protein-coding genes within our assembly": this seems large compared to other species. Can you elaborate or compare these numbers with other species.
"Toxin genes usually found in venomous snakes (see proteome results below) were mainly found on macrochromosomes, although major toxin groups were found on microchromosomes (SVMPs, SVSPs and PLA2; Fig. 1).": please revise this statement. Two parts of the sentence are saying opposite things. Second, provide coordinates of these genes as GFF/BED file as supplementary file with their exon structure annotations for others to reuse this information.
"showed a great level of similarity between Cerastes gasperettii and Crotalus adamanteus": provide quantitative metrics for "great" level of similarity.
"we found several fission events in the A. sagrei genome,": Since A. sagrei genome is not contiguous and chromosome scale, you cannot infer fissions as it may be artefact of non-contiguous assembly. If that is not the case, provide evidence of this.
"The last four…": Belongs in methods
"Macrosyntenic differences between lizards and snakes": this is very superficial discussion point. Please remove it or strengthen it with evidence.
"Heatmap analyses with the most 2,000": Revise this statement. It doesn't make sense. E.g. Heatmap is a visualisation technique and not analyses method.
"We studied venom evolution within the most abundant toxin groups": rewrite the sentence for clarity and brevity.
"After a thorough manual curation": Explain what was this manual curation process clearly and the purpose of it.
"contiguous tandem repeat SVMPs for": Change "repeat" to "array" because tandem repeat has a different meaning in genomics research context.
"flanked by the NEFL and NEFM": Unclear if they are both 5' or 3' of toxin genes. Clarify.
"Microsyntenic analyses showed": change to local synteny
"gene copy number variation between": Since these are duplicate copies, clearly state how gene copies were identified. Include details of open reading frames, exon structures, pseudogene status, etc.
"we can see an expansion in": Describe number of new copies, their status as intact or not, and sequence similarity between copies. Provide evidence that there is no false duplication due to heterozygous allele collapse in the assembly.
"More genomic data will indicate if SVMP12": Did you mean SVMP13?
"This difference may be expected, as PLA2 only represents around 5% of the proteome for Cerastes gasperettii": This is not true. Proteome doesn't equal to genome in some cases and superficial inference such as this is not warranted.
For PSMC analyses, please discuss the effect of mutation rate and generation time.

Figures:

Figure 1: Add y-axis scales to the circos plot. Figure 1b legend says it is a linkage map, but looks more like HiC contact map. Please edit. Figure 1b legend also says "including the sex chromosomes", which is not consistent with the circos plot.
Figure 3A refers to transcriptome and 3b to proteome. Please make this very clear.
Figure 4A, C and E, label genes consistent with the phylogenetic trees in supplementary figures so readers can know their genomic arrangements.
Figure S4: Discuss why CG1 sample separates from rest of the samples. Seems like a batch effect.
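The reviewer's request to report sequencing yield in base pairs rather than fold coverage rests on simple arithmetic: fold coverage is total sequenced bases divided by genome size, so the same "40x" implies different amounts of data for different genome sizes. The following minimal Python sketch only illustrates that relationship; the function names are hypothetical, and the 1 Gbp and 2 Gbp sizes are the reviewer's own round examples rather than values from the manuscript.

```python
GBP = 1_000_000_000  # one gigabase pair


def bases_for_coverage(genome_size_bp: int, coverage: int) -> int:
    """Total sequenced bases implied by a nominal fold coverage."""
    return genome_size_bp * coverage


def coverage_from_bases(total_bases: int, genome_size_bp: int) -> float:
    """Fold coverage implied by a fixed amount of sequence data."""
    return total_bases / genome_size_bp


if __name__ == "__main__":
    # The reviewer's example: "40x" means different yields for different genomes.
    for size in (1 * GBP, 2 * GBP):
        total = bases_for_coverage(size, 40)
        print(f"{size / GBP:.0f} Gbp genome at 40x: {total / GBP:.0f} Gbp of data")

    # Conversely, a fixed 40 Gbp of data gives a coverage that depends on genome size.
    for size in (1 * GBP, 2 * GBP):
        depth = coverage_from_bases(40 * GBP, size)
        print(f"40 Gbp of data on a {size / GBP:.0f} Gbp genome: {depth:.0f}x")
```

Reporting the yield in base pairs avoids this ambiguity, because it does not depend on a genome size that is only known after assembly.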

Reviewer Blair Perry

Mochales-Riano et al. present a high-quality genome assembly for the Arabian horned viper and provide a suite of genomic analyses related to synteny, toxin gene evolution and expression, genomic diversity, and demographic history of this and related species. This species is a valuable addition to existing snake genome resources given its medical significance and the current underrepresentation of genomes for Viperidae. I also appreciate that the authors sequenced the heterogametic sex and successfully assembled both sex chromosomes. I do have a few questions and concerns about the manuscript in its current form that I highlight below. Most notably, I find the arguments throughout the manuscript about toxin gene copy number correlating with proteomic abundance to be poorly supported and generally problematic given the data and analyses that the authors present. I suggest that the authors reevaluate these claims, and either provide additional analyses in an effort to support these claims or otherwise remove them from the manuscript, as I don't think they are ultimately crucial to the value of this genome report.

Introduction:

I find the argument being made in the sentence beginning "Previous works have shown that changes in gene regulation" a bit confusing. Rather than arguing that studying the expression of venom genes is "insufficient," I think that this instead argues that transcriptomic and proteomic data are critical for studying venom in conjunction with annotated genome sequence. You could for example have a species with 20 copies in a particular tandem array, but only two of them are ever expressed at biologically meaningful levels and thus contribute proteins to the excreted venom. Knowing both the total number of copies in the genome and the number that are actually contributing to the venom proteome is valuable and necessary for understanding the evolution of that gene family, its role and significance in venom phenotypes, etc. I'm also not sure I follow the logic of the next sentence. Why exactly would the identification of specifically "unexpressed" toxin genes be particularly notable for antivenom, drug discovery, therapeutics, etc.? "We deciphered numerous genomic attributes of this species including its genetic diversity and failed to find evidence of inbreeding" - lack of inbreeding is never discussed in the context of the heterozygosity results, but is pitched here as a major result of the paper. Did the authors have a priori expectations regarding inbreeding in this species?

Methods:

"Gene counts per gene…" - should this be "Gene expression counts per gene…"? Venom gland RNA-seq data was generated from three animals, but proteomic data was generated from a pool of two other animals. This is not ideal for linking gene expression to venom proteome composition, where you really would want venom collected from the same animals you are getting venom gland RNA from. This is especially true if there is intraspecific variation in venom phenotypes within this species. The latitude and longitude are not provided for the two proteome samples. Were these collected from the same latitude and longitude as the RNA-seq animals? For analyses of heterozygosity, the authors map WGS data from diverse species against the Cerastes reference and call variants. Why was this approach chosen over instead mapping the data for each species to either that species' reference (i.e., C. viridis and N. naja) or a more closely related species for those without a reference? Presumably that would reduce the potential influence of reference bias on these estimates of heterozygosity?

Results:

"Toxin genes usually found in venomous snakes (see proteome results below) were mainly found on macrochromosomes, although major toxin groups were found on microchromosomes (SVMPs, SVSPs and PLA2; Fig. 1)" this feels a bit contradictory. Maybe just state that toxin genes were found on both macro and microchromosomes? "Finally, we also found a battery of 3FTxs and myotoxin-like genes, but they were not represented in our RNA-seq dataset (see below)." The authors do not further discuss this result as implied by "(see below)," unless that was simply referring to subsequent discussion of RNA-seq data. From what I can tell, these are also not present in the proteomic data, correct? "The venom gland transcriptome contained a total of 7,237 genes expressed (TPM > 500), including a total of 65 putative toxin genes. Differential gene expression analyses revealed a total of 161 genes (33 putative toxin genes) that were differentially upregulated (FC > 2 and 1% FDR) in venom glands compared to other tissues (Fig. 3A)." Figure 3A only shows 10 toxin genes with "unique" expression in the venom gland, not the 161 upregulated toxin genes as implied here. The authors should add a heatmap with these 161 genes to the supplement, if not to Figure 3 (guessing it might not fit). Fig 3: The authors do not discuss the lack of unique/upregulated expression evidence for PLA2s and Disintegrins in Fig 3A, despite their contribution to protein composition in Fig 3B. For disintegrins in particular, they represent a higher proportion of the venom proteome than CTLs and CRISPs, yet there is no evidence presented for high expression in these genes. What do the authors think is going on here? Could this be a technical issue related to the processing of the RNAseq data, perhaps related to the small size of these genes? Alternatively, could this be indicative of a mismatch between venom phenotypes of the animals used to generate transcriptomic versus proteomic data?
In the text, the authors state "These genes, together with other SVMPs, SVSPs, Disintegrins (DISI) and C-type lectins (CTL), were highly expressed in the venom gland and form the core toxic effector components of the venom" but again there is no presented evidence for DISI expression in particular. Are these genes included in the 161 upregulated genes in the venom gland? The authors only present proteomic data in the form of a pie chart of overall composition grouped by toxin family (Fig 3B). Does the proteomic data generated here provide individual gene-level proteomic abundance estimates? If so, this would be valuable to include, especially in support of the authors' claims about gene copy number being correlated with protein abundance. For example in Figure 3, SVMP9 and SVMP10, and to a lesser extent SVMP13, are highly expressed and therefore possibly/likely the major contributors to SVMPs in the proteome. Is the SVMP section of the pie chart in Fig 3B dominated by proteins from these 3 genes? "We studied venom evolution within the most abundant toxin groups (i.e., SVMPs and SVSPs, as well as PLA2)." PLA2s are a relatively low proportion of the venom proteome in Fig 3B, and are not present in the expression heatmap in Fig 3A. Why were these chosen for further investigation over CTL, CRISP, DISI, etc.? "The amplification of SVMP copy numbers is consistent with proteomic results, as SVMPs were the second most abundant component…". Related to my comment above, are all/many of these copies expressed in proteomic, or at least transcriptomic, data? As the data is currently presented, it appears that a small number of SVMPs are highly expressed and thus likely contributing to the proteome. This does not support, and might in fact contradict, the authors' claim that there is an association with increased copy number and contribution to the proteome.
Related to this, and more generally, the authors do not present a convincing argument for the relationship between gene copy number and the resulting percentage of a given toxin gene family in the proteome. If copy number is directly related to the resulting amount of a toxin in the proteome, the authors would need to show that many/all of those copies are expressed in the transcriptomic data, and that proteins produced from those genes are present and contributing to the venom proteome (beyond just the total percentage for the family). Further, making any links between copy number and percent overall composition in the proteome is problematic, because it inherently is impacted by copy number variation and expression of all the other toxin genes. You could, in theory, have copy number expansion in a species where all the genes are expressed and contribute to the proteome, but no overall change in the percent of that toxin family in the proteome if other toxin families have also expanded and/or are expressed more highly. Related to this, there is currently no obvious baseline to compare against in order to make these claims that expansion has resulted in higher venom proteome composition (i.e., a situation where we have fewer SVMP gene copies and a corresponding lower percentage of SVMP proteins in the venom proteome). This would potentially require comparison across species and/or populations with differing copy number, etc. My concerns above also apply to the interpretation of SVSP results: "The high number of SVSP genes found (although lower than in Crotalus adamanteus) were in line with the proteomic results, as SVSPs are the most abundant toxin in the proteome (Fig. 3B)." Further, C. adamanteus has a larger number of SVSP genes than C. gasperettii, yet a lower percent composition of SVSPs in the proteome (Margres et al. 2014), emphasizing my concerns about associating copy number and percent composition. 
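The compositional point made above (that a family's percentage of the proteome depends on every other family as well) can be illustrated with a small worked example. The Python sketch below uses made-up abundance values in arbitrary units; they are hypothetical numbers chosen for illustration, not measurements from the manuscript or from any cited study, and the function name is likewise an assumption.

```python
from typing import Dict


def percent_composition(abundance: Dict[str, float]) -> Dict[str, float]:
    """Convert absolute abundances into percentages of the total."""
    total = sum(abundance.values())
    return {family: 100.0 * value / total for family, value in abundance.items()}


# Hypothetical absolute protein output per toxin family (arbitrary units).
baseline = {"SVMP": 30.0, "SVSP": 40.0, "other": 30.0}

# Scenario 1: every family doubles its output (e.g. all families expanded).
all_doubled = {family: 2.0 * value for family, value in baseline.items()}

# Scenario 2: only SVMP output doubles; the other families are unchanged.
svmp_only = dict(baseline, SVMP=60.0)

if __name__ == "__main__":
    for label, scenario in [
        ("baseline", baseline),
        ("all families doubled", all_doubled),
        ("only SVMP doubled", svmp_only),
    ]:
        shares = percent_composition(scenario)
        summary = ", ".join(f"{fam}: {pct:.1f}%" for fam, pct in shares.items())
        print(f"{label}: {summary}")
```

In the first scenario the SVMP share is unchanged even though SVMP output doubled, and in the second the SVSP share falls although SVSP output did not change. Under these assumptions, percent composition alone cannot show that a copy number expansion increased a family's contribution.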
Could the two large Group 2 SVSPs in Fig 4E be misannotations of multiple genes? Looking at the C. adamanteus genes above these, there are genes starting and ending at roughly the same positions as the start and end of these large SVSPs, making me wonder if there are multiple Cerastes genes that were annotated as one. In my own experience, I have seen similar situations where FGENESH+ was fed a large region containing multiple genes and annotated multiple genes together as one, so it might just be worth double checking that that hasn't happened here. Alternatively, could these be gene fusions? If that's the case, that would presumably complicate the gene tree analyses, correct? i.e., these genes would probably need to be excluded from those analyses.

Read the original source
GigaScience
Jul 8, 2025
Venoms have traditionally been studied from a proteomic and/or transcriptomic perspective, often overlooking the true genetic complexity underlying venom production. The recent surge in genome-based venom research (sometimes called “venomics”) has proven to be instrumental in deepening our molecular understanding of venom evolution, particularly through the identification and mapping of toxin-coding loci across the broader chromosomal architecture. Although venomous snakes are a model system in venom research, the number of high-quality reference genomes in the group remains limited. In this study, we present a chromosome-resolution reference genome for the Arabian horned viper (Cerastes gasperettii), a venomous snake native to the Arabian Peninsula. Our highly contiguous genome allowed us to explore macrochromosomal rearrangements within the Viperidae family, as well as across squamates. We identified the main highly expressed toxin genes composing the venom’s core, in line with our proteomic results. We also compared microsyntenic changes in the main toxin gene clusters with those of other venomous snake species, highlighting the pivotal role of gene duplication and loss in the emergence and diversification of Snake Venom Metalloproteinases (SVMPs) and Snake Venom Serine Proteases (SVSPs) for Cerastes gasperettii. Using Illumina short-read sequencing data, we reconstructed the demographic history and genome-wide diversity of the species, revealing how historical aridity likely drove population expansions. Finally, this study highlights the importance of using long-read sequencing as well as chromosome-level reference genomes to disentangle the origin and diversification of toxin gene families in venomous species.

This work has been peer reviewed in GigaScience (see https://doi.org/10.1093/gigascience/giaf030 ), which carries out open, named peer-review. These reviews are published under a CC-BY 4.0 license and were as follows:

Reviewer Jiatang Li

In the manuscript entitled 'Chromosome-level reference genome for the medically important Arabian horned viper (Cerastes gasperettii)', the authors assembled a high-quality chromosome-level reference genome for the Arabian horned viper (Cerastes gasperettii), a special viperid species, which is an important data resource. Combining multi-omics data, the authors characterized the genome, analyzed the toxin gene families, and identified a novel SVMP gene. The research is of great significance for revealing the origin and diversification of snake venom. Overall, I think the science and findings of the study are meaningful and merit publication, but in its current form there are some issues that should be addressed:

It should be noted that Fig. 1 and Fig. 2 both have unidentified border lines.

In all phylogenetic trees presented in the manuscript, it would be better for the authors to indicate all species information.

I'm curious whether the authors considered differences in sampling time, for example venom glands sampled after venom harvest versus in the resting state, which could affect the analyses, especially the transcriptome.

In the transcriptomics section, the authors stated that the batch effect of CG1 was due to the low mapping rate of that sample to the reference genome. This seems a misinterpretation to me, as CG1 is itself the genome sequencing sample. The authors should explain this further.

The authors need to ensure that all data generated for the manuscript are accessible; information about the data is not currently available.

Please check the references to ensure that the formatting meets the publisher's requirements; e.g., some Latin names of species require italics.
Version published to 10.1093/gigascience/giaf030
Jan 1, 2025
Version published to 10.1101/2024.07.29.605543 on bioRxiv
Jul 29, 2024
