The whale shark genome reveals patterns of vertebrate gene family evolution

Curation statements for this article:
  • Curated by eLife

    Evaluation Summary:

    This manuscript is of interest in the field of comparative genomics, as it provides a new genomic resource for the whale shark, which belongs to a group of vertebrates for which genomic data remain limited. While the strength of this manuscript is the publication of this resource, the importance of the conclusions and their broader impact remain unclear, particularly because the work on its own does not yet provide new insights into the biology of whale sharks.

    (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #2 agreed to share their name with the authors.)

Abstract

Chondrichthyes (cartilaginous fishes) are fundamental for understanding vertebrate evolution, yet their genomes are understudied. We used long-read sequencing of the whale shark genome to generate the best gapless chondrichthyan genome assembly yet, with higher contig contiguity than all other cartilaginous fish genomes, and studied vertebrate genomic evolution of ancestral gene families, immunity, and gigantism. We found a major increase in gene families at the origin of gnathostomes (jawed vertebrates), independent of their genome duplication. We studied vertebrate pathogen recognition receptors (PRRs), which are key in initiating innate immune defense, and found diverse patterns of gene family evolution, demonstrating that adaptive immunity in gnathostomes did not fully displace germline-encoded PRR innovation. We also discovered a new toll-like receptor (TLR29) and three NOD1 copies in the whale shark. We found that chondrichthyan and giant vertebrate genomes had decreased substitution rates compared with other vertebrates, but that gene family expansion rates varied among vertebrate giants, suggesting that substitution and expansion rates of gene families are decoupled in vertebrate genomes. Finally, we found that gene families that shifted in expansion rate in vertebrate giants were enriched for human cancer-related genes, consistent with gigantism requiring adaptations to suppress cancer.

Article activity feed

  1. Author Response:

    Reviewer #1:

    Tan et al. resequence the whale shark genome using long-read technology (PacBio), improving a previously available assembly that was obtained from the same source DNA used in this study. This resequencing yielded a gapless assembly, which, together with the annotation, allowed the authors to analyze several features of the whale shark genome, including gene family gains and losses, the evolution of immune genes encoding pattern recognition receptors, rates of substitution in the whale shark compared with other vertebrates, and the evolution of genes potentially related to the emergence of gigantism and cancer rate.

    Whale sharks constitute a charismatic group of vertebrates for several reasons. First, they belong to a poorly studied group, and their genomic properties inform us about gene families dating back to the split between Osteichthyes and Chondrichthyes, i.e. to the origin of gnathostomes. Second, they have a unique biology associated with their large size, which can inform us about the evolution of gigantism in vertebrates.

    Tan et al. indeed assemble a gapless genome for the whale shark with high contiguity (contig N50 of 10^5), thereby providing a new resource for shark genomics and for the study of early vertebrate evolution. However, repetitive regions that are not included in the assembly account for over 700 Mb. The authors could provide more information about the genome assembly, the coverage, and a chromosome-level map of the genome.

    We have clarified what this means in the relevant sentence (line 133) and added further detail on lines 138–139. Supplementary Table 1 also lists these statistics. A chromosome-level map is outside the scope of what is possible with our present data.
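    Contig N50, the contiguity statistic discussed above, is the length L such that contigs of length L or longer contain at least half of all assembled bases. A minimal sketch of how it is computed, assuming contig lengths are already available as a list of integers; the lengths in the example are made up and are not taken from the whale shark assembly:

```python
def n50(lengths: list[int]) -> int:
    """Return the N50 of a collection of contig lengths.

    N50 is the length of the shortest contig in the smallest set of the
    longest contigs that together cover at least half of the total length.
    """
    if not lengths:
        raise ValueError("no contig lengths given")
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable: the running total always reaches the total")


if __name__ == "__main__":
    # Hypothetical contig lengths in bp (total 300 bp; 80 + 70 reaches half).
    contigs = [80, 70, 50, 40, 30, 20, 10]
    print(n50(contigs))  # 70
```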

    This improved assembly, together with the annotation, led the authors to explore several aspects of whale shark genome evolution. They reveal gene family gains and losses specific to the lineage leading to whale sharks. However, it is not entirely clear to what extent the new assembly enabled these findings compared with the previous assembly.

    The previous assembly was extremely fragmented, with only ~15% complete BUSCO orthologs, as previously noted by Hara et al. (2018). We thus regard the present genome assembly as a large improvement for genome-based studies, including the present analyses, which are based primarily on gene family evolution and downstream analyses. We now provide more information in the main text about the increase in ortholog completeness (line 155), which should make it clearer why gene family evolution analyses would be improved, including the origin and loss of gene families, the evolution of innate immune genes, and rates of gene family expansion.

    This work provides an important resource for studying the evolution of pattern recognition receptors (PRRs) in early vertebrates, potentially identifying a novel class of TLR (TLR29) in whale sharks. It further suggests that a diverse set of PRRs is fully compatible with the evolution of lymphocyte-based adaptive immunity. In other words, the authors hypothesize that the evolution of adaptive immunity did not lead to the functional loss of innate immune effectors.

    Using comparative genomics, the authors explore the genomic basis of gigantism and cancer evolution. First, they apply the two-cluster test to substitution rates in single-copy orthologs and find that cartilaginous fishes have slower substitution rates than other vertebrates, confirming a previous finding. However, the whale shark substitution rate does not differ from that of other (non-giant) sharks, suggesting that gigantism may not be directly or uniquely correlated with a slow substitution rate. I would recommend that the authors expand on the use of the two-cluster test in the Results section.

    The two-cluster test is a fairly simple test and we describe the results of all of the analyses performed. The descriptions seem comparable to the detail provided in other publications that use the two-cluster test (e.g. Venkatesh et al. 2014, Weber et al. 2020).
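    The two-cluster test compares two clusters that share a common ancestor by their distances to an outgroup: if distances are additive, the mean distance from cluster A to the outgroup minus the mean distance from cluster B to the outgroup equals δ = L_A − L_B, the difference in average branch length from the common ancestor to the tips of each cluster, and a δ significantly different from zero indicates unequal rates. A simplified sketch of that idea, assuming a gap-free toy alignment, uncorrected p-distances, and a bootstrap standard error over alignment columns in place of the analytic variance of the published test; it is not the analysis pipeline used in the manuscript, and the helper `mutate` exists only to build toy data:

```python
import math
import random
import statistics


def p_distance(s1: str, s2: str, cols: list[int]) -> float:
    """Proportion of differing sites over the given columns (gap-free alignment assumed)."""
    return sum(s1[i] != s2[i] for i in cols) / len(cols)


def delta(aln: dict[str, str], group_a: list[str], group_b: list[str],
          outgroup: list[str], cols: list[int]) -> float:
    """Estimate L_A - L_B as mean d(a, o) - mean d(b, o) over all pairs."""
    mean_a = statistics.mean(p_distance(aln[a], aln[o], cols) for a in group_a for o in outgroup)
    mean_b = statistics.mean(p_distance(aln[b], aln[o], cols) for b in group_b for o in outgroup)
    return mean_a - mean_b


def two_cluster_z(aln, group_a, group_b, outgroup, n_boot=200, seed=1):
    """Return (delta, z), where z = delta / bootstrap standard error of delta."""
    n_sites = len(aln[outgroup[0]])
    d = delta(aln, group_a, group_b, outgroup, list(range(n_sites)))
    rng = random.Random(seed)
    boots = []
    for _ in range(n_boot):
        # Resample alignment columns with replacement.
        cols = [rng.randrange(n_sites) for _ in range(n_sites)]
        boots.append(delta(aln, group_a, group_b, outgroup, cols))
    se = statistics.stdev(boots)
    if se == 0:
        return d, 0.0 if d == 0 else math.copysign(math.inf, d)
    return d, d / se


def mutate(seq: str, positions: set[int]) -> str:
    """Toy-data helper: apply a transition at each listed position."""
    swap = {"A": "G", "G": "A", "C": "T", "T": "C"}
    return "".join(swap[c] if i in positions else c for i, c in enumerate(seq))


if __name__ == "__main__":
    out = "ACGT" * 5  # 20-site outgroup sequence
    aln = {
        "O": out,
        "A1": mutate(out, {0, 3, 6, 9, 12, 15}),  # 6 differences from O
        "A2": mutate(out, {1, 4, 7, 10, 13}),     # 5 differences
        "B1": mutate(out, {2}),                   # 1 difference
        "B2": mutate(out, {5, 8}),                # 2 differences
    }
    d, z = two_cluster_z(aln, ["A1", "A2"], ["B1", "B2"], ["O"])
    print(f"delta = {d:.3f}, z = {z:.2f}")
```

    On this toy alignment cluster A carries more substitutions relative to the outgroup than cluster B, so δ = 0.275 − 0.075 = 0.2. A real analysis would use model-corrected distances and the test's own variance estimate rather than this bootstrap shortcut.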

    Overall, the manuscript would benefit greatly from leveraging its main asset, a genome assembly with relatively high contiguity. Even though the authors study the genome of a single individual, they could provide more information about changes in heterozygosity along the genome.

    We have moved the results on heterozygosity to the main text and added more information on heterozygosity, based on SNP calling with freebayes, in lines 146–148. As we do not have a chromosome-scale assembly, and producing one is outside the goals of the current study, we do not summarize changes in heterozygosity along the genome.
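    For a single individual, heterozygosity is commonly expressed as heterozygous sites per base, and the same counts can be binned into fixed windows along each contig to show how heterozygosity varies across an assembly, which is the kind of summary the reviewer requests. A minimal sketch, assuming an uncompressed single-sample VCF (such as freebayes writes) with a GT field, a hypothetical window size, and that every base in a window is callable; a real analysis would apply quality filters and mask low-coverage and repetitive regions:

```python
from collections import defaultdict


def windowed_heterozygosity(vcf_lines, contig_lengths, window=1_000_000):
    """Heterozygous sites per bp in fixed windows, for a single-sample VCF.

    Returns {contig: [rate for window 0, rate for window 1, ...]}.
    Assumes every base is callable; real analyses need a callability mask.
    """
    het_counts = defaultdict(lambda: defaultdict(int))
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos = fields[0], int(fields[1])  # VCF positions are 1-based
        sample = dict(zip(fields[8].split(":"), fields[9].split(":")))
        alleles = sample.get("GT", ".").replace("|", "/").split("/")
        # Heterozygous: all alleles called and at least two distinct (e.g. 0/1, 1/2).
        if "." not in alleles and len(set(alleles)) > 1:
            het_counts[chrom][(pos - 1) // window] += 1

    result = {}
    for chrom, length in contig_lengths.items():
        n_windows = (length + window - 1) // window
        rates = []
        for w in range(n_windows):
            span = min(window, length - w * window)  # last window may be shorter
            rates.append(het_counts[chrom][w] / span)
        result[chrom] = rates
    return result


if __name__ == "__main__":
    vcf = [
        "##fileformat=VCFv4.2",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tshark",
        "ctg1\t5\t.\tA\tG\t50\t.\t.\tGT:DP\t0/1:20",
        "ctg1\t15\t.\tC\tT\t50\t.\t.\tGT:DP\t1/1:18",
        "ctg1\t18\t.\tG\tA\t50\t.\t.\tGT:DP\t0|1:25",
    ]
    print(windowed_heterozygosity(vcf, {"ctg1": 20}, window=10))  # {'ctg1': [0.1, 0.1]}
```

    The contig names, genotypes, and window size above are hypothetical; on real data a window of hundreds of kilobases to megabases is more typical.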

  3. Reviewer #1 (Public Review):

    Tan et al. resequence the whale shark genome using long-read technology (PacBio), improving a previously available assembly that was obtained from the same source DNA used in this study. This resequencing yielded a gapless assembly, which, together with the annotation, allowed the authors to analyze several features of the whale shark genome, including gene family gains and losses, the evolution of immune genes encoding pattern recognition receptors, rates of substitution in the whale shark compared with other vertebrates, and the evolution of genes potentially related to the emergence of gigantism and cancer rate.

    Whale sharks constitute a charismatic group of vertebrates for several reasons. First, they belong to a poorly studied group, and their genomic properties inform us about gene families dating back to the split between Osteichthyes and Chondrichthyes, i.e. to the origin of gnathostomes. Second, they have a unique biology associated with their large size, which can inform us about the evolution of gigantism in vertebrates.

    Tan et al. indeed assemble a gapless genome for the whale shark with high contiguity (contig N50 of 10^5), thereby providing a new resource for shark genomics and for the study of early vertebrate evolution. However, repetitive regions that are not included in the assembly account for over 700 Mb. The authors could provide more information about the genome assembly, the coverage, and a chromosome-level map of the genome. This improved assembly, together with the annotation, led the authors to explore several aspects of whale shark genome evolution. They reveal gene family gains and losses specific to the lineage leading to whale sharks. However, it is not entirely clear to what extent the new assembly enabled these findings compared with the previous assembly.

    This work provides an important resource for studying the evolution of pattern recognition receptors (PRRs) in early vertebrates, potentially identifying a novel class of TLR (TLR29) in whale sharks. It further suggests that a diverse set of PRRs is fully compatible with the evolution of lymphocyte-based adaptive immunity. In other words, the authors hypothesize that the evolution of adaptive immunity did not lead to the functional loss of innate immune effectors.

    Using comparative genomics, the authors explore the genomic basis of gigantism and cancer evolution. First, they apply the two-cluster test to substitution rates in single-copy orthologs and find that cartilaginous fishes have slower substitution rates than other vertebrates, confirming a previous finding. However, the whale shark substitution rate does not differ from that of other (non-giant) sharks, suggesting that gigantism may not be directly or uniquely correlated with a slow substitution rate. I would recommend that the authors expand on the use of the two-cluster test in the Results section. Overall, the manuscript would benefit greatly from leveraging its main asset, a genome assembly with relatively high contiguity. Even though the authors study the genome of a single individual, they could provide more information about changes in heterozygosity along the genome.

  4. Reviewer #2 (Public Review):

    While the new assembly is certainly not the major selling point, I find the computational analysis very deep and comprehensive. The authors show that different groups of pattern recognition receptors undergo different modes of evolution, from very stable (RLRs) to showing expansions in diverse lineages (TLRs and especially NLRs), and thus reveal general evolutionary patterns of innate immune genes. Furthermore, they confirm that large animals in general have lower rates of protein evolution but show that the whale shark does not evolve at lower rates compared to smaller sharks. Interestingly, gene families that evolved at different rates in lineages including large animals are enriched in cancer genes.

  5. Reviewer #3 (Public Review):

    The manuscript by Tan et al. leverages the group's improved whale shark genome assembly to elucidate patterns of genomic evolution in vertebrates. The authors use long-read PacBio sequencing to assemble an improved reference genome for the whale shark, a member of the cartilaginous fishes, a group that lacks genomic data relative to other vertebrate groups. After summarizing the results of the new genome, the authors use the new assembly to infer gene family gain and loss during vertebrate evolution. They found that the genome duplications that occurred early in vertebrate evolution coincided with, but were not necessarily the cause of, a burst of gene family origination in early vertebrates. Building on this analysis, the group identified gene families gained during vertebrate, jawed vertebrate, and bony vertebrate evolution, with immune-related gene innovation coinciding with jawed vertebrate and bony vertebrate evolution.
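    Placing gene family gains on a species tree can be illustrated with a Dollo-style rule: assume each family arose once, at the most recent common ancestor (MRCA) of all species that carry it, and treat absences below that node as losses. A minimal sketch, assuming a hypothetical five-species tree and made-up presence/absence data; the manuscript's own method for inferring gains and losses may differ:

```python
from collections import Counter

# Hypothetical species tree stored as child -> parent (internal nodes named by clade).
PARENT = {
    "human": "Osteichthyes",
    "zebrafish": "Osteichthyes",
    "whale_shark": "Gnathostomata",
    "Osteichthyes": "Gnathostomata",
    "lamprey": "Vertebrata",
    "Gnathostomata": "Vertebrata",
    "amphioxus": "Chordata",
    "Vertebrata": "Chordata",
}


def path_to_root(node: str) -> list[str]:
    """Nodes from `node` up to the root, inclusive."""
    path = [node]
    while node in PARENT:
        node = PARENT[node]
        path.append(node)
    return path


def origin_node(species: set[str]) -> str:
    """MRCA of the (non-empty) set of species carrying a family."""
    paths = [path_to_root(sp) for sp in species]
    shared = set(paths[0]).intersection(*paths[1:])
    # Walking up from any leaf, the first node shared by all paths is the MRCA.
    return next(node for node in paths[0] if node in shared)


def gains_per_node(presence: dict[str, set[str]]) -> Counter:
    """Count families whose inferred origin falls on each node."""
    return Counter(origin_node(species) for species in presence.values())


if __name__ == "__main__":
    # Made-up presence/absence data for four gene families.
    presence = {
        "famA": {"human", "zebrafish", "whale_shark"},  # origin: Gnathostomata
        "famB": {"human", "zebrafish"},                 # origin: Osteichthyes
        "famC": {"human", "lamprey"},                   # origin: Vertebrata
        "famD": {"whale_shark"},                        # lineage-specific
    }
    print(gains_per_node(presence))
```

    Under this rule famC, present in human and lamprey but absent from zebrafish and the whale shark, is placed at Vertebrata and implies two losses (one on the zebrafish lineage, one on the whale shark lineage). Probabilistic birth-death models are a common alternative when gene counts rather than presence/absence are available.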

    Next, the group used the new whale shark reference genome to assess how the emergence of the adaptive immune system influenced immune evolution (cartilaginous fishes are the most evolutionarily distant group from humans to possess both innate and adaptive immune systems), by analyzing pattern recognition receptor (PRR) evolution in jawed vertebrates. They found that, while there is PRR turnover and diversification among jawed vertebrates, a core set of PRRs present in the last common ancestor of jawed vertebrates is well conserved. This indicates that innate immunity is still required in vertebrates, but that PRR gain and loss is relatively slow compared with organisms lacking an adaptive immune system. The authors then attempt to find conserved evolutionary mechanisms that may be associated with vertebrate gigantism. They could not find any robust evolutionary pathways associated with vertebrate gigantism, but did find that gene families involved in cancer were more likely than expected by chance to have a shifted gain/loss rate in vertebrate giants.
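    Whether gene families with a shifted rate in giants contain cancer-related genes "more often than expected by chance" can be framed as a one-sided hypergeometric test, which is equivalent to a one-sided Fisher's exact test on the corresponding 2×2 table. A minimal sketch using only the standard library; the counts in the example are hypothetical, and the manuscript's actual test, background set, and gene-to-family mapping may differ:

```python
from math import comb


def enrichment_p(k: int, n_drawn: int, n_success: int, n_total: int) -> float:
    """One-sided P(X >= k) for X ~ Hypergeometric(n_total, n_success, n_drawn).

    n_total:   all gene families tested (the background)
    n_success: families containing cancer-related genes
    n_drawn:   families with a shifted gain/loss rate in giants
    k:         shifted families that also contain cancer-related genes
    """
    upper = min(n_drawn, n_success)
    # Exact integer arithmetic for the tail, divided once at the end.
    tail = sum(
        comb(n_success, i) * comb(n_total - n_success, n_drawn - i)
        for i in range(max(k, 0), upper + 1)
    )
    return tail / comb(n_total, n_drawn)


if __name__ == "__main__":
    # Hypothetical counts: 1,000 families, 100 with cancer-related genes,
    # 50 with a shifted rate in giants, 12 of which contain cancer-related genes.
    p = enrichment_p(k=12, n_drawn=50, n_success=100, n_total=1000)
    expected = 50 * 100 / 1000  # overlap expected by chance
    print(f"expected overlap {expected:.1f}, observed 12, one-sided p = {p:.2e}")
```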

    Genomic sampling from a wider array of evolutionarily distant species is critical to understanding the evolution of physical traits and molecular pathways. With this new/refined assembly of the whale shark genome, the authors have greatly improved the quality of an understudied cartilaginous fish genome, making its use for comparative genomic and evolutionary studies more robust and powerful. Indeed, the authors themselves make use of this new reference, notably to interrogate the impact of the adaptive immune system on innate immune evolution through the lens of PRR genes in that species, or the potential evolutionary drivers of gigantism.

    Although the new genome assembly has promise, this reviewer has several major concerns that need to be addressed before conclusions are supported and before the manuscript is compatible with reproducible science practices.

    A major comment relates to the PRR evolution analysis, which is not sufficiently explained. For instance, the authors say that they subsampled a previously used set for the TLR analysis. The nature of the original set is not described satisfactorily (e.g., species, database; see point 4 below), the method and depth of subsampling are not explained (e.g., software used, % of coverage, random or guided), and the scientific rationale for this choice instead of using the full set is not given. In addition, why does the TLR analysis include more species than the NLR and RIG-I-like receptor analyses? When trying to infer NLR repertoire shifts in vertebrates, the authors compare humans (mammal, bony vertebrate), zebrafish (bony fish), and the whale shark (cartilaginous fish). The selection of the other species included in that analysis needs to (i) be transparent, including information about how the other species/sequences were selected and obtained, and (ii) be unified, i.e. include the same taxonomic depth for all groups considered.
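    As an illustration of the kind of transparent, reproducible subsampling the reviewer asks for, one option is a stratified random draw of a fixed number of species per clade with a recorded seed, so that every group has the same taxonomic depth and the selection can be regenerated exactly. A minimal sketch, assuming hypothetical clade lists; this is not the procedure used in the manuscript:

```python
import random


def stratified_subsample(species_by_clade: dict[str, list[str]],
                         per_clade: int, seed: int) -> dict[str, list[str]]:
    """Draw up to `per_clade` species from every clade with a fixed seed.

    Clades and species are sorted before sampling, so the same seed and the
    same input always give the same subsample regardless of input order.
    """
    rng = random.Random(seed)
    picked = {}
    for clade in sorted(species_by_clade):
        pool = sorted(species_by_clade[clade])
        k = min(per_clade, len(pool))
        picked[clade] = sorted(rng.sample(pool, k))
    return picked


if __name__ == "__main__":
    # Hypothetical input lists, for illustration only.
    clades = {
        "Chondrichthyes": ["whale shark", "elephant shark", "thorny skate", "brownbanded bamboo shark"],
        "Actinopterygii": ["zebrafish", "medaka", "spotted gar"],
        "Sarcopterygii": ["human", "coelacanth"],
    }
    seed = 20
    for clade, names in stratified_subsample(clades, per_clade=2, seed=seed).items():
        print(seed, clade, names)
```

    Reporting the seed, the per-clade depth, and the full input lists (for example in a supplementary table) would let readers regenerate the same species set.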