Annelid Comparative Genomics and the Evolution of Massive Lineage-Specific Genome Rearrangement in Bilaterians
This article has been Reviewed by the following groups
Listed in
- Evaluated articles (Arcadia Science)
Abstract
The organization of genomes into chromosomes is critical for processes such as genetic recombination, environmental adaptation, and speciation. All animals with bilateral symmetry inherited a genome structure from their last common ancestor that has been highly conserved in some taxa but seemingly unconstrained in others. However, the evolutionary forces driving these differences and the processes by which they emerge have remained largely uncharacterized. Here, we analyze genome organization across the phylum Annelida using 23 chromosome-level annelid genomes. We find that while many annelid lineages have maintained the conserved bilaterian genome structure, the Clitellata, a group containing leeches and earthworms, possesses completely scrambled genomes. We develop a rearrangement index to quantify the extent of genome structure evolution and show that, compared to the last common ancestor of bilaterians, leeches and earthworms have among the most highly rearranged genomes of any currently sampled species. We further show that bilaterian genomes can be classified into two distinct categories (high and low rearrangement), largely influenced by the presence or absence, respectively, of chromosome fission events. Our findings demonstrate that animal genome structure can be highly variable within a phylum and reveal that genome rearrangement can occur either in a gradual, stepwise fashion or as rapid, all-encompassing changes over short evolutionary timescales.
Article activity feed
- 
      We used our dataset to test the power of ALG-based macrosynteny as a taxonomic tool by asking whether it can be used to reliably identify characteristics for defining monophyletic groups of annelids. I know this is just the results section, but it would be useful if you could clarify whether you're arriving at these conclusions using statistical methods or not. As written, it's a bit unclear. 
- 
      In particular, chromosome fusion-with-mixing events have high potential as phylogenetically informative rare genomic changes (i.e. molecular synapomorphies) because they are irreversible (Rokas and Holland 2000; Schultz et al. 2023; Steenwyk and King 2024). Related to this, it might also be interesting to explore the use of your inferred macrosynteny as characters for phylogenetic inference! For instance, rather than using the concatenated single copy genes to infer your species tree, would use of macrosynteny lead to the same conclusions with respect to the relative prevalence of fusions/splitting events? 
- 
      (B) Ideogram plots of clitellate genomes colored by earthworm ancestral linkage groups. Is this not the same figure as in 3A, just excluding T. lapidaria and colored according to the earthworm ancestral linkage groups? I think in some cases these ideogram plots are somewhat redundant across figures. This is ultimately a matter of preference, but it might help to streamline things by limiting the number of figures. 
- 
      WGD, whole genome duplication. Where exactly did this occur? Can you point to the node? From the figure it's unclear whether this occurred at the base of the clade or somewhere else. 
- 
      Bilaterian ancestral linkage groups are often fused but rarely split in errantian and sedentarian annelids. It's a bit difficult to gain clear intuition here regarding the relative prevalence of fusion vs splitting events. Could you perhaps plot the phylogeny with nodes labeled according to the count of fusions/splits (e.g. something like 2/0, where # fusions = 2, # splits = 0)? 
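To make the suggested labeling concrete, here is a minimal sketch of the "fusions/splits" node-label idea. The tree shape, node names, and event counts are all hypothetical placeholders, not values from the preprint, and a real figure would more likely be drawn with a dedicated tree-plotting library.

```python
# Hypothetical toy tree: every name and count here is a placeholder,
# not a result from the preprint.
toy_tree = {
    "name": "ancestor", "fusions": 0, "splits": 0,
    "children": [
        {"name": "cladeA", "fusions": 2, "splits": 0, "children": [
            {"name": "tip1", "fusions": 1, "splits": 0},
            {"name": "tip2", "fusions": 0, "splits": 0},
        ]},
        {"name": "tip3", "fusions": 3, "splits": 1},
    ],
}

def node_label(node):
    """Format a node's event counts as '<fusions>/<splits>', e.g. '2/0'."""
    return f"{node['fusions']}/{node['splits']}"

def render(node, depth=0):
    """Return an indented text outline of the tree, one line per node,
    with each node annotated by its fusions/splits label."""
    lines = [f"{'  ' * depth}{node['name']} [{node_label(node)}]"]
    for child in node.get("children", []):
        lines.extend(render(child, depth + 1))
    return lines

outline = render(toy_tree)
print("\n".join(outline))
```

With labels like these on each node, the relative prevalence of fusions versus splits can be read directly off the tree rather than inferred from the ideograms.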
- 
      From this, it can be inferred that the annelid ancestral state was 20 ALGs and, therefore, 20 chromosomes. Where is this coming from? Are these numbers from explicit statistical methods, e.g. ancestral state reconstruction? I'd be clear about this, because as it is, it reads as more of a hypothesis than an inference. 
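As one illustration of what an explicit reconstruction could look like, below is a minimal sketch of Fitch parsimony applied to a discrete character such as ALG count. This is not the authors' method. The tree, tip names, and tip states are hypothetical (only the value 20 echoes the quoted text), and Fitch parsimony treats states as unordered, so it does not model fusion or fission as processes.

```python
# Hypothetical toy data: tip names and most states are placeholders.
# The tree is a nested tuple of tip names (strictly bifurcating).
toy_tree = (("tipA", "tipB"), ("tipC", "tipD"))
toy_states = {"tipA": 20, "tipB": 20, "tipC": 20, "tipD": 11}

def fitch(tree, tip_states):
    """Bottom-up pass of Fitch parsimony.

    Returns (candidate state set at this node, number of state changes
    required in the subtree below it).
    """
    if isinstance(tree, str):  # a tip: its observed state, no changes
        return {tip_states[tree]}, 0
    left, right = tree
    left_set, left_changes = fitch(left, tip_states)
    right_set, right_changes = fitch(right, tip_states)
    shared = left_set & right_set
    if shared:
        # Children agree on at least one state: no extra change needed.
        return shared, left_changes + right_changes
    # Children disagree: keep the union and count one change.
    return left_set | right_set, left_changes + right_changes + 1

root_states, n_changes = fitch(toy_tree, toy_states)
print(root_states, n_changes)
```

Reporting the reconstructed state set together with the number of implied changes would make clear how much support the ancestral value has, compared with stating it without a method.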
- 
      concatenated alignment of 537 single-copy orthologs from 23 annelid genomes. Why did you decide to use concatenation rather than a partitioned analysis, or more formal species tree models capable of leveraging multi-copy gene families such as SpeciesRax (https://doi.org/10.1093/molbev/msab365), Asteroid (https://doi.org/10.1093/bioinformatics/btac832), or ASTRAL-Pro 2 (https://doi.org/10.1093/bioinformatics/btac620)? I'd definitely suggest exploring the use of one of these alternatives, as the models for species trees are quite different from those for concatenation. Furthermore, in some cases concatenation can lead to highly "resolved" phylogenies with strong bootstrap support. For a more detailed discussion of the matter, I'd suggest taking a look at Liu et al. 2015 (https://nyaspubs.onlinelibrary.wiley.com/doi/abs/10.1111/nyas.12747) 
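For readers less familiar with the distinction, the sketch below shows what separates plain concatenation from a partitioned setup: the same supermatrix is built, but the gene boundaries are recorded so that each gene can later be assigned its own substitution model. The gene names, taxon names, and sequences are hypothetical toy data, and the function is an illustration rather than the preprint's pipeline.

```python
# Hypothetical toy alignments: gene name -> {taxon -> aligned sequence}.
toy_alignments = {
    "geneA": {"sp1": "MKV", "sp2": "MRV"},
    "geneB": {"sp1": "LL", "sp2": "LI", "sp3": "LL"},
}

def concatenate(alignments):
    """Concatenate per-gene alignments into one supermatrix.

    Returns (supermatrix, partitions), where partitions is a list of
    (gene, start, end) tuples in 1-based inclusive coordinates.
    Taxa missing from a gene are padded with gap characters.
    """
    taxa = sorted({taxon for aln in alignments.values() for taxon in aln})
    supermatrix = {taxon: "" for taxon in taxa}
    partitions = []
    start = 1
    for gene, aln in alignments.items():
        length = len(next(iter(aln.values())))
        for taxon in taxa:
            supermatrix[taxon] += aln.get(taxon, "-" * length)
        partitions.append((gene, start, start + length - 1))
        start += length
    return supermatrix, partitions

supermatrix, partitions = concatenate(toy_alignments)
print(supermatrix)
print(partitions)
```

An unpartitioned analysis would discard `partitions` and fit a single model to the whole supermatrix. Note that summary species-tree methods differ further, because they start from per-gene trees instead of a supermatrix.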
- 
      LG+F+R7 model. Using IQ-TREE? I'd suggest specifying the method you chose to use here. 
- 
      Phylogeny of chromosome-level annelid genomes. The order here seems a bit off, and led to one of my later comments that stemmed from this confusion. I would suggest ordering the sections so that they describe:
      - the collection/curation and gene model annotation of the 23 genome assemblies, with summaries of them,
      - the inference of orthology,
      - phylogenetic inference, and
      - the remaining analyses of macrosynteny.
      I suggest this because currently you describe phylogenetic analyses that use the orthogroups inferred using OrthoFinder, but those analyses or results are not described until after the phylogenetics. I think restructuring in this way might be a bit more intuitive for the reader. 
- 
      The topology is largely consistent with transcriptome-based phylogenies and supports the widely accepted division of the bulk of annelid diversity into two monophyletic groups, Errantia and Sedentaria, with Oweniidae and Sipuncula as basal lineages. Given the earlier comments about uncertainty in transcriptome-based phylogenetics and the fact that the new phylogenetic hypotheses presented here support those from transcriptome datasets, I might suggest either tempering the earlier statements or describing in greater detail here exactly where the two topologies differ. 
- 
      We built a maximum likelihood phylogeny of annelids using the chromosome-level genomes and newly annotated gene models (fig. 1; supplementary fig. S3). As written it's a bit ambiguous what the data used for phylogenomic inference here were - I might suggest rewording slightly to something along the lines of: "We built a maximum likelihood phylogeny of annelids using >500 single-copy genes sampled from the chromosome-level genome assemblies and newly annotated gene models" 
- 
      However, the current understanding of annelid phylogeny is largely based on transcriptomic data (Struck et al. 2011; Weigert et al. 2014; Andrade et al. 2015; Weigert and Bleidorn 2016) and subsequently retains a degree of uncertainty. Can you elaborate on why you believe transcriptome-based phylogenomic inferences are more likely to retain uncertainty than those from other genome-wide sources of data? I ask because there will always be variable degrees of uncertainty in phylogenetic/phylogenomic inference, irrespective of the source of data used. Furthermore, (measured) uncertainty can be confounded by the methods used, source of data, and more (see Simon 2022 for a discussion: https://academic.oup.com/sysbio/article/71/4/921/5904279). 