Comparative genomics reveals insight into the evolutionary origin of massively scrambled genomes

Abstract

Ciliates are microbial eukaryotes that undergo extensive programmed genome rearrangement, a natural genome editing process that converts long germline chromosomes into smaller gene-rich somatic chromosomes. Three well-studied ciliates include Oxytricha trifallax, Tetrahymena thermophila, and Paramecium tetraurelia, but only the Oxytricha lineage has a massively scrambled genome, whose assembly during development requires hundreds of thousands of precisely programmed DNA joining events, representing the most complex genome dynamics of any known organism. Here we study the emergence of such complex genomes by examining the origin and evolution of discontinuous and scrambled genes in the Oxytricha lineage. This study compares six genomes from three species, the germline and somatic genomes for Euplotes woodruffi, Tetmemena sp., and the model ciliate O. trifallax. We sequenced, assembled, and annotated the germline and somatic genomes of E. woodruffi, which provides an outgroup, and the germline genome of Tetmemena sp. We find that the germline genome of Tetmemena is as massively scrambled and interrupted as Oxytricha’s: 13.6% of its gene loci require programmed translocations and/or inversions, with some genes requiring hundreds of precise gene editing events during development. This study revealed that the earlier diverged spirotrich, E. woodruffi, also has a scrambled genome, but only roughly half as many loci (7.3%) are scrambled. Furthermore, its scrambled genes are less complex, together supporting the position of Euplotes as a possible evolutionary intermediate in this lineage, in the process of accumulating complex evolutionary genome rearrangements, all of which require extensive repair to assemble functional coding regions. Comparative analysis also reveals that scrambled loci are often associated with local duplications, supporting a gradual model for the origin of complex, scrambled genomes via many small events of DNA duplication and decay.

Article activity feed

  1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

    Learn more at Review Commons


    Reply to the reviewers

    Reviewer #1 (Evidence, reproducibility and clarity (Required)):

    Summary:

    Ciliates extensively rearrange their somatic genome every time a new somatic nucleus develops from the zygotic germline nucleus. In this manuscript, Feng et al report the sequencing, assembly and annotation of the germline and somatic genomes of Euplotes woodruffi and the germline genome of Tetmemena sp. (whose somatic genome was sequenced and assembled by the same lab in 2015). They present a comparative analysis of developmentally programmed genome rearrangements in these two species and in the model ciliate Oxytricha trifallax. Their major findings are that:

    (i) E. woodruffi and Tetmemena sp. eliminate a smaller fraction of their germline genome (~54%) from their somatic macronucleus (MAC) than O. trifallax (>80%)

    (ii) Transposable elements (TE) represent a smaller fraction of the germline genome (~2%) in the first two ciliates than in O. trifallax (~15%). TEs are mainly located at the boundaries of germline chromosomes and in intergenic regions, but can also be found inside IESs

    (iii) Several thousands of genes are scrambled in the germline genome of all three species

    The authors have also addressed the possible origin of gene scrambling. They report an interesting association with local paralogy and propose a model for the emergence of the odd-even pattern of gene unscrambling between two paralogous copies.

    Major comments:

    1. Based on the statistics presented in Table 1, genome assemblies are of good quality, with a reasonable N50 size of germline (MIC) contigs. It seems, however, that no entire MIC chromosome could be assembled, since no two-telomere contig is mentioned in the list. As proposed by the authors (p.7) the presence of numerous TEs at the boundaries of MIC contigs (Fig S1) may have hindered the assembly of MIC chromosome ends. I would have appreciated more information on the "other repeats" (which seem to differ from tandem repeats according to Fig 2) and their location along MIC contigs.

    Subcategories of “other repeats” were included in Table S2 based on RepeatMasker annotations. We have now analyzed the locations of other repeats in MIC contigs and include those as well in the new Figure S1B. About 30% of “other” transposable elements are present at the boundaries of MIC contigs, which may also hinder the assembly. Notably, 35-45% of “other TEs” are in assembled, intergenic regions.

    The definition of "Internal Eliminated Sequences" (IES) is not clear. The authors make a distinction between IESs and TEs. I understand that IESs are DNA segments that separate two macronuclear-destined sequences (MDS) in the germline genome. Thus they appear to be restricted to those regions that eventually yield gene-sized MAC chromosomes. IESs are eliminated between two pointers that may not be identical on both sides in case of scrambled genes. Some clarification is needed here.

    To illustrate my point: I found the statement "with many TE insertions within IESs, suggesting that TE insertions may have generated IESs" particularly confusing (p. 9 lines 5-6). Does this mean that IESs extend beyond the ends of inserted TEs? The legend of Fig S1 should also be clarified.

    We clarified the text and legend. IESs can extend beyond the ends of inserted TEs, even if the original IES is a decayed TE, due to subsequent sequence evolution at the boundaries or if the original insertion was into an existing IES. David Prescott referred to sequence evolution at the edges of IESs as “pointer sliding” (ref.36).

    p. 10 lines 2-4 and Fig S2: Could the authors explain the difference they make between MDS (in the text) and CDS (in Fig S2)? My understanding is that a CDS is the entire gene coding sequence and may be made of multiple MDSs. If this is correct, the sentence should read "We compared the number of MDSs between single-copy orthologs for single-gene MAC chromosomes across the three species and found that the orthologs have similar CDS lengths".

    Yes, we made the correction.

    p. 12 lines 10-15: the discovery that paralogous MDSs can be found in scrambled genomic loci is interesting. If the two paralogs can be distinguished based on the number of substitutions, it would be informative to go back to individual reads and check whether each of the two copies can be incorporated in the unscrambled CDS (and at which frequency). Would the pointers be compatible with this?

    The paralogous MDSs in the MIC are often not identical. The copy with the highest similarity is assigned as “preliminary match” by SDRAP (ref. 52), and others are assigned as “additional matches”. To validate SDRAP assignments, we did pairwise BLASTN alignments (“-task megablast”) of paralogous MIC MDSs and their corresponding MAC MDSs. We confirmed that, in all three species, the preliminary match has the highest or joint-highest percent identity (pid) in most cases. Therefore, the MDS assigned as preliminary match is more likely the paralog incorporated into the MAC chromosome.
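    The validation logic described above can be illustrated with a minimal sketch. This is not the pipeline used in the study: the function names, the dictionary layout, and the pid values are all hypothetical, and the percent identities are assumed to have been extracted from the pairwise BLASTN alignments beforehand.

```python
from typing import Dict, List


def preliminary_is_best(preliminary_pid: float, additional_pids: List[float]) -> bool:
    """Return True if the preliminary match has the best or equally best pid."""
    return all(preliminary_pid >= pid for pid in additional_pids)


def fraction_confirmed(loci: Dict[str, dict]) -> float:
    """Fraction of loci whose preliminary match is best or tied for best."""
    if not loci:
        return 0.0
    confirmed = sum(
        preliminary_is_best(entry["preliminary"], entry["additional"])
        for entry in loci.values()
    )
    return confirmed / len(loci)


# Hypothetical pid values (percent identity to the MAC MDS); not data from the study.
example_loci = {
    "locus_A": {"preliminary": 99.1, "additional": [97.4]},
    "locus_B": {"preliminary": 98.0, "additional": [98.0, 95.2]},  # tie counts as confirmed
    "locus_C": {"preliminary": 96.5, "additional": [97.0]},        # an additional match is better
}

print(fraction_confirmed(example_loci))
```

    A tie is counted as confirming the assignment, mirroring the "best or equally best" criterion stated in the reply.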

    We used genome assemblies of Euplotes woodruffi, which had the highest Nanopore coverage, to further investigate the frequency of MDS incorporation. We followed the reviewer’s suggestion and called SNP variants on both MAC and MIC genomes. For MAC SNP calling, we used Illumina reads as input for freebayes (ref a). For MIC SNP calling, we used Nanopore reads, instead of Illumina reads, to avoid non-specific short-read mapping on paralogous MDSs and to avoid the presence of any contaminating MAC reads. Variants were called and phased by PEPPER-Margin-DeepVariant (ref b), a new tool published in 2021 in Nature Methods, which has been reported to have similar accuracy to Illumina read variant calling, especially at high read coverage. We used the parameter “--pepper_min_coverage_threshold 20” to call confident variants when at least 20 reads cover the position. Only 92 MIC SNPs in the paralogous MDSs passed all filters of the program. Using this small set of MIC SNPs, we were unfortunately unable to distinguish which paralogous MIC MDS was incorporated into the MAC. Therefore, we cannot infer with what frequency one paralogous MDS is incorporated over another, until they become sufficiently diverged, which is compatible with the model.

    a. Garrison E, Marth G. Haplotype-based variant detection from short-read sequencing. arXiv preprint arXiv:1207.3907. 2012 Jul 17.

    b. Shafin K, Pesout T, Chang PC, Nattestad M, Kolesnikov A, Goel S, Baid G, Kolmogorov M, Eizenga JM, Miga KH, Carnevali P. Haplotype-aware variant calling with PEPPER-Margin-DeepVariant enables high accuracy in nanopore long-reads. Nature Methods. 2021 Nov;18(11):1322-32.

    The hypothesis that odd-even scrambled loci have evolved from paralogous genes in E. woodruffi is supported by the existence of paralogous MDSs, length conservation of MDS/IES pairs and sequence similarity between corresponding MDS and IES in a pair. The correlations presented for Oxytricha and Tetmemena are much less convincing (Fig S5D and E). I recommend that the authors be even more cautious in their statement on p.13 ("For Oxytricha and Tetmemena, the MDS and IES lengths for such MDS/IES pairs also correlate positively, but more moderately").

    Thank you, we rephrased the text.

    p. 15 last paragraph: Why did the authors focus only on TBEs inserted in non-scrambled IESs to look for orthologous TBE insertions? Is there a reason to believe that no recent TBE insertion occurred at other genomic loci? Or was it only for practical reasons? It is also not clear to me whether the authors have considered full-length TBEs or the presence of at least one TBE ORF.

    This analysis was limited for practical reasons, because we identify position conservation of TBEs by aligning protein sequences of MAC genes. We only consider TBEs inserted in non-scrambled IESs in exons. It would be difficult and less meaningful to align completely non-coding MIC-limited regions.

    Partial TBEs are also included if they contain at least one TBE ORF (detected by BLAST).

    Furthermore, TE insertion cannot explain the origin of scrambled IESs, and TEs rarely map to scrambled IESs (Figure S1A), but there is a clear evolutionary model for the origin of nonscrambled IESs from decay of TBEs (ref. 49). Initial purifying selection would act on the TE to maintain its ability to self-excise, whereas we advocate for a different model for the origin of scrambled IESs by decay of paralogous MDSs.

    p. 16: the authors report that some introns of E. woodruffi map "near" Oxytricha/Tetmemena pointers. How near? Based on the information provided by the authors, I don't think this observation necessarily implies that IESs were converted to introns (or reciprocally) during evolution. If this were true, shouldn't at least one intron boundary coincide exactly with a pointer? The authors should clarify this (also in the discussion, on p. 20, top paragraph).

    We used a 20 bp window (~7 amino acids), as described in the Methods, and added that to the Results. Full detail is provided in the Methods section, “Ortholog comparison pipeline and Monte Carlo simulations”. 103 E. woodruffi introns are within 20 bp of the midpoint of Oxytricha/Tetmemena pointers. Among these, 43 intron boundaries overlap an Oxytricha or Tetmemena pointer. We observed 306 cases of precisely matching boundaries between any two species, where the exon junction of one species maps inside the MDS/IES pointer of another species, although we would only expect the boundaries of introns and IESs to coincide so precisely if they were recent conversions. Hence we feel that a window analysis is informative.
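    The two criteria in this reply (a boundary lying within the window around a pointer midpoint, and a boundary falling inside the pointer itself) can be sketched as below. This is an illustration under stated assumptions, not the ortholog comparison pipeline from the Methods: the function names are hypothetical, and positions are assumed to be integer base-pair coordinates already projected onto a shared alignment, with pointer ends inclusive.

```python
def pointer_midpoint(pointer_start: int, pointer_end: int) -> float:
    """Midpoint of a pointer given inclusive start and end coordinates."""
    return (pointer_start + pointer_end) / 2


def near_pointer(boundary: int, pointer_start: int, pointer_end: int,
                 window: int = 20) -> bool:
    """True if the boundary lies within `window` bp of the pointer midpoint."""
    return abs(boundary - pointer_midpoint(pointer_start, pointer_end)) <= window


def overlaps_pointer(boundary: int, pointer_start: int, pointer_end: int) -> bool:
    """True if the boundary falls inside the pointer (inclusive ends)."""
    return pointer_start <= boundary <= pointer_end


# Hypothetical coordinates: a pointer spanning positions 100-108 (midpoint 104).
for boundary in (105, 120, 130):
    print(boundary,
          near_pointer(boundary, 100, 108),
          overlaps_pointer(boundary, 100, 108))
```

    Every boundary that overlaps a pointer shorter than twice the window also passes the window test, which is consistent with the overlapping cases being reported as a subset of the windowed ones.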

    p. 19 2nd paragraph: the suggested mechanism explaining the 5' bias of IESs in E. woodruffi genes is unclear. How could germline recombination take place between a MIC chromosome and a MAC reverse transcript or nanochromosome? This would imply that DNA could be imported in the MIC. Is there evidence that this might occur?

    The ability of TEs to invade the MIC demonstrates that even foreign DNA can be incorporated into the MIC. Since MAC DNA is present at high copy number, it offers a potential source for a recombination template that could erase IESs, as could an errant reverse transcript of one of the long noncoding template RNAs. Any of these would be infrequent events that would matter on an evolutionary time scale even if developmentally rare.

    According to Figure 1, no scrambled genes have been reported in Paramecium tetraurelia. Within the frame of the proposed model, this is somewhat unexpected because this ciliate went through several whole genome duplications during evolution and harbors many paralogous gene pairs. Is there a reason why no gene scrambling took place in Paramecium?

    Paramecium uses only TA dinucleotide pointers for IES elimination, unlike the rich diversity of pointers in spirotrichous ciliates. This limitation in its machinery may explain why no scrambled loci have been observed in Paramecium, despite the abundance of paralogs. Our model suggests that local MIC paralogy is associated with the origin of scrambling. But most of the paralogy reported in Paramecium is at the level of whole chromosomes in the MAC (ref. 104) rather than local MIC paralogy.

    Minor comments:

    p. 4 (4th bottom line): To my knowledge, ref #28 presents a draft (incomplete) MIC assembly of the Paramecium genome.

    Thank you, we added reference 29 and adjusted the wording describing the quality of MIC genome draft assemblies.

    p. 7 (last paragraph): "encoding" should be replaced by "carrying"

    Thank you, we made the change.

    p. 10 (2nd paragraph): insert a missing "o" into "nanochromosomes"

    Thank you, corrected.

    p. 10 (same paragraph): the weak 5' bias of IES distribution in Tetmemena should be shown (either as an additional panel in Fig 3 or in a Sup Figure).

    Thank you, we added it as Figure S2C.

    p. 24 2nd paragraph: "a" is missing in "Trinity, which is a software..."

    Thank you, we made the correction.

    CROSS-CONSULTATION COMMENTS

    I agree with most comments of reviewer 3.

    The authors have actually defined "TE" in the introduction (p. 6). Depending on the journal's rules for abbreviation use, it may not be necessary to define it again in the results section.

    Reviewer #1 (Significance (Required)):

    Ciliates are unicellular models to study developmentally programmed genome rearrangements at the mechanistic, genome-wide and evolutionary levels. These aspects have so far mostly been addressed in three species: P. tetraurelia and Tetrahymena thermophila on the one hand, the spirotrichous ciliate O. trifallax on the other.

    One new piece of information that can be found in the present manuscript is the assembly and annotation of the germline genome of two novel species: Tetmemena sp, closely related to Oxytricha, and the more distant E. woodruffi. Feng et al establish that, similar to other ciliates, Tetmemena and Euplotes eliminate TEs and other germline-specific sequences during programmed genome rearrangements. They also undergo extensive gene unscrambling, which results in IES removal and MDS reordering to assemble coding sequences.

    A TE origin was discussed previously for Paramecium (Arnaiz et al PLoS Genet; Sellis et al 2021 PLoS Biol) and Tetrahymena IESs (Hamilton et al 2016 eLife). While this may also hold true in spirotrichous ciliates, the present manuscript proposes a completely new evolutionary scenario for IESs from scrambled genes. Here, Feng et al establish that scrambled genes of spirotrichous ciliates tend to be associated with local paralogy. They provide evidence supporting that IESs from scrambled genes may have evolved from paralogous MDSs.

    Although I am more of an expert in the molecular mechanisms involved in genome rearrangements, I feel that the work reported here should draw the attention of a broader audience interested in genome dynamics and evolution, beyond the specific field of spirotrichous ciliate biology.

    Reviewer #3 (Evidence, reproducibility and clarity (Required)):

    Feng et al. provide a solid analysis of the evolution of genome rearrangement in spirotrich ciliates. The authors applied a variety of state-of-the-art sequencing and bioinformatic methods to investigate the intriguing and extremely complex patterns of genome architecture in this protist lineage. Methods (including statistical analyses) are adequate and explained in detail. Results and discussions reflect careful, clever analysis of the data and excellent linkage with the literature. Figures and tables complement the text in a compelling way. I have only minor suggestions:

    Summary: more gradually introduce Spirotrichea and the phylogenetic relationship among the three species analyzed. This would better position the reader to understand the evolutionary context you are working in. Also, it would be helpful to more clearly differentiate novel vs. existing data. A suggestion: "This study focuses on three spirotrich species: two in the family Oxytrichidae (Oxytricha trifallax and Tetmemena sp) and Euplotes woodruffi as an outgroup. To complement existing data, we sequenced, assembled and annotated the germline and somatic genomes of E. woodruffi and the germline genome of Tetmemena sp."

    Thank you, we clarified the summary (abstract).

    Introduction, first paragraph: Replace "The species in this study..." for a more precise statement, such as "The three spirotrich species studied here..."

    Thank you, we have made this statement more precise.

    p. 4: This sentence is unclear: "These useful tools provide partial insight to guide selection of species for full genome sequencing, which allows construction of complete rearrangement maps of a MIC genome onto a MAC genome for a reference species."

    Thank you, we have clarified this sentence.

    p. 8: define TE on first mention.

    Defined on page 6.

    Table 1. Indicate which MIC and MAC data are from this study.

    References are included for published data and a note has been added to indicate data from this study.

    Reviewer #3 (Significance (Required)):

    The present work represents a significant advance in the field of evolutionary genomics. The focus of the paper is on ciliates, an ancient (2-billion-year-old) and highly diverse eukaryotic phylum that presents many peculiarities, including sex, nuclear dimorphism, genome rearrangement, high numbers of paralogs and transposons, etc. While some data exist on a few model ciliates of disparate phylogenetic position, this work focuses on two species taxonomically placed in the same family, plus a more distant outgroup within the same class. This gives a novel dimension to this study, which goes beyond exploring genome architecture in a single clade. Instead, it allows exploration of evolutionary trends in genome rearrangement among relatively closely related species. This paper should be of high interest not only for ciliate biologists (like me), but also in relation to comparative genomics of protists/eukaryotes and germ-soma biology. I highly recommend publication.

  2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


    Referee #3

    Evidence, reproducibility and clarity

    Feng et al. provide a solid analysis of the evolution of genome rearrangement in spirotrich ciliates. The authors applied a variety of state-of-the-art sequencing and bioinformatic methods to investigate the intriguing and extremely complex patterns of genome architecture in this protist lineage. Methods (including statistical analyses) are adequate and explained in detail. Results and discussions reflect careful, clever analysis of the data and excellent linkage with the literature. Figures and tables complement the text in a compelling way. I have only minor suggestions:

    • Summary: more gradually introduce Spirotrichea and the phylogenetic relationship among the three species analyzed. This would better position the reader to understand the evolutionary context you are working in. Also, it would be helpful to more clearly differentiate novel vs. existing data. A suggestion: "This study focuses on three spirotrich species: two in the family Oxytrichidae (Oxytricha trifallax and Tetmemena sp) and Euplotes woodruffi as an outgroup. To complement existing data, we sequenced, assembled and annotated the germline and somatic genomes of E. woodruffi and the germline genome of Tetmemena sp."

    • Introduction, first paragraph: Replace "The species in this study..." for a more precise statement, such as "The three spirotrich species studied here..."

    • p. 4: This sentence is unclear: "These useful tools provide partial insight to guide selection of species for full genome sequencing, which allows construction of complete rearrangement maps of a MIC genome onto a MAC genome for a reference species."

    • p. 8: define TE on first mention.

    • Table 1. Indicate which MIC and MAC data are from this study.

    Significance

    The present work represents a significant advance in the field of evolutionary genomics. The focus of the paper is on ciliates, an ancient (2-billion-year-old) and highly diverse eukaryotic phylum that presents many peculiarities, including sex, nuclear dimorphism, genome rearrangement, high numbers of paralogs and transposons, etc. While some data exist on a few model ciliates of disparate phylogenetic position, this work focuses on two species taxonomically placed in the same family, plus a more distant outgroup within the same class. This gives a novel dimension to this study, which goes beyond exploring genome architecture in a single clade. Instead, it allows exploration of evolutionary trends in genome rearrangement among relatively closely related species. This paper should be of high interest not only for ciliate biologists (like me), but also in relation to comparative genomics of protists/eukaryotes and germ-soma biology. I highly recommend publication.

  3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


    Referee #1

    Evidence, reproducibility and clarity

    Summary:

    Ciliates extensively rearrange their somatic genome every time a new somatic nucleus develops from the zygotic germline nucleus. In this manuscript, Feng et al report the sequencing, assembly and annotation of the germline and somatic genomes of Euplotes woodruffi and the germline genome of Tetmemena sp. (whose somatic genome was sequenced and assembled by the same lab in 2015). They present a comparative analysis of developmentally programmed genome rearrangements in these two species and in the model ciliate Oxytricha trifallax. Their major findings are that:

    (1) E. woodruffi and Tetmemena sp. eliminate a smaller fraction of their germline genome (~54%) from their somatic macronucleus (MAC) than O. trifallax (>80%)

    (2) Transposable elements (TE) represent a smaller fraction of the germline genome (~2%) in the first two ciliates than in O. trifallax (~15%). TEs are mainly located at the boundaries of germline chromosomes and in intergenic regions, but can also be found inside IESs

    (3) Several thousands of genes are scrambled in the germline genome of all three species

    The authors have also addressed the possible origin of gene scrambling. They report an interesting association with local paralogy and propose a model for the emergence of the odd-even pattern of gene unscrambling between two paralogous copies.

    Major comments:

    (1) Based on the statistics presented in Table 1, genome assemblies are of good quality, with a reasonable N50 size of germline (MIC) contigs. It seems, however, that no entire MIC chromosome could be assembled, since no two-telomere contig is mentioned in the list. As proposed by the authors (p.7) the presence of numerous TEs at the boundaries of MIC contigs (Fig S1) may have hindered the assembly of MIC chromosome ends. I would have appreciated more information on the "other repeats" (which seem to differ from tandem repeats according to Fig 2) and their location along MIC contigs.

    (2) The definition of "Internal Eliminated Sequences" (IES) is not clear. The authors make a distinction between IESs and TEs. I understand that IESs are DNA segments that separate two macronuclear-destined sequences (MDS) in the germline genome. Thus they appear to be restricted to those regions that eventually yield gene-sized MAC chromosomes. IESs are eliminated between two pointers that may not be identical on both sides in case of scrambled genes. Some clarification is needed here.

    To illustrate my point: I found the statement "with many TE insertions within IESs, suggesting that TE insertions may have generated IESs" particularly confusing (p. 9 lines 5-6). Does this mean that IESs extend beyond the ends of inserted TEs? The legend of Fig S1 should also be clarified.

    (3) p. 10 lines 2-4 and Fig S2: Could the authors explain the difference they make between MDS (in the text) and CDS (in Fig S2)? My understanding is that a CDS is the entire gene coding sequence and may be made of multiple MDSs. If this is correct, the sentence should read "We compared the number of MDSs between single-copy orthologs for single-gene MAC chromosomes across the three species and found that the orthologs have similar CDS lengths".

    (4) p. 12 lines 10-15: the discovery that paralogous MDSs can be found in scrambled genomic loci is interesting. If the two paralogs can be distinguished based on the number of substitutions, it would be informative to go back to individual reads and check whether each of the two copies can be incorporated in the unscrambled CDS (and at which frequency). Would the pointers be compatible with this?

    (5) The hypothesis that odd-even scrambled loci have evolved from paralogous genes in E. woodruffi is supported by the existence of paralogous MDSs, length conservation of MDS/IES pairs and sequence similarity between corresponding MDS and IES in a pair. The correlations presented for Oxytricha and Tetmemena are much less convincing (Fig S5D and E). I recommend that the authors be even more cautious in their statement on p.13 ("For Oxytricha and Tetmemena, the MDS and IES lengths for such MDS/IES pairs also correlate positively, but more moderately").

    (6) p. 15 last paragraph: Why did the authors focus only on TBEs inserted in non-scrambled IESs to look for orthologous TBE insertions? Is there a reason to believe that no recent TBE insertion occurred at other genomic loci? Or was it only for practical reasons? It is also not clear to me whether the authors have considered full-length TBEs or the presence of at least one TBE ORF.

    (7) p. 16: the authors report that some introns of E. woodruffi map "near" Oxytricha/Tetmemena pointers. How near? Based on the information provided by the authors, I don't think this observation necessarily implies that IESs were converted to introns (or reciprocally) during evolution. If this were true, shouldn't at least one intron boundary coincide exactly with a pointer? The authors should clarify this (also in the discussion, on p. 20, top paragraph).

    (8) p. 19 2nd paragraph: the suggested mechanism explaining the 5' bias of IESs in E. woodruffi genes is unclear. How could germline recombination take place between a MIC chromosome and a MAC reverse transcript or nanochromosome? This would imply that DNA could be imported in the MIC. Is there evidence that this might occur?

    (9) According to Figure 1, no scrambled genes have been reported in Paramecium tetraurelia. Within the frame of the proposed model, this is somewhat unexpected because this ciliate went through several whole genome duplications during evolution and harbors many paralogous gene pairs. Is there a reason why no gene scrambling took place in Paramecium?

    Minor comments:

    • p. 4 (4th bottom line): To my knowledge, ref #28 presents a draft (incomplete) MIC assembly of the Paramecium genome.

    • p. 7 (last paragraph): "encoding" should be replaced by "carrying"

    • p. 10 (2nd paragraph): insert a missing "o" into "nanochromosomes"

    • p. 10 (same paragraph): the weak 5' bias of IES distribution in Tetmemena should be shown (either as an additional panel in Fig 3 or in a Sup Figure).

    • p. 24 2nd paragraph: "a" is missing in "Trinity, which is a software..."

    CROSS-CONSULTATION COMMENTS

    I agree with most comments of reviewer 3.

    The authors have actually defined "TE" in the introduction (p. 6). Depending on the journal's rules for abbreviation use, it may not be necessary to define it again in the results section.

    Significance

    • Ciliates are unicellular models to study developmentally programmed genome rearrangements at the mechanistic, genome-wide and evolutionary levels. These aspects have so far mostly been addressed in three species: P. tetraurelia and Tetrahymena thermophila on the one hand, the spirotrichous ciliate O. trifallax on the other.

    • One new piece of information that can be found in the present manuscript is the assembly and annotation of the germline genome of two novel species: Tetmemena sp, closely related to Oxytricha, and the more distant E. woodruffi. Feng et al establish that, similar to other ciliates, Tetmemena and Euplotes eliminate TEs and other germline-specific sequences during programmed genome rearrangements. They also undergo extensive gene unscrambling, which results in IES removal and MDS reordering to assemble coding sequences.

    • A TE origin was discussed previously for Paramecium (Arnaiz et al PLoS Genet; Sellis et al 2021 PLoS Biol) and Tetrahymena IESs (Hamilton et al 2016 eLife). While this may also hold true in spirotrichous ciliates, the present manuscript proposes a completely new evolutionary scenario for IESs from scrambled genes. Here, Feng et al establish that scrambled genes of spirotrichous ciliates tend to be associated with local paralogy. They provide evidence supporting that IESs from scrambled genes may have evolved from paralogous MDSs.

    • Although I am more of an expert in the molecular mechanisms involved in genome rearrangements, I feel that the work reported here should draw the attention of a broader audience interested in genome dynamics and evolution, beyond the specific field of spirotrichous ciliate biology.