Small RNAs from mitochondrial genome recombination sites are incorporated into T. gondii mitoribosomes


Abstract

The mitochondrial genomes of apicomplexans comprise merely three protein-coding genes, alongside a set of thirty to forty genes encoding small RNAs (sRNAs), many of which exhibit homologies to rRNA from E. coli. The expression status and integration of these short RNAs into ribosomes remain unclear, and direct evidence for active ribosomes within apicomplexan mitochondria is still lacking. In this study, we conducted small RNA sequencing on the apicomplexan Toxoplasma gondii to investigate the occurrence and function of mitochondrial sRNAs. To enhance the analysis of sRNA sequencing outcomes, we also re-sequenced the T. gondii mitochondrial genome using an improved organelle enrichment protocol and Nanopore sequencing. It has been established previously that the T. gondii genome comprises 21 sequence blocks that undergo recombination among themselves but that their order is not entirely random. The enhanced coverage of the mitochondrial genome allowed us to characterize block combinations at increased resolution. Employing this refined genome for sRNA mapping, we find that many small RNAs originated from the junction sites between protein-coding blocks and rRNA sequence blocks. Surprisingly, such block border sRNAs were incorporated into polysomes together with canonical rRNA fragments and mRNAs. In conclusion, apicomplexan ribosomes are active within polysomes and are indeed assembled through the integration of sRNAs, including previously undetected sRNAs with merged mRNA-rRNA sequences. Our findings lead to the hypothesis that T. gondii’s block-based genome organization enables the dual utilization of mitochondrial sequences as both messenger RNAs and ribosomal RNAs, potentially establishing a link between the regulation of rRNA and mRNA expression.

Article activity feed

  1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.


    Reply to the reviewers

    The authors do not wish to provide a response at this time.

  2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


    Referee #3

    Evidence, reproducibility and clarity

    Summary

    In their manuscript, Tetzlaff et al. report a substantially improved protocol for the isolation of mitochondria from the parasitic apicomplexan Toxoplasma gondii, which allowed improved sequencing and in-depth analyses of the organism's peculiarly complex mitochondrial genome. Follow-up small RNA sequencing then made it possible to confirm the expression of fragmented mitochondrial ribosomal RNAs (mt-rRNAs) and to identify a dozen new RNA species of unknown function. The authors not only document multiple Toxoplasma mitochondrial genes that overlap one another (including rRNA and protein-coding genes, otherwise a rare occurrence), but also show that some fragmented rRNA genes recombine, effectively leading to multifunctional sequence segments, another rare feature and consequence of the peculiar architecture of the organism's mitochondrial genome. Lastly, the authors confirm that products of three genes presumed to encode pieces of the highly fragmented mitochondrial large subunit (mtLSU) rRNA do indeed assemble, presumably with additional components, into large molecular-weight complex(es).

    Major comments

    Key conclusions of the manuscript are that Toxoplasma's mitogenome encodes overlapping rRNA and protein-coding genes, divergent and chimeric rRNA pieces, and several small RNAs (sRNAs) of unknown function. The evidence provided is very solid for some aspects of the study but open to objection for others, as detailed below.

    1. The extent of the presented analysis of rRNAs and unassigned sRNAs seems lacking. In several places of the manuscript, the authors wonder about potential implications of divergent rRNA sequences, but their analyses appear to have been limited to sequence similarity searches. Had modelling of secondary structure interactions been attempted, this conundrum could potentially have been solved. Importantly, similarity searches (to conventional rRNAs) were performed using BLASTN, which is a rather crude tool for the purpose, instead of covariance models/HMMs. It is therefore not entirely surprising that some sRNAs remained unassigned. Admittedly, recognizing rRNA motifs in divergent RNAs is a challenging issue. However, it is important not to conflate similarity to conventional rRNA with the molecule's functionality as an rRNA, i.e., sequence divergence does not necessarily disqualify the unassigned sRNAs as potential rRNAs. Mitochondrial rRNA sequences are among the most divergent, often constrained only by base-pairing, if at all, as research on kinetoplastid and diplonemid mt-rRNAs has shown; these contain very few conserved elements and very few base pairs (e.g., Ramrath,2018,Science & Valach,2023,NAR). Even in generally less divergent cases such as green algae, the fragment encoding a highly divergent and derived 5S-like rRNA was recognized as such only after the mitoribosome structures were determined (Waltz,2021,Nature Comm & Tobiasson,2022,Nature Comm). It would not be surprising if the same were the case for Toxoplasma's fairly quickly evolving mitochondrial genome.
    2. The discovery of overlapping protein-coding and rRNA genes is intriguing, but the authors do not explain why it should be considered as fundamentally groundbreaking as the 'Abstract' and 'Discussion' make it sound. Gene overlaps are found in mitochondria of many organisms (e.g., fungi, animals, various protists), especially of tRNA and protein-coding genes. Even in Plasmodium, a rather close relative of Toxoplasma studied in the presented work, LSUB (rRNA) gene overlaps cob (protein) gene in the antisense orientation. Admittedly, the extent of the overlaps in Toxoplasma does seem fairly high at first glance, but it is necessary to provide more data and, importantly, broader context to make the case that Toxoplasma overlaps are in any way special. For instance, what is the average size of the overlaps? What is their cumulative size? How does their extent (i.e., the size of overlapping coding sequences compared to the total length of coding sequences) compare to gene overlaps in other (mitochondrial) genomes? Certain additional aspects of the analysis and interpretation of protein- and/or rRNA-coding sequence overlaps are somewhat underdeveloped. For example, are the RNA-coding regions that overlap protein-coding sequences more divergent in those three conserved proteins compared to other organisms, i.e., does their function as rRNA take precedence, or is the converse true, i.e., are the rRNA sections more divergent? RNA19 (overlapping coxIII and cob) is the only example discussed in depth, but at least a short sentence summarizing the overall picture would be useful.
As for the authors' interpretations, the proposed formation of sRNA:mRNA hybrids, through which sRNAs could be implicated in facilitating mRNA recognition by the mitoribosome, is an interesting hypothesis, but a simpler scenario, which is given very little space, is that the genes happen to overlap by chance and that the overlaps are merely a consequence of genome compaction (a phenomenon that the authors rightly highlight). Without a comprehensive analysis, it is impossible to conclude which possibility is more likely. For instance, if both protein-coding and non-protein-coding sequences are divergent, this would indicate that there are few evolutionary constraints and so the fact that these sequences overlap means very little and might be just due to neutral drift, an effect of genome compaction without much consequence for the organism. Lastly, considerable significance is attributed in the study to the presence of antisense overlaps, especially between rRNA- (or sRNA-) and protein-coding genes. Yet, the overall extent of sense and antisense overlaps in the Toxoplasma mitogenome is quite similar, which, again, seems to point to a neutral evolutionary process. Can the authors elaborate on whether this aspect of the genome architecture was taken into account and whether they regard it as of lesser relevance (and why, if so)?
    3. Another controversial issue concerns prevalent sequence block combinations and their impact on mitochondrial gene expression regulation. The authors postulate that the consistent occurrence of 5′-terminal blocks of protein-coding genes near other protein-coding blocks has some functional significance. However, concluding this from just two cases (even if out of two) is quite speculative and seems like reading too much into a pattern that could very well be due to chance alone. The authors argue that the fact that the 5′ ends of the coxI and coxIII genes overlap is another indication of potential gene expression coordination. While it is possible to envisage such a regulation because of the 5′ termini proximity, the overlap between these genes means that their connection is hardwired into the genome, making it difficult to compare this particular case to the other sequence blocks. Arguably, it is tempting to speculate that an evolutionary pressure exists to coordinate protein expression, and such a coordination does indeed not seem implausible, but the presented data and arguments are not convincing. The authors should at least expand on their ideas in the 'Discussion' and indicate potential experiments and/or which additional data could support (or refute) their speculation.
    4. My last major point concerns the experimental examination of large-molecular weight complexes and the interpretation of its results. To prove incorporation of the sRNAs into the mitoribosome, i.e., confirm that they do indeed represent rRNAs, the authors opted to investigate their distribution across a sucrose velocity gradient. This is a relatively simple and powerful approach and although it does not provide irrefutable proof, it can be used to gain very useful insights. However, the presented design has critical flaws: 1) all sRNAs selected for Northern blot were mtLSU components, so only the mtLSU would be detected; 2) a single cytosolic LSU component was used as the control, so the distribution of cyto-SSU, cyto-ribosome, and cyto-polysomes is actually unclear; 3) the authors' interpretation relies on the assumption that both mitochondrial and cytosolic ribosomes preserve their association as polysomes, but no relevant control is provided for this. For example, in Figure 6, fractions 6-14 clearly contain cyto-LSU, but polysomes (e.g., disomes) might just as well start in fractions 12-14; without additional controls, or at least continuous monitoring of UV absorbance across the gradient (to show a typical polysomal pattern), it is not guaranteed that what was detected actually included cyto-polysomes. The main concern, however, is the migration of mitoribosomes. First, the authors presume that fractions 7-8 contain the mitochondrial monosomes because they are the fractions closest to the gradient top. This is not guaranteed. In fact, based on the experience of our and our colleagues' labs and taking into consideration the conditions used for the described experiment (more precisely, the use of Triton and deoxycholate, which in many organisms lead to mitoribosome subunit dissociation), it seems quite likely that fractions 7-9 actually contain separated mtLSU, not monosomes.
Fractions in higher sucrose concentration would then represent monosomes and possibly assembly intermediates, though perhaps also a minor polysomal fraction (if the interactions are preserved in the conditions used). In particular, if the assembly process in Apicomplexa is as complex as in Euglenozoa (e.g., see papers on kinetoplastid mitoribosomes: Saurer,2019,Science & Tobiasson,2021,EMBO Journal), which does not seem unlikely in Toxoplasma given the necessity to incorporate ~15 distinct rRNA pieces per mitoribosomal subunit, then the assembly intermediates might form ribonucleoprotein complexes that migrate quite far into a sucrose gradient (e.g., as in kinetoplastid mtSSU, Maslov,2007,Mol Biol Parasit). Thus, while it can be reasonably well argued that the detected RNAs co-migrate with the mtLSU (and possibly mito-monosome), the claim that they associate with mito-polysomes is open to question. More critically, investigating only sRNAs that are clearly identifiable as rRNA pieces, and all from the mtLSU at that, does not automatically prove that all sRNAs associate with the mitoribosome. To argue that the unassigned sRNAs are associated with mitoribosomes, northern blots of as many unassigned sRNAs as possible (but at the very least one) are absolutely necessary. In addition, I encourage the authors to consider performing further experiments to address the issues raised in the preceding paragraph: for example, a western blot of mitochondrial ribosomal protein(s), a northern blot with at least one mtSSU rRNA fragment (since all three shown are from mtLSU), as well as a test that would examine the influence of detergents on mitoribosome stability (e.g., use milder detergents such as digitonin or dodecylmaltoside).
Furthermore, if experimental conditions are identified allowing subunit dissociation, it would be possible to discern to which subunit which sRNA belongs and, importantly, whether the unassigned sRNAs are just "disguised" rRNAs (simplest explanation) or something completely different (speculative explanation seemingly favoured by the authors). All this would substantially boost the significance of the presented work.
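
The overlap statistics asked for in major comment 2 (number of overlaps, their mean and cumulative size, their share of the total coding length, and the sense/antisense split) could be tabulated along the following lines. This is only a minimal sketch, not the authors' analysis: the function name `overlap_stats`, the tuple layout, and the coordinates in the usage example are hypothetical, and it assumes that all genes are annotated on one linear reference sequence, which is a simplification for a genome built from recombining blocks.

```python
from itertools import combinations
from statistics import mean


def overlap_stats(genes):
    """Summarise pairwise gene overlaps.

    genes: list of (name, start, end, strand) tuples with 0-based,
    half-open coordinates on a single reference (an assumption of
    this sketch; strand is "+" or "-").
    """
    overlaps = []
    for (n1, s1, e1, st1), (n2, s2, e2, st2) in combinations(genes, 2):
        # Length of the shared interval; zero or negative means no overlap.
        length = min(e1, e2) - max(s1, s2)
        if length > 0:
            orientation = "sense" if st1 == st2 else "antisense"
            overlaps.append((n1, n2, length, orientation))

    lengths = [o[2] for o in overlaps]
    total_coding = sum(end - start for _, start, end, _ in genes)
    # Note: the cumulative value sums pairwise overlaps, so a segment
    # shared by three genes is counted once per gene pair.
    cumulative = sum(lengths)
    return {
        "n_overlaps": len(overlaps),
        "mean_overlap": mean(lengths) if lengths else 0.0,
        "cumulative_overlap": cumulative,
        "fraction_of_coding": cumulative / total_coding if total_coding else 0.0,
        "sense": sum(1 for o in overlaps if o[3] == "sense"),
        "antisense": sum(1 for o in overlaps if o[3] == "antisense"),
    }


if __name__ == "__main__":
    # Hypothetical coordinates for illustration only; not taken from the manuscript.
    example = [
        ("geneA", 0, 100, "+"),
        ("geneB", 80, 150, "+"),
        ("geneC", 140, 200, "-"),
    ]
    print(overlap_stats(example))
```

Running the same function on annotations from other mitochondrial genomes would give the cross-species comparison that the comment asks for, provided the annotations use a comparable coordinate convention.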

    Minor comments

    General comments

    The word "novel" is rather overused in the manuscript. In several places, it is inappropriate, as the presented results are not as unprecedented as the manuscript makes them sound; in other places, it might be acceptable, but as the word's meaning is vague, the text would benefit from using more informative term(s) instead. The former case is exemplified by the sentence at line 102, "Here, we present a novel method for enriching organellar nucleic acids": "novel" does not simply mean "new", but alludes to "unprecedented"; yet, the devised method, albeit clever, is a modification of existing approaches. The sentence at line 182 illustrates the latter case, where "novel blocks" are mentioned, but "previously not detected blocks" would be more appropriate and to the point. The labelling of 5′ and 3′ is inconsistent throughout the manuscript: sometimes the prime is used, sometimes the apostrophe, sometimes the single quotation mark.

    Abstract

    In light of the raised concerns, the authors should consider carefully rewording this section, as some of the formulations misrepresent the data and lead to unjustified generalizations.

    Introduction

    l. 72-73: "How rRNA fragments are assembled into functional ribosomes remains an enigma." - Without proper context, this statement feels like an exaggeration. Fragmented rRNAs are known from other species and their mitoribosome structures were determined in the past few years (i.e., Tetrahymena, Polytomella, Chlamydomonas). Arguably, these mt-rRNAs are not as fragmented as in Toxoplasma, but at the very least, it is clear that base-pairing of rRNA pieces and RNA-binding proteins play significant roles in the process. If the authors think that this is not the case in apicomplexans, this should be at least alluded to, if not explained.

    l. 80-83: The paragraph mixes information on Plasmodium and Toxoplasma. To an uninitiated reader, this can be quite confusing. It would be useful to specify which species the authors refer to.

    l. 83-86: The information on the atovaquone impact lacks reference(s).

    l. 105: "demonstrated that they are incorporated into polysomes" - In light of the issues raised above and if the authors opt not to expand the work as suggested above, this claim (and similar ones throughout the text) should be amended.

    l. 106-108: "allowed us to identify novel transcripts, many of which originate from block boundaries and contain mixed origins from coding and noncoding regions." - This sentence would benefit from rephrasing because it is difficult to comprehend (the sequences overlap protein-coding and non-protein-coding regions, but do not contain any origins).

    Results

    l. 115-117: "cell fractionation method that takes advantage of the differential cholesterol content in plasma membranes" - Does Toxoplasma contain cholesterol? Perhaps it might be more practical to refer to sterols (since the effect of digitonin is not limited to cholesterol).

    l. 147: "significant increase" - It might be useful to specify that the increase was ~42-fold, so that readers can see the extent of improvement; it has the advantage of really highlighting the achievement.

    l. 180: "have been lettered from A-V" - Rewording to "designated by letters from A to V" works better.

    l. 213-218: This section is essentially a discussion, so it should be moved to the corresponding section of the manuscript.

    l. 262-265: cotranscripts/transcript isoforms - It is a matter of nomenclature, but it seems more appropriate to refer to "a transcript containing LSUF and LSUG regions" instead of a co-transcript, because in the latter case, one then expects that these two will be separated in a following processing step, which, as the authors demonstrate, is clearly not the case for the vast majority of the population of these rRNA pieces. Given the prevalence of the larger pieces, it seems more appropriate to refer to the "smaller transcript isoforms" as possible degradation products and not isoforms, which implies some kind of functional relevance.

    l. 281: In the section "Discovery of novel rRNA fragments", it might be useful to provide a graphical representation or at least a sentence summarizing all different categories of sRNAs. For instance, what is missing from the text is that there are 11 species for which homologous sequences in "conventional" rRNAs were not identified and out of these only 4 seem to have sequence homologs in other Apicomplexa. In addition, in Table S5, the authors could indicate where these homologs are located in Plasmodium, since these appear to be newly identified candidates for Plasmodium sRNA species/rRNA pieces.

    l. 313-314: "In general, block combinations lead to the expression of novel RNAs in T. gondii that are not found in apicomplexan species with a simpler genome organization." - It is not clear where this generalization comes from: Fig. S5A shows that RNA5, RNA7, RNA23t extend across block borders (but based on Table S5 are not unique to Toxoplasma), while only RNA31 and RNA34 are both absent from other Apicomplexa and extend across block borders; yet, this is still less than half of all newly identified sRNAs. In addition, the novelty claim is not clear either: based on the presented data, several sRNAs that overlap are clearly present in other apicomplexans (e.g., RNA1 and RNA2) and thus are not completely new, but merely more divergent in Toxoplasma, because parts of their sequence have been replaced by the shared sequence segment.

    l. 319-320: "None of the three RNAs had detectable homologies to rRNA." - Specify which rRNAs the sequences were compared to in making this inference.

    l. 320-321: "For all five coding-noncoding RNAs, homologs are present in the mitochondrial genome of P. falciparum." - Does this mean that they remain unassigned in Plasmodium as well or that they have not been previously recognized in Plasmodium? Confusingly, RNA34 is labeled as not having homologs in Apicomplexa in Table S5. In addition, mentioning "coding-noncoding RNAs" is somewhat misleading because some of the sRNAs clearly code for mt-rRNA pieces.

    l. 335-338: This section contains contradictory statements that should be reformulated. A couple of sentences prior, the authors experimentally determined that RNA19 actually overlaps only a single protein-coding sequence (coxI), but then refer to the original and demonstrably incorrect annotation of RNA19 overlapping also the cob gene.

    l. 341: The authors mention similarity to rRNA, but do not specify which rRNA. Referring to similarity to known or conserved rRNA sequences or segments would work better. Still, the region of the block S (i.e., 5′ proximal segment of RNA19) falls into the region between helices H51 and H60 of the domain III in the LSU secondary structure, which is sequence-wise relatively poorly conserved, especially in mitochondrial rRNAs, so sequence divergence is not unexpected.

    l. 366: "Note that RNA1 and RNA2 are registered according to their shared sequence" - It is unclear what "registered" means here.

    l. 416-421: Specifying when reference is made to cytosolic vs. mitochondrial monosomes and polysomes would make this section and the related parts of the 'Discussion' clearer. Also, the authors clearly state here that there might be technical reasons for what they observed, but ignore this possibility in the 'Discussion' and assume that they did indeed separate polysomes.

    Discussion

    l. 444: "the reshuffling appears limited to specific block borders and is not random" - How many biological replicates of nanopore sequencing were performed? Did the authors test other T. gondii strains? What about other apicomplexan species? Unless this has been done, there is no demonstration that the block order and block-joining frequencies documented here are (more or less) constant and that block order is under some kind of purifying selection. Hence, the conclusion that the block borders are not random is debatable. Arguably, it is not random in this particular experiment, but neither is it limited to specific blocks because most combinations have been detected (even if at low frequency; Figure S1).

    l. 450: "One intriguing finding is the obligate linkage of coding sequences" - Presuming this sentence is about protein-coding sequences, this should be reformulated because it misrepresents the actual data. Figure 2 clearly shows that protein-coding blocks are often linked to rRNA-coding blocks.

    l. 454: "balancing the expression of coxI and coxIII" - Not clear where this information comes from, as it is not from the cited papers.

    l. 460-461: "Our small RNA sequencing results revealed another potential advantage of the block organization of the T. gondii mitochondrial genome" - This should be reformulated. Clearly, the discovery of the 15 sRNAs was facilitated by the recognition of block order, but the presented argument is a bit confusing: how does the organization into blocks provide an "advantage" and what kind of advantage do the authors mean? (An evolutionary advantage or an advantage related to gene expression regulation or an advantage for their sRNA-Seq data mapping?)

    l. 462-478: Multiple explanations are provided for the existence of sRNAs at block borders and what these sRNAs represent. While I agree that it is important to consider all options, even the more debatable ones, the authors seem to forget the simplest possibility: the identified unassigned sRNAs could well be rRNA pieces, and their being encoded across block borders is neither more nor less surprising than the fact that protein-coding genes are encoded across (several) gene blocks.

    l. 485: "antisense RNA surveillance" - In contrast to nuclei, the existence of a genuine antisense RNA "surveillance" mechanism in mitochondria is uncertain. Given what is known from mitochondria of other organisms (especially plants and kinetoplastids), it seems more likely that certain regions of sense and antisense transcripts are protected from exonucleases by RNA-binding proteins (RBPs such as PPR and related helix-turn-helix repeat proteins, e.g., Toxoplasma homologs of the HPRs discovered in Plasmodium [Hillebrand,2018,NAR]), leading to RNAs that partially overlap, but are actually protected from base-pairing by these RBPs. This is not taken into account in any presented explanation of the phenomenon of antisense gene overlaps.

    l. 490: "start codon. while also " - Typo: should be a comma, not a dot.

    l. 500: "discovery of block-border sRNAs highlights the complex regulatory mechanisms at play" - This should be reformulated: the claim is very speculative, since no hard data are provided on such regulatory mechanisms in the presented work.

    l. 504: "sRNAs are incorporated into polysome-size structures" - In light of the concerns raised in the preceding section, this should be reformulated.

    l. 539-540: The closing sentence should be reformulated. The mitogenome organization in blocks per se does not "allow" the sequences to function as both mRNA and rRNA. Rather, it seems to be a combination of 1) the compactness of the genome that seems to lead to the re-use of certain segments in both mRNA and rRNA or in two distinct rRNAs, and 2) the apparently dynamic nature of the genome (due to recombination among gene blocks) that brings together certain combinations of gene blocks.

    Methods

    l. 607: Only agarose gel separation is mentioned, but most experiments shown are of denaturing PAGE separations (which is actually mentioned in several figure legends).

    l. 636: "Paste your materials and methods section here." - To be removed.

    l. 662: "NUMTS" - This should be "NUMTs"; the same typo occurs at multiple places in the 'Methods' section.

    l. 704: "Homology search for novel transcript annotation" - Somewhat confusing title; it is possible to guess what the authors likely mean, but it is unclear.

    l. 715: "New block annotations can be found in GenBank." - 1) The whole community would very likely appreciate it if the GenBank entries were properly annotated (i.e., genes added), not just shown as bare sequences, as is currently the case for all Namasivayam,2021,Genome Res entries (not sure about the authors' own entries because they were inaccessible). If it is impossible to update the entries of the Namasivayam,2021,Genome Res study, then submitting new, properly annotated GenBank entries would be appropriate. 2) It was not possible to properly assess some of the claims in the manuscript because access to the files was not provided to reviewers, nor have the newly submitted GenBank entries been made public by the authors.

    Figures

    Figure 1B - The load of total proteins into each well is unclear. Ponceau stain does not show identical loads, so it is unclear what the reader should take as the reference.

    Figure 1D - The phrasing "fragments found in the pellet fractions of the protocol" is a bit awkward. The fragments are in the pellet fractions after plasma membrane permeabilization and benzonase incubation, not in the "fractions of the protocol".

    Figure 2 - The chosen hues of red and green (for coxI and coxIII) are of such similar intensity that they are virtually indistinguishable to ~2% of the readers. A colourblind-friendly palette would be very much appreciated. For guidelines, see for example: https://www.nature.com/articles/nmeth.1618 .

    Figure 3 - The use of lowercase letters to indicate the probes (instead of the full probe names) is a nice idea and simplifies the reading experience, but the use of the same letter 'a' in different figures for different probes is confusing. Labeling each probe with a unique ID/letter and indicating this ID in the Table S6 (e.g., by adding an additional column) would work much better.

    Figure 4A - The wiggle lines for rRNAs are coloured in purple shades, which contrast with the grey colour that is assigned to them in the Figure 2. Keeping a consistent colour palette across figures would be preferable.

    Figure 4C - If the E. coli sequence was on the outer lines, the Toxoplasma sequences could be closer to one another, which would make it easier for the reader to understand the alignment.

    Figure 5 - Purple shades for rRNA are somewhat difficult to discern from the blue cob. Also, the 'reference' wiggles would work better if demarcated as a key because this would make it visually clearer that they are shared by the A and B panels.

    Supplementary Information

    Figure S1 - An explanation of what the A and B panels show is missing.

    Figure S5 - It is difficult to appreciate the extent of overlaps with protein-coding sequences if these are missing from the figure (unlike in Fig. 5).

    Table S4 - Nuclear genome accession number is missing. Add "mitochondrial" to the label of the column "sequence blocks".

    Table S5 - 1) It is unclear what the 'rRNA homology' refers to. (It does not seem to be the nomenclature used by Feagin et al.,2012, PLoS One.) 2) An extension of the table (or perhaps a separate table) with the cumulative size of mtLSU and mtSSU rRNA pieces, as well as unassigned sRNAs, would be useful. 3) It should also be stated somewhere if homologs of any of the rRNA pieces known from Plasmodium are missing in Toxoplasma. (If so, they could be among the newly identified short RNAs.)

    Referees cross-commenting

    Referee #2 rightly pointed out that basic statistics on nanopore reads, as well as omitted methodological details (e.g., minimap2 and SAMtools settings) would be welcome. Similarly, Figure 2 should indicate the upstream/downstream block orientation. If the authors intend to position their work as a major achievement in mitochondrial enrichment for Toxoplasma (as the text currently indicates), I also agree that a comparison with previously published protocols would not be out of place.

    Significance

    Speaking from personal experience, devising a protocol for such a substantial mitochondrial enrichment, as the study presents, is a great technical achievement, which cannot be overstated, especially for a protist or any somewhat unconventional model organism. The mitoribosomal community will certainly take notice of the improved catalogue of mitochondrial rRNA pieces, while the discovery of overlapping protein-coding and rRNA genes will be of interest to those working in the field of mitochondrial evolutionary biology. The study already provides a significant upgrade from the previous attempts to understand the nature of the mitochondrial genome in Toxoplasma (and in Apicomplexa in general), and is well positioned to become a source of inspiration for future studies in the field. However, being at a crossroads of genomics, evolution, and molecular biology, it has certain limitations in its current form, mainly because the evolutionary and molecular biology aspects would benefit from further development (see 'Major comments'). The text is generally well written and accompanying figures well designed, but clarifications, broader context, and less speculative interpretation would be welcome (as detailed mostly in 'Minor comments'). To justify publication in a journal with a broad readership, the authors should provide additional experimental evidence to strengthen their case and generalize their findings.

  3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


    Referee #2

    Evidence, reproducibility and clarity

    In this article, the authors delve into an intriguing topic, aiming to enhance our understanding of the organization of the mitochondrial genome of T. gondii, a parasite of significant importance in both human and animal health contexts.

    In essence, their approach involves enriching mitochondrial material, followed by genome sequencing and the analysis of mitochondrial short RNAs. They achieve a remarkable depth of mitochondrial sequencing and generate valuable RNA data. Furthermore, their efforts lead to the discovery and annotation of new short RNAs.

    Overall, the article is well-crafted and presents compelling results. However, it's worth noting that, at times, the authors appear somewhat self-congratulatory, and certain results might be perceived as overly ambitious. Nevertheless, the discussion is aptly constructed.

    Major comment:

    They assert certain discoveries that had already been reported. Notably, they adapt an existing protocol for mitochondrial enrichment and describe it as 'We developed a protocol to enrich T. gondii mitochondria.' It's worth noting that they neither reference a more recently described protocol (PMC6851545) nor compare the performance of their modified protocol with the original.

    The protocol they employ does not seem to yield exceptionally high success rates, as mitochondrial DNA constitutes less than 10% of the total sequenced DNA.

    Additionally, they frequently mention the identification of specific combinations of sequence blocks previously identified by Namasivayam et al. (PMC8092004), which was also discussed in Namasivayam et al. 2021.

    The supplementary material lacks basic details on the sequencing performed, such as the distribution of mitochondrial read lengths, sequencing depth, etc.

    Further clarification is needed for Figure 2. Specifically, the frequency units or combinations of frequency (A, B, and C) are not clearly explained. While the matrix's asymmetry suggests a 5'- 3' orientation difference, this orientation difference is not explicitly specified (B). Additionally, the fragment Mp does not appear in the block combination figure (C).

    Some points to improve the introduction:

    Provide an evolutionary context for the following phrase: 'An idiosyncratic feature of Apicomplexa is a highly derived mitochondrial genome.' Specify what you intend to emphasize.

    Line 56: The sentence must begin with a capital letter.

    Line 58: "Nuclear genes encoding proteins with functions in mitochondria contribute strongly to P. falciparum and T. gondii cell fitness." Although it is mentioned later, it would be more effective to introduce here the fact that all but three genes are encoded in the nucleus.

    Line 68: "Apicomplexan mitogenomes usually code only for three proteins." It seems to me that 'usually' should not be included.

    Lines 65-67: The sentence should state that the mitochondrial genome is composed of a total of 20 blocks of repeating sequences organized in multiple DNA molecules of varying length and in non-random combinations.

    At the end of the introduction, the authors state that they have developed a protocol for mitochondrial enrichment. The text should be modified taking into account that: 1- The new protocol is an adaptation of another existing protocol. In fact, in the Methods the authors say the protocol was "slightly" modified. 2- There is already an existing mitochondrial enrichment protocol available [Reference: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6851545/#mmi14357-bib-0074]. In any case, they should consider performing a comparative analysis between the proposed protocol and existing ones to determine its relative effectiveness. It should be noted that the proposed protocol enriches for organelles (including the nucleus and apicoplast), but when sequencing DNA, mitochondrial DNA accounts for only 5% of the total reads, which may raise doubts about its overall efficacy.

    Some points related to Results section:

    Lines 113-115: 'To distinguish between NUMTs (nuclear DNA sequences that originated from mitochondria) and true mitochondrial sequences, it is necessary to enrich mitochondrial DNA.' I disagree with this sentence. NUMTs, in general, consist of very short sequences. With long reads, it is relatively straightforward to differentiate mitochondrial sequences from those nuclear sequences that have small mitochondrial fractions. In my opinion, even many Illumina reads can be confidently identified as belonging solely to the mitochondria. I found this article that supports this argument, indicating that the majority of NUMTs are less than 100 nucleotides in length [Reference: https://pubmed.ncbi.nlm.nih.gov/37293002/].

    Lines 166-168: 'A previous sequencing study used Oxford Nanopore sequencing technology (ONT) to identify combinations of sequence blocks in T. gondii mitochondria (Namasivayam et al. 2021).' However, it's important to note that Namasivayam's group did not merely use ONT to identify combinations of blocks; rather, they discovered, identified, and defined these combinations based on sequencing with long reads.

    Line 177: "The length of mitochondrial reads ranged from 87 nt to 17,424 nt" It would be beneficial to include a histogram depicting the length distribution of the obtained reads. It's worth noting that nanopore reads tend to be shorter than Illumina reads
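
    To illustrate the kind of summary requested here, the following is a minimal sketch of how a read-length histogram could be tabulated. It assumes an unwrapped FASTQ file with four lines per record; the function names, the 500 nt bin width, and the toy reads are illustrative assumptions and are not taken from the manuscript.

```python
from collections import Counter


def read_lengths(fastq_lines):
    """Yield the sequence length of each record in a 4-line-per-record FASTQ stream."""
    for i, line in enumerate(fastq_lines):
        if i % 4 == 1:  # the second line of every record holds the sequence
            yield len(line.strip())


def length_histogram(lengths, bin_width=500):
    """Count reads per length bin; each key is the lower bound of its bin."""
    counts = Counter((n // bin_width) * bin_width for n in lengths)
    return dict(sorted(counts.items()))


def print_histogram(hist, bin_width=500):
    """Print a simple text histogram, one row per occupied bin."""
    for start, count in hist.items():
        bar = "#" * min(count, 60)
        print(f"{start:>6}-{start + bin_width - 1:<6} {count:>6} {bar}")


# Toy records (hypothetical data, only to show the expected input layout).
toy_fastq = [
    "@read1", "ACGT" * 30, "+", "I" * 120,
    "@read2", "ACGT" * 150, "+", "I" * 600,
    "@read3", "ACGT" * 425, "+", "I" * 1700,
]

hist = length_histogram(read_lengths(toy_fastq))
print_histogram(hist)
```

    For a real dataset, the list of toy records would be replaced by an open file handle on the mitochondrial read set, and the resulting table could be plotted or supplied as a supplementary figure.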

    Lines 194-195: "we found that only a small fraction of all possible block combinations are prevalent within the genome". This has been described previously (PMC8092004).

    Line 201: "This indicates that the genome's flexibility is limited and that not all block combinations are realized". This is consistent with the findings published by Namasivayam et al. in 2021, which have already established that the combination of the 21 blocks is non-random.

    Line 205: "All combinations are well covered in our ONT results and helped to refine block borders relative to previous annotations (Fig. S2)". In the supplementary materials the authors say: "However, the blocks Fp, Kp, and Mp frequently occur separately in the mitochondrial genome. We therefore treated Fp, Kp and Mp as separate blocks and have shortened the blocks F, K and M accordingly". As far as I understand, for this very reason, Namasivayam and collaborators annotate them as partial fragments, which may appear in other regions but are, in turn, parts of larger F, K, and M fragments. To redefine the segments F, K, and M without the sequences corresponding to Fp, Kp, and Mp, as shown in Figure S2, these fragments should be distinct from the 'partials.' In other words, segments of the type (F minus Fp), (K minus Kp), and (M minus Mp) should appear in the reads, and should be distinguishable from Fp, Kp, and Mp. If this distinction is made, I am satisfied with the new definition. However, if such a separation is not evident, it seems important to clarify it in the text or to reconsider this new definition.

    Lines 221-223: "This suggests that there is no need to postulate mechanisms of genomic or posttranscriptional block shuffling to arrive at full-length open reading frames." The authors argue that invoking mechanisms of genomic or post-transcriptional block shuffling is unnecessary to explain the presence of full-length open reading frames, given that genes represent 2-3% of mitochondrial sequences. However, there is a missing estimate regarding the probability of encountering all three genes within a single molecule or mitochondrial genome, as well as the total number of sequenced mitochondria. Consequently, the statement appears overly assertive. In the absence of alternative mechanisms for generating complete genes, this would mean that at most only 1646 mitochondrial genomes would have been sequenced. To comprehensively address this issue, the authors should consider discussing this scenario further. They should also provide information about how many reads they found containing all three genes and how many contained two of the genes.

    Lines 249-250: "using the block combinations identified here by ONT sequencing". What is the difference between the blocks identified here and those of Namasivayam et al.? The division of the M, K and F fragments?

    Line 287: "The six remaining small RNA fragments are specific to T. gondii" I would suggest being more cautious in this sentence by stating that they were not found in other organisms. Given the similarity of the mitochondrial genome between T. gondii, N. caninum, and other coccidians, it would be expected to find them in these organisms as well.

    Line 300 "Among the novel small RNAs identified, there is also a class that was only detectable due to our insights into genome block combinations." A valid strategy is to map the small RNAs to the generated nanopore reads or to an assembly made with these reads, rather than solely relying on the single blocks or combinations of blocks, as this approach would yield the same result.

    Line 444: "Upon closer scrutiny, however, the reshuffling appears limited to specific block borders and is not random" This was already established by Namasivayam et al 2021.

    I would like to highlight the potential for a more comprehensive examination of the mitochondrial genome in the discussion. While the proposed explanations for the presence of sRNAs at the 'block borders' appear plausible, it's worth noting that the definition of these blocks is artificial rather than biological. I think it would be interesting to frame the discussion not in terms of sequence blocks, but in terms of the sequences that actually exist in the mitochondrial genome. Therefore, it's important to discuss whether these sequences (the block borders) are consistently present in all mitochondrial genomes. The total cumulative length of the blocks is 5.9 kb, which is relatively small and comparable to one of the smallest mitochondrial genomes on record. It is conceivable that recombination and the generation of new sequences play a role in expanding the genomic space available for encoding products such as RNAs.

    Line 535-536 "We developed a protocol to enrich T. gondii mitochondria and used Nanopore sequencing to comprehensively map the genome with its repeated sequence blocks." I find this sentence to be somewhat assertive, especially considering that they modified an existing protocol and obtained results that may not be optimal. Additionally, they have not compared their protocol with other available methods for mitochondrial enrichment.

    Some points related to the Methods section: In none of the experiments is it specified how many parasites were used as starting material.

    "Masking NUMTs in the T. gondii nuclear genome": It is unclear whether the authors utilize all hits or filter the results of BLASTN. It would be helpful if they specified the criteria for filtering, such as identity percentage or query coverage. Additionally, it is not clear how they generate the GFF3 file from the BLAST results, or whether they instead create a BED file. Clarifying this process would enhance the reproducibility of their methods. Moreover, it would be beneficial if the authors included information on the number of sequences masked, the average length of the NUMTs, and the total percentage of the genome these masked sequences represent.

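    As an illustration of the level of detail that would make the NUMT masking step reproducible, the sketch below filters BLASTN tabular output (the standard 12-column `-outfmt 6` layout), converts the surviving hits to BED-style intervals, merges overlaps, and reports the summary figures requested above. The thresholds (90% identity, 30 nt alignment length), the function names, and the toy hits are assumptions for illustration only; they are not the authors' settings.

```python
def parse_hits(lines, min_identity=90.0, min_length=30):
    """Filter BLAST outfmt-6 lines and return (subject, start, end) in BED coordinates."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        pident, length = float(f[2]), int(f[3])
        sstart, send = int(f[8]), int(f[9])  # 1-based; sstart > send on the minus strand
        if pident < min_identity or length < min_length:
            continue
        # BED is 0-based and half-open, so shift the start by one.
        hits.append((f[1], min(sstart, send) - 1, max(sstart, send)))
    return hits


def merge_intervals(hits):
    """Merge overlapping or touching intervals on the same sequence."""
    merged = []
    for chrom, start, end in sorted(hits):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            merged[-1][2] = max(merged[-1][2], end)
        else:
            merged.append([chrom, start, end])
    return [tuple(m) for m in merged]


def summarize(intervals, genome_size):
    """Report number of masked regions, their mean length, and percent of genome masked."""
    total = sum(end - start for _, start, end in intervals)
    mean = total / len(intervals) if intervals else 0.0
    return {"n": len(intervals), "mean_length": mean,
            "percent_masked": 100.0 * total / genome_size}


# Toy hits (hypothetical): two overlapping hits on chrI, one on chrII,
# plus one hit rejected for low identity and one for short length.
toy_blast = [
    "blockA\tchrI\t98.0\t100\t2\t0\t1\t100\t1001\t1100\t1e-40\t180",
    "blockB\tchrI\t95.0\t60\t3\t0\t1\t60\t1150\t1091\t1e-20\t100",
    "blockC\tchrII\t80.0\t200\t40\t0\t1\t200\t1\t200\t1e-10\t90",
    "blockD\tchrII\t99.0\t20\t0\t0\t1\t20\t300\t319\t1e-3\t40",
    "blockE\tchrII\t97.0\t50\t1\t0\t1\t50\t501\t550\t1e-15\t90",
]

numts = merge_intervals(parse_hits(toy_blast))
stats = summarize(numts, genome_size=10_000)
for chrom, start, end in numts:
    print(f"{chrom}\t{start}\t{end}")
print(stats)
```

    Stating the equivalent thresholds and reporting the three summary values for the real data would address the points raised above.
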
    Line 657: "Mapping results were filtered using SAMtools".
    The text does not specify the filtering criteria or the parameters used for this process.
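
    For clarity, the sketch below shows the kind of criteria that should be reported: a minimum mapping quality and a set of excluded SAM flag bits. The values used (MAPQ of at least 20; unmapped, secondary, and supplementary alignments excluded) are assumptions chosen for illustration, not the authors' parameters. With SAMtools itself, a roughly equivalent filter would be `samtools view -q 20 -F 0x904`.

```python
# SAM flag bits as defined in the SAM format specification.
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def keep_alignment(sam_line, min_mapq=20,
                   exclude_flags=UNMAPPED | SECONDARY | SUPPLEMENTARY):
    """Return True if an alignment line passes the flag and MAPQ criteria."""
    fields = sam_line.rstrip("\n").split("\t")
    flag, mapq = int(fields[1]), int(fields[4])  # FLAG is column 2, MAPQ is column 5
    return (flag & exclude_flags) == 0 and mapq >= min_mapq


def filter_sam(lines, **criteria):
    """Yield header lines unchanged and alignment lines that pass the filter."""
    for line in lines:
        if line.startswith("@") or keep_alignment(line, **criteria):
            yield line


# Toy SAM records (hypothetical): only r1 should survive the default filter.
toy_sam = [
    "@HD\tVN:1.6",
    "r1\t0\tmito\t1\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t256\tmito\t1\t60\t4M\t*\t0\t0\tACGT\tIIII",  # secondary alignment
    "r3\t16\tmito\t5\t3\t4M\t*\t0\t0\tACGT\tIIII",    # low mapping quality
    "r4\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",         # unmapped read
]

kept = [line.split("\t")[0] for line in filter_sam(toy_sam)]
print(kept)
```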

    Line 673 states "No matching reads were found" in the section "Sequence comparisons of ONT reads found here with published ONT reads for the T. gondii mitochondrial genome", but in the Results the authors say: "While smaller reads of our dataset are found in full within longer reads in the published datasets, we do not find any examples for reads that would be full matches between the dataset." Could you provide a more detailed explanation? Specifically, I would like to know how many reads from the dataset (including their length) are also present in other datasets, and at what minimum length they cease to coincide.

    Line 689: The text does not specify the filtering criteria or the parameters used for the SAMtools filtering process.

    Lines 689-693: Please describe the methodology used in more detail.

    Line 696: The program is fastp, not fastq (Chen et al. 2018).

    Line 697: What do you mean by stating that only the ends of the reads were mapped? How many bases? Or do you mean that both the forward and the reverse reads were mapped?

    Significance

    In this article, the authors delve into an intriguing topic, aiming to enhance our understanding of the organization of the mitochondrial genome of T. gondii, a parasite of significant importance in both human and animal health contexts.

    In essence, their approach involves enriching mitochondrial material, followed by genome sequencing and the analysis of mitochondrial short RNAs. They achieve a remarkable depth of mitochondrial sequencing and generate valuable RNA data. Furthermore, their efforts lead to the discovery and annotation of new short RNAs.

    Overall, the article is well-crafted and presents compelling results. However, it's worth noting that, at times, the authors appear somewhat self-congratulatory, and certain results might be perceived as overly ambitious. Nevertheless, the discussion is aptly constructed.

  4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



    Referee #1

    Evidence, reproducibility and clarity

    Summary:

    Mitochondrial genomes of apicomplexan parasites have undergone dramatic reductions during their evolution, with genes for only three proteins remaining. In addition, ribosomal RNA genes are present in different, often species-specific gene arrangements. Toxoplasma exhibits massive variations in gene arrangement that are distributed over multiple copies. In this study, the Schmitz-Linneweber lab not only re-analysed the mitochondrial genome of Toxoplasma gondii using a novel protocol for enriching organellar nucleic acids, which allowed them to sequence the mitochondrial genome at unprecedented depth, but also addressed an enigma regarding the expression status of mitochondrial ribosomes. While indirect evidence of mitochondrial translation exists, direct evidence for active mitoribosomes is lacking and their composition is still poorly understood. Here, using HTS of small RNAs, the authors demonstrate that these RNAs are incorporated into polysomes. Furthermore, the authors develop the hypothesis that the block-based genome organization enables the dual utilization of mitochondrial sequences as both messenger RNAs and ribosomal RNAs.

    Own opinion/Major comments

    The mitochondria of the Apicomplexa are characterized by massive gene transfer into the cell nucleus and by sequence rearrangements, which have led to a unique and still debated genome organization. The underlying mechanisms of gene transcription and translation are also poorly understood. In a previous study, the Kissinger lab demonstrated the unique organization of the mitochondrial genome, which consists minimally of 21 sequence blocks (SBs) totaling 5.9 kb that exist as nonrandom concatemers (Namasivayam et al. 2021). In this study, the authors optimized a new technique for isolating organellar content in order to sequence the mitochondrial genome. This new purification protocol appears to be very robust and allowed the sequencing of the mitochondrial genome at unprecedented depth. The obtained data not only validate previous studies, but also suggest several new features, such as (potentially) continuous reshuffling of DNA blocks, leading to independent block combinations. The most important aspect of this study is the demonstration of polysomes and the presence of rRNAs within these complexes, taking previous studies (i.e. Lacombe et al., 2019) a step further. Taking all these efforts and data into account, it is a very nice and interesting study that will certainly be of interest for a broader readership. All the presented data and analysis appear to be solid and well controlled. However, it must be mentioned that this reviewer is not an expert when it comes to the analysis and comparison of huge genomic datasets, and the opinion of a bioinformatician would be helpful in assessing this study in more detail. All other data (organellar purification and analysis of polysomes) appear state of the art and no corrections are required.

    Referees cross-commenting

    I agree with reviewers 2 and 3. Some additional details on the techniques and the enrichment should be added.

    Significance

    General assessment:

    Taking all these efforts and data into account it is a very nice and interesting study that will certainly be of interest for a broader readership. All the presented data and analysis appear to be solid and well controlled. However, it must be mentioned that this reviewer is not an expert when it comes to the analysis and comparison of huge genomic datasets and the opinion of a bioinformatician would be helpful in assessing this study in more detail. All other data (organellar purification and analysis of polysomes) appear state of the art and no corrections are required.

    Advance:

    The study fills an important gap in our knowledge regarding the organization and translational activity of the apicomplexan (Toxoplasma) mitoribosome. See also comments above.

    Audience: Cell Biology, Parasitology, Mitochondria