Nuclear genome profiling of two Mexican orchids of the genus Epidendrum
Abstract
Characterizing genomic properties such as genome size, ploidy level, heterozygosity, and repetitive DNA proportion and composition without relying on genome assembly is crucial for profiling the genomes of non-model species. Little is known about the nuclear genome of the large neotropical orchid genus Epidendrum. This study compares genome profiles of Epidendrum anisatum and E. marmoratum, using flow cytometry and k-mer analysis approaches, as well as bioinformatics ploidy level estimation and repeatome characterization. Multiple depths of coverage, k values, and k-mer-based tools for genome size estimation were explored. A 2.3-fold genome size difference was found between the two species, ranging from 2.35 to 2.80 Gb (x̂ = 2.58 Gb) for E. anisatum and from 0.99 to 1.24 Gb (x̂ = 1.11 Gb) for E. marmoratum. Both species were identified as diploid with no evidence of strict partial endoreplication. The most important aspects to be taken into account to improve genome size estimation were heterozygosity, depth of coverage, and the maximum k-mer coverage. The genomes of both species were found to be highly repetitive (63–73%) and heavily dominated by Ty3-gypsy retrotransposons, particularly those of the Ogre family. Additionally, the genome of E. anisatum was characterized by the presence of a 172 bp satellite (AniS1), which represented 11% of the genome size. Together, both Ty3-gypsy transposons and AniS1 shape the genome size difference between the two Epidendrum genomes. This study provides the first genome profiling for species in the genus Epidendrum, but also highlights the importance of using flow cytometry, cytogenetic approaches and bioinformatics techniques in combination for genome profiling.
Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.
Learn more at Review Commons
Reply to the reviewers
Reviewer #1 (Evidence, reproducibility and clarity (Required)):
Summary:
The study provides a comprehensive overview of genome size variation in two related species of the genus Epidendrum, which appear to be homoploid, although their DNA content more closely corresponds to that of heteroploid species. While I have a few serious concerns regarding the data analysis, the study itself demonstrates a well-designed approach and offers a valuable comparison of different methods for genome size estimation. In particular, I would highlight the analysis of repetitive elements, which effectively explains the observed differences between the species. However, I encourage the authors to adopt a more critical perspective on the k-mer analysis and the potential pitfalls in data interpretation.
Major comments:
R1. p. 9: Genome size estimation via flow cytometry is an incorrect approach. The deviation is approximately 19% for E. anisatum and about 25% for E. marmoratum across three repeated measurements of the same tissue over three days? These values are far beyond the accepted standards of best practice for flow cytometry, which recommend a maximum deviation of 2-5% between repeated measurements of the same individual. Such variability indicates a systemic methodological issue or improper instrument calibration. Results with this level of inconsistency cannot be considered reliable estimates of genome size obtained by flow cytometry. If you provide the raw data, I can help identify the likely source of error, but as it stands, these results are not acceptable.
A: Thanks a lot for pointing out this issue. We have identified the source of the wide interval after consulting with the staff of LabNalCit. We originally used human peripheral blood mononuclear cells (PBMCs) as a reference to estimate the genome size (GS) of P. sativum and used the resulting range to estimate the GS of Epidendrum. We calculated P. sativum's GS using a wide human GS range of 6-7 Gb, which resulted in a wide range of P. sativum GS and, consequently, in a wide range of GS for our samples. Therefore, the wide range reported is not an issue with the instruments, but with the specifics of the analysis.
We have made the following changes:
- We narrowed the range of P. sativum's GS by recalculating it with a narrower human genome size range (6.41-6.51; Piovesan et al. 2019; DOI: 10.1186/s13104-019-4137-z), and used the resulting interval to calculate the GS of our samples.
- We have explained our procedure in the methods, changed our results as required, and included a supplementary table with cytometry data (Supplementary Data Table 1).
- Human peripheral blood mononuclear cells (PBMCs) from healthy individuals were used as a standard laboratory reference to calculate the P. sativum genome size. Pisum sativum and the Epidendrum samples were analyzed in a CytoFLEX S flow cytometer (Beckman-Coulter), individually and in combination with the internal references (PBMCs and P. sativum, respectively). Cytometry data analysis was performed using FlowJo® v. 10 (https://www.flowjo.com/). A genome size value for the Epidendrum samples was calculated as the average of the minimum and maximum 1C/2C values obtained from three replicates of the DNA content histograms of each tissue sample. Minimum and maximum values come from the interval of P. sativum estimations based on the human genome size range (human genome size range: 6.41-6.51; Piovesan et al. 2019).
- The 1C value in gigabases (Gb; calculated from mass in pg) of E. anisatum ranged from 2.55 to 2.62 Gb (mean 1C value = 2.59 Gb) and that of E. marmoratum from 1.11 to 1.18 Gb (mean 1C value = 1.13 Gb; Supplementary Data Table S1).
- We also eliminated from Figure 3 the range we had estimated previously.
- Finally, we changed the focus of the comparison and discussion of the evaluation of the bioinformatic estimations, highlighting this deviation rather than whether the GS bioinformatic estimations fall within the cytometric interval. We calculated the Mean Absolute Deviation (MAD) as the mean of the absolute differences between the genome size estimates using k-mers and the flow cytometry estimate. This meant changing the results on pp. 11 and 12 and adding to Fig. 3 two boxplots depicting the MAD. We have also added Supplementary Data Fig. S3 depicting the absolute deviations for E. anisatum and E. marmoratum per tool using the estimates generated from a k-mer counting with a maximum k-mer coverage value of 10,000 using 16 different values of k; a Supplementary Data Figure S5 depicting the mean absolute deviations resulting from the different subsampled simulated depths of coverage of 5×, 10×, 20×, 30×, and 40×; and finally a Supplementary Data Fig. S6 depicting the MAD changes as a function of depth of coverage for E. anisatum and E. marmoratum.
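The MAD described in the last item can be sketched in a few lines of Python. This is a minimal illustration and not the authors' script: the function names are hypothetical, the pg-to-Gb factor of 0.978 is the commonly used conversion (the reply does not say which factor was applied), and the two k-mer values are simply the extremes quoted in the abstract for E. anisatum, used here as stand-ins for a real list of per-tool estimates.

```python
# Commonly used conversion: 1 pg of DNA is about 0.978 Gb.
# Assumption: the reply does not state which factor the authors used.
PG_TO_GB = 0.978


def pg_to_gb(mass_pg):
    """Convert a DNA amount in picograms to gigabases."""
    return mass_pg * PG_TO_GB


def mean_absolute_deviation(kmer_estimates_gb, cytometry_gb):
    """Mean of |k-mer estimate - flow cytometry estimate| over all estimates."""
    deviations = [abs(est - cytometry_gb) for est in kmer_estimates_gb]
    return sum(deviations) / len(deviations)


# Illustrative inputs only (not the per-tool results of the study):
# flow cytometry mean 1C value for E. anisatum and the two extremes of the
# k-mer range given in the abstract.
cytometry_1c_gb = 2.59
kmer_estimates_gb = [2.35, 2.80]

mad = mean_absolute_deviation(kmer_estimates_gb, cytometry_1c_gb)
print(f"MAD = {mad:.3f} Gb")
```

A lower MAD means the k-mer estimates sit closer to the cytometric value, which is how the reply compares tools, k values, and depths of coverage.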
R1. p. 14 and some parts of Introduction: It may seem unusual, to say the least, to question genome size estimation in orchids using flow cytometry, given that this group is well known for extensive endoreplication. However, what effect does this phenomenon have on genome size analyses based on k-mers, or on the correct interpretation of peaks in k-mer histograms? How can such analyses be reliably interpreted when most nuclei used for DNA extraction and sequencing likely originate from endoreplicated cells? I would have expected a more detailed discussion of this issue in light of your results, particularly regarding the substantial variation in genome size estimates across different k-mer analysis settings. Could endoreplication be a contributing factor?
A:
1. We reworded the introduction (p. 3, 2nd paragraph) to make our point on the effect of endoreplication on flow cytometry clearer.
2. We eliminated the following sentence from the discussion (p. 15): "Difficulties for cytometric estimation of genome size can thus be taxon-specific. Therefore, cross-validating flow cytometry and bioinformatics results can be the most effective method for estimating plant genome size, especially when only tissues suspected to show significant endoreplication, such as leaves, are available."
3. We added the following, p. 18: Genome size estimation for non-model species is considered a highly standardized approach. However, tissue availability and intrinsic genome characteristics (large genomes, polyploidy, endoreplication, and the proportion of repetitive DNA) can still preclude genome size estimation (e.g. Kim et al. 2025) using cytometry and bioinformatic tools. Cross-validating flow cytometry and bioinformatics results might be particularly useful in those cases. For example, when only tissues suspected of showing significant conventional endoreplication, such as leaves, are available, bioinformatic tools can help to confirm that the first peak in cytometry histograms corresponds to 2C. Conversely, bioinformatic methods can be hindered by partial endoreplication, which only flow cytometry can detect.
4. We included a paragraph discussing the effect of CE and PE on bioinformatic GS estimation, p. 17: Besides ploidy level, heterozygosity, and the proportion of repetitive DNA, k-mer distribution can be modified by endoreplication. Since endoreplication of the whole genome (CE) produces genome copies (as in preparation for cell division, but nuclear and cell division do not occur), we do not expect an effect on genome size estimates based on k-mer analyses. In contrast, PE alters coverage of a significant proportion of the genome, affecting k-mer distributions and genome size estimates (Piet et al., 2022). Species with PE might be challenging for k-mer-based methods of genome size estimation.
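The reasoning in item 4 can be illustrated with a deliberately simplified toy model. It is a sketch under stated assumptions, not the approach of Piet et al. (2022) or of any k-mer tool: the genome is split into one fraction that is over-represented in the extracted DNA and one that is not, the total number of sequenced bases stands in for the total number of k-mers, and the "main peak" is taken to be the depth of whichever fraction is larger. All names and parameter values are hypothetical.

```python
def kmer_genome_size(total_kmers, peak_depth):
    """Basic k-mer estimate: total k-mers divided by the depth of the main peak."""
    return total_kmers / peak_depth


def estimate_with_endoreplication(genome_size, fraction, fold, total_bases):
    """Toy genome size estimate when part of the genome is over-represented.

    genome_size : true 1C size in bp
    fraction    : share of the genome that is amplified (1.0 = whole genome, CE)
    fold        : relative abundance of the amplified share in the DNA pool
    total_bases : sequenced bases, used as a proxy for the number of k-mers
                  (ignores read ends, sequencing errors and heterozygosity)
    """
    # Size of the sequenced DNA pool, expressed in bases per "average" nucleus.
    pool = genome_size * ((1.0 - fraction) + fraction * fold)
    depth_single = total_bases / pool          # depth of the non-amplified share
    depth_amplified = depth_single * fold      # depth of the amplified share
    # Assumption: the larger share of the genome forms the main peak.
    main_peak = depth_amplified if fraction >= 0.5 else depth_single
    return kmer_genome_size(total_bases, main_peak)


true_size = 1e9      # hypothetical 1 Gb genome
sequenced = 30e9     # hypothetical 30 Gb of reads

# Whole-genome endoreplication: every region is amplified equally.
print(estimate_with_endoreplication(true_size, 1.0, 4.0, sequenced))
# Partial endoreplication of 30% or 70% of the genome.
print(estimate_with_endoreplication(true_size, 0.3, 2.0, sequenced))
print(estimate_with_endoreplication(true_size, 0.7, 2.0, sequenced))
```

In this toy setting, amplifying the whole genome leaves the estimate at the true size because all regions keep the same relative depth, whereas amplifying only part of it shifts the estimate up or down depending on which share forms the main peak.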
R1. You repeatedly refer to the experiment on genome size estimation using analyses with maximum k-mer coverage of 10,000 and 2 million, under different k values. However, I would like to see a comparison - such as a correlation analysis - that supports this experiment. The results and discussion sections refer to it extensively, yet no corresponding figure or analysis is presented.
A:
We had previously included the results of the analyses using different maximum k-mer coverage values in Supplementary Data Figure S2. To formally compare the results obtained with maximum k-mer coverage of 10,000 and 2 million, we have added a Wilcoxon paired signed-rank test, which showed a significant difference (p. 12): The estimated genome sizes using a maximum count value of 10,000 were generally lower for all tools in both species compared to using a maximum count value of 2 million (median of 2M experiment genome size - median of 10K experiment genome size = 0.24 Gb). The estimated genome size of the 2 million experiment also tended to be closer to the flow cytometry genome size estimation with significantly lower MAD than the 10K experiment (Wilcoxon paired signed-rank test p = 0.0009). In the 10K experiment (Supplementary Data Figures S2 and S3), the tool with the lowest MAD for E. anisatum was findGSE-het (0.546 Gb) and for E. marmoratum it was findGSE-hom (0.116 Gb).
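For readers unfamiliar with the paired test mentioned above, the following is a minimal pure-Python sketch of an exact two-sided Wilcoxon signed-rank test. It is not the authors' analysis: the function name is hypothetical and the paired MAD values are invented purely to show the mechanics (one pair per tool, same tool in both experiments).

```python
from itertools import product


def wilcoxon_signed_rank_exact(x, y):
    """Exact two-sided Wilcoxon signed-rank test for paired samples.

    Zero differences are dropped; tied absolute differences get average ranks.
    Enumerates all 2**n sign patterns, so it is only meant for small n.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = average_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w = min(w_plus, w_minus)
    total = sum(ranks)
    # Under the null hypothesis every sign pattern is equally likely.
    as_extreme = 0
    for signs in product((0, 1), repeat=n):
        s = sum(r for r, keep in zip(ranks, signs) if keep)
        if min(s, total - s) <= w:
            as_extreme += 1
    return w, as_extreme / 2 ** n


# Hypothetical MAD values (Gb) for six tools; NOT results from the study.
mad_10k = [0.55, 0.62, 0.48, 0.70, 0.35, 0.41]
mad_2m = [0.30, 0.42, 0.33, 0.40, 0.25, 0.36]

statistic, p_value = wilcoxon_signed_rank_exact(mad_10k, mad_2m)
print(statistic, p_value)
```

In practice a library routine such as `scipy.stats.wilcoxon` would be the usual choice; the explicit version is shown only to make the pairing and ranking steps visible.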
We have also added a boxplot in Supplementary Data Figure S3 depicting the mean absolute deviations obtained with maximum k-mer coverage of 10,000 and 2 million, compared to flow cytometry.
Minor comments:
R1. p. 3: You stated: "Flow cytometry is the gold standard for genome size estimation, but whole-genome endoreplication (also known as conventional endoreplication; CE) and strict partial endoreplication (SPE) can confound this method." How did you mean this? Endopolyploidy is quite common in plants and flow cytometry is an excellent tool how to detect it and how to select the proper nuclei fraction for genome size estimation (if you are aware of possible misinterpretation caused by using inappropriate tissue for analysis). The same can be applied for partial endoreplication in orchids (see e.g. Travnicek et al 2015). Moreover, the term "strict partial endoreplication" is outdated and is only used by Brown et al. In more recent studies, the term partial endoreplication is used (e.g. Chumova et al. 2021- 10.1111/tpj.15306 or Piet et al. 2022 - 10.1016/j.xplc.2022.100330).
A:
We have reworded the paragraph where we stated "Flow cytometry is the gold standard for genome size estimation", as in the answer to Major comment 2. Additionally, we highlighted in the discussion (p. 18) how, while FC is the gold standard for GS estimation, studying multiple alternatives to it may be important for cases in which live tissue is not available or is available only to a limited extent (i.e. only certain tissues). We have also changed the term "strict partial endoreplication" to partial endoreplication (PE).
R1. p. 5: "...both because of its outstanding taxic diversity..." There is no such thing as "taxic" diversity - perhaps you mean taxonomic diversity or species richness.
A: We have changed "taxic diversity" to "species diversity".
R1. p. 6: In description of flow cytometry you stated: "Young leaves of Pisum sativum (4.45 pg/1C; Doležel et al. 1998) and peripheral blood mononuclear cells (PBMCs) from healthy individuals...". What does that mean? Did you really use blood cells? For what purpose?
A: Please find the explanation and the modifications we've made in the answer to major comment 1.
R1. p. 7: What do you mean by this statement "...reference of low-copy nuclear genes for each species..."? As far as I know, the Granados-Mendoza study used the Angiosperm v.1 probe set, so did you use that set of probes as reference?
A: We rewrote: "To estimate the allele frequencies, the filtered sequences were mapped to a reference of low-copy nuclear genes for each species" to:
To estimate the allele frequencies, the filtered sequences were mapped to the Angiosperm v.1 low-copy nuclear gene set of each species.
R1. p. 7: Chromosome counts - there is a paragraph of methodology used for chromosome counting, but no results of this important part of the study.
A: We are including a supplementary figure (Supplementary Data Figure S7) with micrographs of the chromosomes of E. anisatum and E. marmoratum.
R1. p. 12: Depth of coverage used in repeatome analysis - why did you use different coverage for both species? Any explanation is needed.
A: To make explicit the fact that the depth of coverage is determined automatically by the analysis with no consideration for the amount of input reads, but only of the graph density and the amount of RAM available (Box 3 in Novak et al. 2020), we rewrote:
"To estimate the proportion of repetitive DNA, the individual protocol analyzed reads corresponding to depths of coverage of 0.06× for Epidendrum anisatum and 0.43× for E. marmoratum." to
To estimate the proportion of repetitive DNA, the RepeatExplorer2 individual protocol determined a max number of analyzed reads (Nmax) corresponding to depths of coverage of 0.06x for Epidendrum anisatum and 0.43x for E. marmoratum.
R1. p. 16: The variation in genome size of orchids is even higher, as the highest known DNA amount has been estimated in Liparis purpureoviridis - 56.11 pg (Travnicek et al 2019 - doi: 10.1111/nph.15996)
A: We have updated it.
R1. Fig. 1 - Where is the standard peak on Fig. 1? You mention it explicitly on page 9 where you are talking about FCM histograms.
A: We reworded the results, eliminating the references to the standard internal reference.
Reviewer #1 (Significance (Required)):
Significance
This study provides a valuable contribution to understanding genome size variation in two Epidendrum species by combining flow cytometry, k-mer analysis, and repetitive element characterization. Its strength lies in the integrative approach and in demonstrating how repetitive elements can explain interspecific differences in DNA content. The work is among the first to directly compare flow cytometric and k-mer-based genome size estimates in orchids, extending current knowledge of genome evolution in this complex plant group. However, the study would benefit from a more critical discussion of the limitations and interpretative pitfalls of k-mer analysis and from addressing methodological inconsistencies in the cytometric data. The research will interest a specialized audience in plant genomics, cytogenetics, and genome evolution, particularly those studying non-model or highly endoreplicated species.
Field of expertise: plant cytogenetics, genome size evolution, orchid genomics.
Reviewer #2 (Evidence, reproducibility and clarity (Required)):
Summary:
With this work, the authors provide genome profiling information on the Epidendrum genus. They performed low-coverage short read sequencing and analysis, as well as flow cytometry approaches to estimate genome size, and perform comparative analysis for these methods. They also used the WGS dataset to test different approaches and models for genome profiling, as well as repeat abundance estimation, emphasising the importance of genome profiling to provide basic and comparative genomic information in our non-model study species. Results show that the two "closely-related" Epidendrum species analysed (E. marmoratum and E. anisatum) have different genome profiles, exhibiting a 2.3-fold genome size difference, mostly triggered by the expansion of repetitive elements in E. marmoratum, especially of Ty3-Gypsy LTR-retrotransposons and a 172 bp tandem repeat (satellite DNA).
Major comments:
Overall, the manuscript is well-written, the aim, results and methods are explained properly, and although I missed some information in the introduction, the paper structure is overall good, and it doesn't lack any important information. The quality of the analysis is also adequate and no further big experiments or analysis would be needed.
However, from my point of view, two main issues would need to be addressed:
R2. The methods section is properly detailed and well explained. However, the project data and scripts are not available at the figshare link provided, and the BioProject code provided is not found at SRA. This needs to be solved as soon as possible because, if they are not available for review, the reproducibility of the manuscript cannot be fully assessed.
A: We have made public the .histo files for all depths of coverage and the cluster table files necessary to reproduce the results. We will also make public a fraction of the sequencing data sufficient to reproduce our genome size and repetitive DNA results as soon as the manuscript is formally published. Availability of the whole dataset is pending publication of the whole-genome draft.
R2. The authors specify in the methods that 0.06x and 0.43x sequencing depths were used as inputs for the RE analysis of E. anisatum and E. marmoratum. I understand these are differences based on the data availability and genome size differences. However, they don't correspond to either of the recommendations from Novak et al (2020):
In the context of individual analysis: "The number of analyzed reads should correspond to 0.1-0.5× genome coverage. In the case of repeat-poor species, coverage can be increased up to 1.0-1.5×." Therefore, using 0.06x for E. anisatum should be justified, or at least addressed in the discussion.
Moreover, using such difference in coverage might affect any comparisons made using these results. Given that the amount of reads is not limiting in this case, why such specific coverages have been used should be discussed in detail.
In the context of comparative analysis: "Because different genomes are being analyzed simultaneously, the user must decide how they will be represented in the analyzed reads, choosing one of the following options. First, the number of reads analyzed from each genome will be adjusted to represent the same genome coverage. This option provides the same sensitivity of repeat detection for all analyzed samples and is therefore generally recommended; however, it requires that genome sizes of all analyzed species are known and that they do not substantially differ. In the case of large differences in genome sizes, too few reads may be analyzed from smaller genomes, especially if many species are analyzed simultaneously. A second option is to analyze the same number of reads from all samples, which will provide different depth of analysis in species differing in their genome sizes, and this fact should be considered when interpreting analysis results. Because each of these analysis setups has its advantages and drawbacks, it is a good idea to run both and cross-check their results."
Therefore, it should be confirmed how much it was used for this approach (as in the methods it is only specified how much it was used for the individual analysis), and why.
A: In Box 3, Novak et al (2020) explain that the number of analyzed reads (Nmax) is determined automatically by RepeatExplorer2, based on the graph density and available RAM. Therefore, the reported depths of coverage are results, not the input of the analysis. We tried different amounts of reads as input and got consistently similar results, so we kept the analysis using the whole dataset.
For the comparative analysis, we have added the resulting depth of coverage and explained that we used the same number of reads for both species.
Added to methods:
"For the comparative protocol, we used the same amount of reads for both species".
Added to results:
"To estimate the proportion of repetitive DNA, the RepeatExplorer2 individual protocol determined a maximum number of analyzed reads (Nmax) corresponding to depths of coverage of 0.06x for E. anisatum and 0.43x for E. marmoratum."
"The RepeatExplorer2 comparative protocol determined a maximum number of analyzed reads (Nmax) corresponding to depths of coverage of approximately 0.14x for E. marmoratum and 0.06x for E. anisatum."
This is consistent with other works which utilize RepeatExplorer2, for example, Chumová et al (2021; https://doi.org/10.1111/tpj.15306), who wrote: "The final repeatome analysis for each species was done using a maximum number of reads representing between 0.049x and 1.389x of genome coverage."
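The two depths reported for the comparative protocol follow from simple arithmetic once the same number of reads is analyzed for both species. The sketch below assumes equal read lengths in both datasets and takes the flow cytometry mean 1C values quoted earlier in this reply (2.59 Gb and 1.13 Gb) as genome sizes; the function name is hypothetical.

```python
def depth_with_same_reads(depth_a, genome_size_a, genome_size_b):
    """Depth reached in genome B by the read set that gives depth_a in genome A.

    Assumes the same number of reads and the same read length for both species,
    so depth scales inversely with genome size.
    """
    return depth_a * genome_size_a / genome_size_b


# E. anisatum: 0.06x at 2.59 Gb; E. marmoratum: 1.13 Gb.
depth_marmoratum = depth_with_same_reads(0.06, 2.59e9, 1.13e9)
print(round(depth_marmoratum, 2))  # 0.14
```

The result matches the approximately 0.14x reported for E. marmoratum, which is the expected consequence of the 2.3-fold genome size difference.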
Minor comments:
General comments:
- The concept of genome endoreplication and the problem it represents for C-value estimations needs to be better contextualised. It would be nice to have some background information in the introduction on how this is an issue (specially in Orchid species). Results shown are valuable and interesting but require a little more context on how frequent this is in plants, especially in Orchids, and across different tissues.
A: We have included information about the variation of conventional and partial endoreplication in plants.
Differences in CE may also occur between individuals or even respond to environmental factors (Barow 2006). In contrast, PE results in cells that replicate only a fraction (P) of the genome (Brown et al. 2017) and it has only been reported in Orchidaceae (Brown et al. 2017). CE and PE can occur in one or several endoreplication rounds, and different plant tissues may have different proportions of 2C, 4C, 8C ... nC or 2C, 4E, 8E, ... nE nuclear populations, respectively. The 2C nuclear population sometimes constitutes only a small fraction in differentiated somatic tissues and can be overlooked by cytometry (Trávníček et al. 2015). Using plant tissues with a high proportion of the 2C population (such as orchid ovaries and pollinaria) can help overcome this difficulty (Trávníček et al. 2015; Brown et al. 2017).
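The nuclear populations named above can be written compactly under a simple model. This is an illustrative assumption rather than a formula taken from the manuscript: each CE round duplicates the whole genome, each PE round duplicates only the fraction P, and $D_{2C}$ denotes the DNA content of a 2C nucleus.

```latex
\begin{align}
\text{CE, after } n \text{ rounds:}\quad & D_n = 2^{n}\, D_{2C}
  && (4C,\ 8C,\ \dots) \\
\text{PE, after } n \text{ rounds:}\quad & D_n = \bigl[(1-P) + 2^{n}P\bigr]\, D_{2C}
  && (4E,\ 8E,\ \dots)
\end{align}
```

With $P = 1$ the second expression reduces to the first, and for $P < 1$ the 4E and 8E peaks fall below the 4C and 8C positions, which is why peak ratios in a cytometry histogram can reveal PE.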
Comments and suggestions on the figures:
R2. In fig 1, the flow cytometry histograms need to be more self-explanatory. What are the Y axis "counts" of? Also, please either place the label for both rows or for each, but don't make it redundant. The axis fonts need to be made a bit larger too. If possible, explain briefly in the figure legend (and not only in the text) what each peak means.
A: We have modified the figure adding legends for Y and X axes, eliminated redundant labels, and changed the font size.
R2. Fig 5. Horizontal axis labels are illegible. Please make these larger (maybe make the plot wider by moving the plot legend to the top/bottom of the figure? - just a suggestion).
A: We consider the horizontal axis label to be superfluous and we removed it.
Small text editing suggestions:
R2. Methods, "Ploidy level estimation and chromosome counts" section. It would be easier for the reader if this paragraph were either divided into two methods sections, or into two paragraphs at least, since these are two very different approaches and provide slightly different data or information.
A: We slightly modified: "Chromosome number was counted from developing root tips" to
"Additionally, to confirm ploidy level, chromosome number was counted from developing root tips" and changed the subtitle to only "Ploidy level estimation".
R2. Methods, "Genome size estimation by k-mer analysis" section. Please specify whether the coverage simulations (of 5x to 40x) were made based on 1c or 2c of the genome size? I assumed haploid genome size but best to clarify.
A: We have added it to P7: "To assess the suitability of the whole dataset and estimate the minimum coverage required for genome size estimation, the depth of coverage of both datasets was calculated based on the flow cytometry 1C genome size values."
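The relation between a target depth and the number of reads to subsample can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names are hypothetical, the 150 bp read length is an assumed value (the reply does not state it), and the genome size is the flow cytometry mean 1C value for E. marmoratum quoted earlier in this reply.

```python
def reads_for_depth(target_depth, genome_size_bp, read_length_bp):
    """Number of reads needed to reach a target depth over a 1C genome size."""
    return round(target_depth * genome_size_bp / read_length_bp)


def subsample_fraction(target_depth, full_depth):
    """Fraction of the full dataset to keep; capped at 1.0 if data are limiting."""
    return min(1.0, target_depth / full_depth)


genome_size_bp = 1_130_000_000  # E. marmoratum, flow cytometry mean 1C value
read_length_bp = 150            # assumed read length, for illustration only

for depth in (5, 10, 20, 30, 40):
    n_reads = reads_for_depth(depth, genome_size_bp, read_length_bp)
    print(f"{depth}x -> {n_reads} reads")
```

Because depth is defined against the 1C value, the same read count corresponds to roughly half that depth per haplotype in a diploid genome.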
R2. Results, "Genome size estimation by k-mer analysis and ploidy estimation" section. In the first two paragraphs, the results presented appear to conform to anticipated patterns based on known properties of these types of datasets. Although this information confirms expected patterns, it does not provide new or biologically significant insights into the genomes analysed. It may be beneficial to further summarize these paragraphs so that the focus of this section can shift toward the comparison of methods and the biological interpretation of the genome profiles of Epidendrum.
A: We agree that those paragraphs deviate a little from the focus of our results. However, we believe they provide useful information both for pattern confirmation in a relatively understudied field and for readers who may not be very familiar with the methods utilized.
R2. Discussion, "Genome size estimation using flow cytometry" section. In the second paragraph, it is discussed how potential endoduplication events can "trick" the flow cytometry measurements. This has probably previously been discussed on other C-value calculation studies and would benefit from context from literature. How does this endoduplication really affect C-value measurements across plant taxa? I understand it is a well-known issue, so maybe add some references?
A: We have included in the Introduction information about CE and PE and their associated references. P. 3 and 4.
R2. Discussion, "Repetitive DNA composition in Epidendrum anisatum and E. marmoratum" section. In the second paragraph, when mentioning the relative abundance of Ty3-gypsy and Ty1-copia elements, it is also worth mentioning their differences in genomic distribution and the potential structural role of Ty3-gypsy elements.
A: We added this paragraph in P.20:
"Ty3-gypsy elements are frequently found in centromeric and pericentromeric regions, and may have an important structural role in heterochromatin (Jin et al. 2004; Neumann et al. 2011; Ma et al. 2023), particularly those with chromodomains in their structure (chromovirus, i.e. Tekay, CRM transposons; Neumann et al. 2011). Conversely, Ty1-copia elements tend to be more frequent in gene-rich regions (Wang et al. 2025A). However, Ty3-gypsy chromovirus elements can be found outside the heterochromatin regions (Neumann et al. 2011), and in Pennisetum purpureum (Poaceae) Ty1-copia elements are more common in pericentromeric regions (Yu et al. 2022)."
R2. Discussion, "Repetitive DNA composition in Epidendrum anisatum and E. marmoratum" section. In the third paragraph, it is mentioned that both species have 2n=40. I believe these are results from this work since there is a methods section for chromosome counting. This data should therefore go into results.
A: We have added the chromosome count micrographs as Supplementary Data Fig. S7.
R2. Discussion, "Repetitive DNA composition in Epidendrum anisatum and E. marmoratum" section. I'd recommend expanding a bit more on repetitive DNA differences based on the RepeatExplorer results. Providing references on whether this has been found in other taxa would be helpful too. For example, Ogre bursts have been previously described in other species (e.g. legumes, Wang et al., 2025). Moreover, I consider worth highlighting and discussing other interesting differences found, such as the differences in unknown repeats (could be due to one species having "older" elements- too degraded to give any database hits- compared to the other), or Class II TE differences between species (and how these account less for genome size difference because of their size), etc.
A: We have rearranged and added discussion expanding on the role of repetitive DNA in E. anisatum and E. marmoratum and how it relates to the repetitive DNA in other species. This includes Ogre transposons, an expanded Ty1-copia vs. Ty3-gypsy discussion, and a section on unclassified repeats and can be found on P.19 to P.21.
Reviewer #2 (Significance (Required)):
Overall, this study provides a valuable contribution to our understanding of genome size diversity and repetitive DNA dynamics within Epidendrum, particularly through its combined use of low-coverage sequencing, flow cytometry, and comparative genome profiling. Its strongest aspects lie in the clear methodological framework and the integration of multiple complementary approaches, which together highlight substantial genome size divergence driven by repeat proliferation-an insight of clear relevance for orchid genomics and plant genome evolution more broadly.
While the work would benefit from improved data availability, additional contextualization of the problem of endoreduplication in flow cytometry, and clarification of some figure elements and methodological details, the study nonetheless advances the field by presenting new comparative genomic information for two understudied species and by evaluating different strategies for genome profiling in non-model taxa.
The primary audience will include researchers in non-model plant genomics, cytogenetics, and evolutionary biology, although the methodological comparisons may also be useful to a broader community working on genome characterization in diverse lineages. My expertise is in plant genomics, genome size evolution, and repetitive DNA biology; I am not a specialist in flow cytometry instrumentation or cytological methods, so my evaluation of those aspects is based on general familiarity rather than technical depth.
Reviewer #3 (Evidence, reproducibility and clarity (Required)):
A review on "Nuclear genome profiling of two Mexican orchids of the genus Epidendrum" by Alcalá-Gaxiola et al. submitted to ReviewCommons
The present manuscript presented genomic data for two endemic Mexican orchids: Epidendrum anisatum and E. marmoratum. Authors aim to determine the genome size and ploidy using traditional (flow cytometry and chromosome counts) and genomic techniques (k-mer analysis, heterozygosity), along with the repetitive DNA composition characterization.
Considering the genomic composition, the main difference observed in repeat composition between the two species was attributed to the presence of a 172 bp satDNA (AniS1) in E. anisatum, which represents about 11% of its genome but is virtually absent in E. marmoratum. The differences in the genomic proportion of AniS1 and Ty3-gypsy/Ogre lineage TEs between E. anisatum and E. marmoratum are suggested as potential drivers of the GS difference identified between the two species.
Our main concerns are about the GS estimation and the chromosome number determination. Along with many issues related to GS estimation by flow cytometry, the results of the chromosome number determination are missing from the manuscript. Improvements in both techniques and results are crucial, since the authors aim to compare different methods for GS and ploidy determination.
R3. Genome size: From the abstract, it is not possible to understand that the authors confirm the GS by flow cytometry, as is clarified later in the manuscript. Since the approach used to obtain the results is crucial to this manuscript, please make this clear in the abstract.
A: We have highlighted the congruence of flow cytometry and bioinformatic approaches in the abstract:
"Multiple depths of coverage, k values, and k-mer-based tools for genome size estimation were explored and contrasted with cytometry genome size estimations. Cytometry and k-mer analyses yielded a consistently higher genome size for E. anisatum (mean 1C genome size = 2.59 Gb) than E. marmoratum (mean 1C genome size = 1.13 Gb), which represents a 2.3-fold genome size difference."
R3. Flow cytometry methodology: For a standard protocol, it is mandatory to use at least three individuals, each analyzed in triplicate. It is also important to check the variation among measurements obtained from the same individual and among values obtained from different individuals. Such variation should be below 3%. The result should be reported as the average C-value followed by the standard deviation, which informs us of the variation among individuals and measurements.
A: We performed three technical replicates for each tissue of the individuals of E. anisatum and E. marmoratum. To show the variation among replicates and tissues, we have included Supplementary Data Table S1. Intraspecific variation in genome size is beyond the scope of this work.
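As an illustration of the check the reviewer describes, the following minimal Python sketch summarizes replicate measurements as a mean, a standard deviation, and a coefficient of variation (CV), and compares the CV with the 3% threshold mentioned above. The replicate values are hypothetical placeholders, not data from Supplementary Data Table S1, and the function name is an assumption made for this example.

```python
from statistics import mean, stdev


def summarize_replicates(values_pg, max_cv_percent=3.0):
    """Return mean, sample SD, CV (%) and whether the CV is below the threshold."""
    avg = mean(values_pg)
    sd = stdev(values_pg)  # sample standard deviation (n - 1 denominator)
    cv_percent = 100.0 * sd / avg
    return avg, sd, cv_percent, cv_percent < max_cv_percent


# Hypothetical 1C estimates (pg) from three technical replicates of one tissue.
replicates_pg = [2.64, 2.66, 2.65]
avg, sd, cv, ok = summarize_replicates(replicates_pg)
print(f"mean = {avg:.2f} pg, SD = {sd:.3f} pg, CV = {cv:.2f}%, below threshold: {ok}")
```

The same function could be applied per individual and then across individuals, which is the distinction the reviewer draws between measurement-level and individual-level variation.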
R3. Checking Fig. 1, we could not see the Pisum peak. If the authors performed the analysis with an external standard, this should be clarified in the Methods. I suggest always using an internal standard.
Besides, comparing Fig. 1 for leaf and pollinium, it seems necessary to adjust the flow cytometer settings. Note that the 2C peak changes its position between graphs. The data could be placed more centrally on the x-axis by adjusting the instrument settings.
Action Required: Considering that the authors want to compare indirect genomic approaches for determining the GS, I suggest that the authors improve the GS determination by flow cytometry.
Please, in the Methodology section, keep the two techniques focused on GS close to one another. Follow the same order in the Methodology, Results and Discussion sections.
A: We have made several changes to the estimation and reporting of genome size by flow cytometry. Among these:
We have clarified the use of the P. sativum internal standard and PBMCs in the Methods (P.6). We have added the associated mean coefficient of variation for both the sample and the internal reference in Supplementary Data Table S1, in order to show that the variation is not the result of an instrument error. We have changed the order of the paragraphs in the Methods section to follow the order in the other sections.
R3. Chromosome count: In the Introduction (page 5), the authors explicitly aim to provide "bioinformatics ploidy level estimation and chromosome counting." Furthermore, the Methods section (page 7, subsection "Ploidy level estimation and chromosome counts") details a specific protocol for chromosome counting involving root tip pretreatment, fixation, and staining. However, no results regarding chromosome counting are presented in the manuscript. There are no micrographs of metaphase plates, no tables with counts, and no mention of the actual counts in the Results section or Supplementary Material. Despite this absence of evidence, the Discussion (Page 18) states: "ploidy and chromosome counts of both E. anisatum and E. marmoratum are the same (2n=40)." The value of 2n=40 is presented as a finding of this study; however, there is no reference to these results.
Action Required: The authors must resolve this discrepancy by providing the missing empirical data (micrographs and counts). This detail needs to be reviewed with greater care and scientific integrity.
A: We have added the chromosome count micrographs as Supplementary Data Fig. S7.
Minor reviews (Suggestions):
R3. Refining the Title (Optional): Although the current title is descriptive, we believe it undersells the value of the manuscript. Since this study provides the first genome profiling and repeatome characterization for the genus Epidendrum and offers important insights into the calibration of bioinformatics tools and flow cytometry for repetitive genomes, I suggest modifying the title to reflect these aspects. The comparative assessment of GS is also an important feature. This would make the article more attractive to a broader audience interested in genomics of non-model organisms.
A: We have changed the title to "Nuclear genome profiling of two species of Epidendrum (Orchidaceae): genome size, repeatome and ploidy"
R3. Botanical Nomenclature (Optional): Although citing taxonomic authorities is not strictly required in all fields of plant sciences, most botanical journals expect the full author citation at the first mention of each species. Including this information would improve the nomenclatural rigor of the manuscript and align it with common practices in botanical publishing.
A: We have added the citation of the taxonomic authorities:
"This study aims to use two closely related endemic Mexican species, Epidendrum anisatum Lex and Epidendrum marmoratum A. Rich. & Galeotti, to provide the first genomic profiling for this genus..."
R3. Abbreviation of Genus Names: I noticed inconsistencies in the abbreviation of scientific names throughout the manuscript. Standard scientific style dictates that the full genus name (Epidendrum) should be written out only at its first mention in the Abstract and again at the first mention in the main text. Thereafter, it should be abbreviated (e.g., E. anisatum, E. marmoratum), unless the name appears at the beginning of a sentence or if abbreviation would cause ambiguity with another genus. Please revise the text to apply this abbreviation consistently.
A: We have made the changes requested as necessary.
R3. Genome Size Notation: In the Abstract and throughout the text, genome size estimates are presented using the statistical symbol for the mean (x). While mathematically accurate, this notation is generic and does not immediately inform the reader about the biological nature of the DNA content (i.e., whether it refers to the gametic 1C or somatic 2C value). In plant cytometry literature, it is standard practice to explicitly label these values using C-value terminology to prevent ambiguity and eliminate the effect of the number of chromosome sets (Bennett & Leitch 2005; Greilhuber et al. 2005; Doležel et al. 2018). I strongly suggest replacing references to "x" with "1C" (e.g., changing "x = 2.58 Gb" to "mean 1C value = 2.58 Gb") to ensure immediate clarity and alignment with established conventions in the field.
A: We have revised the text in every instance, for example, in the results section:
"The 1C value in gigabases (Gb; calculated from mass in pg) of E. anisatum ranged from 2.55 to 2.62 Gb (mean 1C value = 2.59 Gb) and that of E. marmoratum from 1.11 to 1.18 Gb (mean 1C value = 1.13 Gb; Supplementary Data Table S1)."
R3. Justification of the Sequencing Method: Although the sequencing strategy is clearly described, the manuscript would benefit from a bit more contextualization regarding the choice of low-pass genome skimming. In the Introduction, a short justification of why this approach is suitable for estimating genome size, heterozygosity, and repeat composition, particularly in plants with large, repeat-rich genomes, would help readers better understand the methodological rationale. Likewise, in the Methods section, briefly outlining why the selected sequencing depth is appropriate, and how it aligns with previous studies using similar coverage levels, would strengthen the clarity of the methodological framework. These additions would make the rationale behind the sequencing approach more transparent and accessible to readers who may be less familiar with low-coverage genomic strategies.
A: We have added the following short sentence in P.7:
"This sequencing method produces suitable data sets without systematic biases, allowing the estimation of genome size and the proportion of repetitive DNA."
R3. Wording Improvement Regarding RepeatExplorer2 Results: In the Results section, several sentences attribute biological outcomes to the RepeatExplorer2 "protocols" (e.g., "According to this protocol, both species have highly repetitive genomes..."; "The comparative protocol showed a 67% total repeat proportion, which falls between the estimated repeat proportions of the two species according to the results of the individual protocol"). Since the RepeatExplorer2 protocol itself only provides the analytical workflow and not species-specific results, this phrasing may be misleading.
A: We have rephrased these sections to emphasize that these are "the results of" the protocols and not the protocols themselves.
Reviewer #3 (Significance (Required)):
Significance
General assessment
Strengths
1. First Detailed Genomic Profile for the Genus Epidendrum: The study provides the first integrated dataset on genome size, ploidy, heterozygosity, and repeatome for species of the genus Epidendrum, a novel contribution for an extremely diverse and under-explored group in terms of cytogenomics.
2. Cross-validation of in vitro and in silico analyses: Flow cytometry is considered the gold standard for genome size (GS) estimation because it physically measures DNA quantity (Doležel et al. 2007; Śliwińska 2018). However, it typically requires fresh tissue, which is not always available. Conversely, k-mer analysis is a rapid bioinformatics technique utilizing sequencing data that does not rely on a reference genome. Nevertheless, it is frequently viewed with skepticism or distrust due to discrepancies with laboratory GS estimates (Pflug et al. 2020; Hesse 2023). In this study, by comparing computational results with flow cytometry data, the authors were able to validate the reliability of computational estimates for the investigated species. Since the 'true' GS was already established via flow cytometry, the authors used this value as a benchmark to test various software tools (GenomeScope, findGSE, CovEst) and parameters. This approach allowed for the identification of which tools perform best for complex genomes. For instance, they found that tools failing to account for heterozygosity (such as findGSE-hom) drastically overestimated the genome size of E. anisatum, whereas GenomeScope and findGSE-het (which account for heterozygosity) yielded results closer to the flow cytometry values. Thus, they demonstrated that this cross-validation is an effective method for estimating plant genome sizes with greater precision. This integrative approach is essential not only for defining GS but also for demonstrating how bioinformatics methods must be calibrated (particularly regarding depth of coverage and maximum k-mer coverage) to provide accurate data for non-model organisms when flow cytometry is not feasible.
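The overestimation described above can be illustrated with a deliberately simplified sketch. It assumes the basic k-mer relation, genome size ≈ total k-mer count / depth of the homozygous peak, and a toy two-peak histogram. It is not the model implemented in GenomeScope, findGSE, or CovEst, and all names and numbers are hypothetical. In this toy case, mistaking the heterozygous peak (at half the depth) for the homozygous one doubles the estimate.

```python
def estimate_genome_size(histogram, peak_depth):
    """Simplified estimate: total k-mer instances divided by the chosen peak depth.

    histogram maps k-mer depth -> number of distinct k-mers seen at that depth.
    """
    total_kmers = sum(depth * count for depth, count in histogram.items())
    return total_kmers / peak_depth


# Toy histogram (hypothetical): a heterozygous peak at depth 10 and a
# homozygous peak at depth 20. Sequencing errors and repeats are ignored.
toy_histogram = {10: 200, 20: 900}

correct = estimate_genome_size(toy_histogram, peak_depth=20)
inflated = estimate_genome_size(toy_histogram, peak_depth=10)
print(f"homozygous peak used: {correct:.0f} bp")
print(f"heterozygous peak used: {inflated:.0f} bp")
```

Real tools fit a full mixture model to the histogram rather than dividing by a single peak, so this sketch only conveys why the choice of peak matters for a heterozygous genome.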
Limitations
1. Limited Taxonomic Sampling: The study analyzes only two species of Epidendrum, which restricts the ability to make broad inferences regarding genome evolution across the genus. Given the outstanding diversity of Epidendrum (>1,800 species), the current sampling is insufficient to propose generalized evolutionary patterns. As the authors state at the end of the Discussion (page 18): "Future work should investigate to what extent LTR transposons and satellite DNA have been responsible for shaping genome size variation in different lineages of Epidendrum, analyzing a greater portion of its taxic diversity in an evolutionary context."
2. Lack of Cytogenetic Results and Mapping: One of the major findings of this study is the identification of the AniS1 satellite as a potential key driver of the genome size difference between the species, occupying ~11% of the E. anisatum genome and virtually absent in E. marmoratum. While the authors use bioinformatic metrics (C and P indices) to infer a dispersed organization in the Discussion (Page 18), the study lacks physical validation via Fluorescence in situ Hybridization (FISH), as well as a basic validation of the chromosome number. Without cytogenetic mapping, it is impossible to confirm the actual chromosomal distribution of this massive repetitive array, for instance, whether it has accumulated in specific heterochromatic blocks (e.g., centromeric or subtelomeric regions) or if it is genuinely interspersed along the chromosome arms. I suggest acknowledging this as a limitation in the Discussion, as the physical organization of such abundant repeats has significant implications for understanding the structural evolution of the species' chromosomes.
Advance
To the best of our knowledge, this study represents the first comprehensive genome profiling and repeatome characterization for any species of the genus Epidendrum. By integrating flow cytometry, k-mer-based approaches, and low-pass sequencing, the authors provide the first insights into the genomic architecture of Epidendrum, including quantitative assessments of transposable elements, lineage-specific satellite DNA, and repeat-driven genome expansion. This constitutes both a technical and a conceptual advance: technically, the study demonstrates the feasibility and limitations of combining in vitro and in silico methods for genome characterization in large, repeat-rich plant genomes; conceptually, it offers new evolutionary perspectives on how repetitive elements shape genome size divergence within a highly diverse orchid lineage. These results broaden the genomic knowledge base for Neotropical orchids and establish a foundational reference for future comparative, cytogenomic, and phylogenomic studies within Epidendrum and related groups.
Audience
This study will primarily interest a broad audience, including researchers in plant genomics, evolutionary biology, cytogenomics, and bioinformatics, especially those working with non-model plants or groups with large, repetitive genomes. It also holds relevance for scientists engaged in genome size evolution, repetitive DNA biology, and comparative genomics. Other researchers are likely to use this work as a methodological reference for genome profiling in non-model taxa, especially regarding the integration of flow cytometry and k-mer-based estimations and the challenges posed by highly repetitive genomes. The detailed repeatome characterization, including identification of lineage-specific satellites and retrotransposon dynamics, will support comparative genomic analyses, repeat evolution studies, and future cytogenetic validation (e.g., FISH experiments). Additionally, this dataset establishes a genomic baseline that can inform phylogenomic studies, species delimitation, and evolutionary inference within Epidendrum and related orchid groups.
Reviewer's Backgrounds
The review was prepared by two reviewers. Our expertise lies in evolution and biological diversity, with a focus on cytogenomic and genome size evolution. Among the projects in development, the cytogenomic evolution of Neotropical orchids is one of the main studies (also focused on Epidendrum). These areas shape our perspective in evaluating the evolutionary, cytogenomic, and biological implications of the study. However, we have limited expertise in methodologies related to k-mer-based genome profiling and heterozygosity modeling. Therefore, our evaluation does not deeply assess the technical validity of these analytical pipelines.
Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.
Learn more at Review Commons
Referee #3
Evidence, reproducibility and clarity
A review of "Nuclear genome profiling of two Mexican orchids of the genus Epidendrum" by Alcalá-Gaxiola et al., submitted to Review Commons
The manuscript presents genomic data for two endemic Mexican orchids: Epidendrum anisatum and E. marmoratum. The authors aim to determine genome size and ploidy using traditional techniques (flow cytometry and chromosome counts) and genomic techniques (k-mer analysis, heterozygosity), and to characterize the repetitive DNA composition.
Considering the genomic composition, the main difference observed in repeat composition between the two species was attributed to the presence of a 172 bp satDNA (AniS1) in E. anisatum, which represents about 11% of its genome but is virtually absent in E. marmoratum. The differences in the genomic proportion of AniS1 and Ty3-gypsy/Ogre lineage TEs between E. anisatum and E. marmoratum are suggested as potential drivers of the GS difference identified between the two species.
Our main concerns are about the GS estimation and the chromosome number determination. Along with many issues related to GS estimation by flow cytometry, the results of the chromosome number determination are missing from the manuscript. Improvements in both techniques and results are crucial, since the authors aim to compare different methods for GS and ploidy determination.
Genome size: From the abstract, it is not possible to understand that the authors confirm the GS by flow cytometry, as is clarified later in the manuscript. Since the approach used to obtain the results is crucial to this manuscript, please make this clear in the abstract.
Flow cytometry methodology: For a standard protocol, it is mandatory to use at least three individuals, each analyzed in triplicate. It is also important to check the variation among measurements obtained from the same individual and among values obtained from different individuals. Such variation should be below 3%. The result should be reported as the average C-value followed by the standard deviation, which informs us of the variation among individuals and measurements.
Checking Fig. 1, we could not see the Pisum peak. If the authors performed the analysis with an external standard, this should be clarified in the Methods. I suggest always using an internal standard.
Besides, comparing Fig. 1 for leaf and pollinium, it seems necessary to adjust the flow cytometer settings. Note that the 2C peak changes its position between graphs. The data could be placed more centrally on the x-axis by adjusting the instrument settings.
Action Required: Considering that the authors want to compare indirect genomic approaches for determining the GS, I suggest that the authors improve the GS determination by flow cytometry.
Please, in the Methodology section, keep the two techniques focused on GS close to one another. Follow the same order in the Methodology, Results and Discussion sections.
Chromosome count: In the Introduction (page 5), the authors explicitly aim to provide "bioinformatics ploidy level estimation and chromosome counting." Furthermore, the Methods section (page 7, subsection "Ploidy level estimation and chromosome counts") details a specific protocol for chromosome counting involving root tip pretreatment, fixation, and staining. However, no results regarding chromosome counting are presented in the manuscript. There are no micrographs of metaphase plates, no tables with counts, and no mention of the actual counts in the Results section or Supplementary Material. Despite this absence of evidence, the Discussion (Page 18) states: "ploidy and chromosome counts of both E. anisatum and E. marmoratum are the same (2n=40)." The value of 2n=40 is presented as a finding of this study; however, there is no reference to these results.
Action Required: The authors must resolve this discrepancy by providing the missing empirical data (micrographs and counts). This detail needs to be reviewed with greater care and scientific integrity.
Minor reviews (Suggestions):
Refining the Title (Optional): Although the current title is descriptive, we believe it undersells the value of the manuscript. Since this study provides the first genome profiling and repeatome characterization for the genus Epidendrum and offers important insights into the calibration of bioinformatics tools and flow cytometry for repetitive genomes, I suggest modifying the title to reflect these aspects. The comparative assessment of GS is also an important feature. This would make the article more attractive to a broader audience interested in genomics of non-model organisms.
Botanical Nomenclature (Optional): Although citing taxonomic authorities is not strictly required in all fields of plant sciences, most botanical journals expect the full author citation at the first mention of each species. Including this information would improve the nomenclatural rigor of the manuscript and align it with common practices in botanical publishing.
Abbreviation of Genus Names: I noticed inconsistencies in the abbreviation of scientific names throughout the manuscript. Standard scientific style dictates that the full genus name (Epidendrum) should be written out only at its first mention in the Abstract and again at the first mention in the main text. Thereafter, it should be abbreviated (e.g., E. anisatum, E. marmoratum), unless the name appears at the beginning of a sentence or if abbreviation would cause ambiguity with another genus. Please revise the text to apply this abbreviation consistently.
Genome Size Notation: In the Abstract and throughout the text, genome size estimates are presented using the statistical symbol for the mean (x). While mathematically accurate, this notation is generic and does not immediately inform the reader about the biological nature of the DNA content (i.e., whether it refers to the gametic 1C or somatic 2C value). In plant cytometry literature, it is standard practice to explicitly label these values using C-value terminology to prevent ambiguity and eliminate the effect of the number of chromosome sets (Bennett & Leitch 2005; Greilhuber et al. 2005; Doležel et al. 2018). I strongly suggest replacing references to "x" with "1C" (e.g., changing "x = 2.58 Gb" to "mean 1C value = 2.58 Gb") to ensure immediate clarity and alignment with established conventions in the field.
Justification of the Sequencing Method: Although the sequencing strategy is clearly described, the manuscript would benefit from a bit more contextualization regarding the choice of low-pass genome skimming. In the Introduction, a short justification of why this approach is suitable for estimating genome size, heterozygosity, and repeat composition, particularly in plants with large, repeat-rich genomes, would help readers better understand the methodological rationale. Likewise, in the Methods section, briefly outlining why the selected sequencing depth is appropriate, and how it aligns with previous studies using similar coverage levels, would strengthen the clarity of the methodological framework. These additions would make the rationale behind the sequencing approach more transparent and accessible to readers who may be less familiar with low-coverage genomic strategies.
Wording Improvement Regarding RepeatExplorer2 Results: In the Results section, several sentences attribute biological outcomes to the RepeatExplorer2 "protocols" (e.g., "According to this protocol, both species have highly repetitive genomes..."; "The comparative protocol showed a 67% total repeat proportion, which falls between the estimated repeat proportions of the two species according to the results of the individual protocol"). Since the RepeatExplorer2 protocol itself only provides the analytical workflow and not species-specific results, this phrasing may be misleading.
Significance
General assessment
Strengths
- First Detailed Genomic Profile for the Genus Epidendrum: The study provides the first integrated dataset on genome size, ploidy, heterozygosity, and repeatome for species of the genus Epidendrum, a novel contribution for an extremely diverse and under-explored group in terms of cytogenomics.
- Cross-validation of in vitro and in silico analyses: Flow cytometry is considered the gold standard for genome size (GS) estimation because it physically measures DNA quantity (Doležel et al. 2007; Śliwińska 2018). However, it typically requires fresh tissue, which is not always available. Conversely, k-mer analysis is a rapid bioinformatics technique utilizing sequencing data that does not rely on a reference genome. Nevertheless, it is frequently viewed with skepticism or distrust due to discrepancies with laboratory GS estimates (Pflug et al. 2020; Hesse 2023). In this study, by comparing computational results with flow cytometry data, the authors were able to validate the reliability of computational estimates for the investigated species. Since the 'true' GS was already established via flow cytometry, the authors used this value as a benchmark to test various software tools (GenomeScope, findGSE, CovEst) and parameters. This approach allowed for the identification of which tools perform best for complex genomes. For instance, they found that tools failing to account for heterozygosity (such as findGSE-hom) drastically overestimated the genome size of E. anisatum, whereas GenomeScope and findGSE-het (which account for heterozygosity) yielded results closer to the flow cytometry values. Thus, they demonstrated that this cross-validation is an effective method for estimating plant genome sizes with greater precision. This integrative approach is essential not only for defining GS but also for demonstrating how bioinformatics methods must be calibrated (particularly regarding depth of coverage and maximum k-mer coverage) to provide accurate data for non-model organisms when flow cytometry is not feasible.
Limitations
- Limited Taxonomic Sampling: The study analyzes only two species of Epidendrum, which restricts the ability to make broad inferences regarding genome evolution across the genus. Given the outstanding diversity of Epidendrum (>1,800 species), the current sampling is insufficient to propose generalized evolutionary patterns. As the authors state at the end of the Discussion (page 18): "Future work should investigate to what extent LTR transposons and satellite DNA have been responsible for shaping genome size variation in different lineages of Epidendrum, analyzing a greater portion of its taxic diversity in an evolutionary context."
- Lack of Cytogenetic Results and Mapping: One of the major findings of this study is the identification of the AniS1 satellite as a potential key driver of the genome size difference between the species, occupying ~11% of the E. anisatum genome and virtually absent in E. marmoratum. While the authors use bioinformatic metrics (C and P indices) to infer a dispersed organization in the Discussion (Page 18), the study lacks physical validation via Fluorescence in situ Hybridization (FISH), as well as a basic validation of the chromosome number. Without cytogenetic mapping, it is impossible to confirm the actual chromosomal distribution of this massive repetitive array, for instance, whether it has accumulated in specific heterochromatic blocks (e.g., centromeric or subtelomeric regions) or if it is genuinely interspersed along the chromosome arms. I suggest acknowledging this as a limitation in the Discussion, as the physical organization of such abundant repeats has significant implications for understanding the structural evolution of the species' chromosomes.
Advance
To the best of our knowledge, this study represents the first comprehensive genome profiling and repeatome characterization for any species of the genus Epidendrum. By integrating flow cytometry, k-mer-based approaches, and low-pass sequencing, the authors provide the first insights into the genomic architecture of Epidendrum, including quantitative assessments of transposable elements, lineage-specific satellite DNA, and repeat-driven genome expansion. This constitutes both a technical and a conceptual advance: technically, the study demonstrates the feasibility and limitations of combining in vitro and in silico methods for genome characterization in large, repeat-rich plant genomes; conceptually, it offers new evolutionary perspectives on how repetitive elements shape genome size divergence within a highly diverse orchid lineage. These results broaden the genomic knowledge base for Neotropical orchids and establish a foundational reference for future comparative, cytogenomic, and phylogenomic studies within Epidendrum and related groups.
Audience
This study will primarily interest a broad audience, including researchers in plant genomics, evolutionary biology, cytogenomics, and bioinformatics, especially those working with non-model plants or groups with large, repetitive genomes. It also holds relevance for scientists engaged in genome size evolution, repetitive DNA biology, and comparative genomics. Other researchers are likely to use this work as a methodological reference for genome profiling in non-model taxa, especially regarding the integration of flow cytometry and k-mer-based estimations and the challenges posed by highly repetitive genomes. The detailed repeatome characterization, including identification of lineage-specific satellites and retrotransposon dynamics, will support comparative genomic analyses, repeat evolution studies, and future cytogenetic validation (e.g., FISH experiments). Additionally, this dataset establishes a genomic baseline that can inform phylogenomic studies, species delimitation, and evolutionary inference within Epidendrum and related orchid groups.
Reviewer's Backgrounds
The review was prepared by two reviewers. Our expertise lies in evolution and biological diversity, with a focus on cytogenomic and genome size evolution. Among the projects in development, the cytogenomic evolution of Neotropical orchids is one of the main studies (also focused on Epidendrum). These areas shape our perspective in evaluating the evolutionary, cytogenomic, and biological implications of the study. However, we have limited expertise in methodologies related to k-mer-based genome profiling and heterozygosity modeling. Therefore, our evaluation does not deeply assess the technical validity of these analytical pipelines.
Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.
Learn more at Review Commons
Referee #2
Evidence, reproducibility and clarity
Summary:
With this work, the authors provide genome profiling information on the genus Epidendrum. They performed low-coverage short-read sequencing and analysis, as well as flow cytometry, to estimate genome size, and compared these methods. They also used the WGS dataset to test different approaches and models for genome profiling, as well as repeat abundance estimation, emphasising the importance of genome profiling to provide basic and comparative genomic information in our non-model study species. Results show that the two "closely-related" Epidendrum species analysed (E. marmoratum and E. anisatum) have different genome profiles, exhibiting a 2.3-fold genome size difference, mostly triggered by the expansion of repetitive elements in E. anisatum, especially of Ty3-gypsy LTR-retrotransposons and a 172 bp tandem repeat (satellite DNA).
Major comments:
Overall, the manuscript is well-written, the aim, results and methods are explained properly, and although I missed some information in the introduction, the paper structure is overall good, and it doesn't lack any important information. The quality of the analysis is also adequate and no further big experiments or analysis would be needed. However, from my point of view, two main issues would need to be addressed:
- The methods section is properly detailed and well explained. However, the project data and scripts are not available at the figshare link provided, and the BioProject code provided is not found at SRA. This needs to be solved as soon as possible: if they are not available for review, the reproducibility of the manuscript cannot be fully assessed.
- The authors specify in the methods that 0.06x and 0.43x sequencing depths were used as inputs for the RE analysis of E. anisatum and E. marmoratum. I understand these are differences based on the data availability and genome size differences. However, they don't correspond to either of the recommendations from Novak et al (2020):
In the context of individual analysis: "The number of analyzed reads should correspond to 0.1-0.5× genome coverage. In the case of repeat-poor species, coverage can be increased up to 1.0-1.5×." Therefore, using 0.06x for E. anisatum should be justified, or at least addressed in the discussion. Moreover, such a difference in coverage might affect any comparisons made using these results. Given that the number of reads is not limiting in this case, the reasons for using these specific coverages should be discussed in detail.
In the context of comparative analysis: "Because different genomes are being analyzed simultaneously, the user must decide how they will be represented in the analyzed reads, choosing one of the following options. First, the number of reads analyzed from each genome will be adjusted to represent the same genome coverage. This option provides the same sensitivity of repeat detection for all analyzed samples and is therefore generally recommended; however, it requires that genome sizes of all analyzed species are known and that they do not substantially differ. In the case of large differences in genome sizes, too few reads may be analyzed from smaller genomes, especially if many species are analyzed simultaneously. A second option is to analyze the same number of reads from all samples, which will provide different depth of analysis in species differing in their genome sizes, and this fact should be considered when interpreting analysis results. Because each of these analysis setups has its advantages and drawbacks, it is a good idea to run both and cross-check their results." Therefore, it should be confirmed how much coverage was used for this approach (the methods only specify how much was used for the individual analysis), and why.
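The coverage arithmetic behind these recommendations can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names are hypothetical, the 150 bp read length is an assumption (the review does not state it), and the genome sizes are the mean estimates quoted in the abstract (2.58 Gb and 1.11 Gb).

```python
def reads_for_coverage(genome_size_bp, coverage, read_length_bp):
    """Reads needed to sample a genome at a given depth.

    Uses coverage = n_reads * read_length / genome_size.
    """
    return round(genome_size_bp * coverage / read_length_bp)


def coverage_from_reads(n_reads, read_length_bp, genome_size_bp):
    """Depth of coverage obtained from a fixed number of reads."""
    return n_reads * read_length_bp / genome_size_bp


# Mean genome size estimates quoted in the abstract, in bp.
GENOME_SIZES = {"E. anisatum": 2.58e9, "E. marmoratum": 1.11e9}
READ_LENGTH = 150  # assumed read length in bp; not given in the review

if __name__ == "__main__":
    # Option 1: the same genome coverage for both species (here 0.1x).
    for species, size in GENOME_SIZES.items():
        n = reads_for_coverage(size, 0.1, READ_LENGTH)
        print(f"{species}: {n} reads for 0.1x")

    # Option 2: the same number of reads, hence a different coverage each.
    n_reads = 1_000_000  # arbitrary illustrative value
    for species, size in GENOME_SIZES.items():
        cov = coverage_from_reads(n_reads, READ_LENGTH, size)
        print(f"{species}: {cov:.3f}x from {n_reads} reads")
```

Running both options side by side makes explicit how far a fixed read count departs from equal coverage when genome sizes differ by more than twofold.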
Minor comments:
General comments:
- The concept of genome endoreplication and the problem it represents for C-value estimations needs to be better contextualised. It would be nice to have some background information in the introduction on how this is an issue (especially in orchid species). The results shown are valuable and interesting but require a little more context on how frequent this is in plants, especially in orchids, and across different tissues.
Comments and suggestions on the figures:
- In fig 1, the flow cytometry histograms need to be more self-explanatory. What are the Y axis "counts" of? Also, please either place the label for both rows or for each, but don't make it redundant. The axis fonts need to be made a bit larger too. If possible, explain briefly in the figure legend (and not only in the text) what each peak means.
- Fig 5. Horizontal axis labels are illegible. Please make these larger (maybe make the plot wider by moving the plot legend to the top/bottom of the figure? - just a suggestion).
Small text editing suggestions:
- Methods, "Ploidy level estimation and chromosome counts" section. It would be easier for the reader if this paragraph was either divided into two methods sections, or into two paragraphs at least, since these are two very different approaches and provide slightly different data or information.
- Methods, "Genome size estimation by k-mer analysis" section. Please specify whether the coverage simulations (of 5x to 40x) were based on the 1C or the 2C genome size. I assumed the haploid genome size, but it is best to clarify.
- Results, "Genome size estimation by k-mer analysis and ploidy estimation" section. In the first two paragraphs, the results presented appear to conform to anticipated patterns based on known properties of these types of datasets. Although this information confirms expected patterns, it does not provide new or biologically significant insights into the genomes analysed. It may be beneficial to further summarize these paragraphs so that the focus of this section can shift toward the comparison of methods and the biological interpretation of the genome profiles of Epidendrum.
- Discussion, "Genome size estimation using flow cytometry" section. In the second paragraph, it is discussed how potential endoreduplication events can "trick" the flow cytometry measurements. This has probably been discussed previously in other C-value estimation studies and would benefit from context from the literature. How does endoreduplication really affect C-value measurements across plant taxa? I understand it is a well-known issue, so maybe add some references?
- Discussion, "Repetitive DNA composition in Epidendrum anisatum and E. marmoratum" section. In the second paragraph, when mentioning the relative abundance of Ty3-gypsy and Ty1-copia elements, it is also worth mentioning their differences in genomic distribution and the potential structural role of Ty3-gypsy elements.
- Discussion, "Repetitive DNA composition in Epidendrum anisatum and E. marmoratum" section. In the third paragraph, it is mentioned that both species have 2n=40. I believe these are results from this work since there is a methods section for chromosome counting. This data should therefore go into results.
- Discussion, "Repetitive DNA composition in Epidendrum anisatum and E. marmoratum" section. I'd recommend expanding a bit more on repetitive DNA differences based on the RepeatExplorer results. Providing references on whether this has been found in other taxa would be helpful too. For example, Ogre bursts have been previously described in other species (e.g. legumes, Wang et al., 2025). Moreover, I consider it worth highlighting and discussing other interesting differences found, such as the differences in unknown repeats (which could be due to one species having "older" elements, too degraded to give any database hits, compared to the other), or Class II TE differences between species (and how these account less for the genome size difference because of their size), etc.
Significance
Overall, this study provides a valuable contribution to our understanding of genome size diversity and repetitive DNA dynamics within Epidendrum, particularly through its combined use of low-coverage sequencing, flow cytometry, and comparative genome profiling. Its strongest aspects lie in the clear methodological framework and the integration of multiple complementary approaches, which together highlight substantial genome size divergence driven by repeat proliferation, an insight of clear relevance for orchid genomics and plant genome evolution more broadly.
While the work would benefit from improved data availability, additional contextualization of the problem of endoreduplication in flow cytometry, and clarification of some figure elements and methodological details, the study nonetheless advances the field by presenting new comparative genomic information for two understudied species and by evaluating different strategies for genome profiling in non-model taxa.
The primary audience will include researchers in non-model plant genomics, cytogenetics, and evolutionary biology, although the methodological comparisons may also be useful to a broader community working on genome characterization in diverse lineages. My expertise is in plant genomics, genome size evolution, and repetitive DNA biology; I am not a specialist in flow cytometry instrumentation or cytological methods, so my evaluation of those aspects is based on general familiarity rather than technical depth.
-
Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.
Learn more at Review Commons
Referee #1
Evidence, reproducibility and clarity
Summary:
The study provides a comprehensive overview of genome size variation in two related species of the genus Epidendrum, which appear to be homoploid, although their DNA content more closely corresponds to that of heteroploid species. While I have a few serious concerns regarding the data analysis, the study itself demonstrates a well-designed approach and offers a valuable comparison of different methods for genome size estimation. In particular, I would highlight the analysis of repetitive elements, which effectively explains the observed differences between the species. However, I encourage the authors to adopt a more critical perspective on the k-mer analysis and the potential pitfalls in data interpretation.
Major comments:
p. 9: Genome size estimation via flow cytometry is an incorrect approach. The deviation is approximately 19% for E. anisatum and about 25% for E. marmoratum across three repeated measurements of the same tissue over three days? These values are far beyond the accepted standards of best practice for flow cytometry, which recommend a maximum deviation of 2-5% between repeated measurements of the same individual. Such variability indicates a systemic methodological issue or improper instrument calibration. Results with this level of inconsistency cannot be considered reliable estimates of genome size obtained by flow cytometry. If you provide the raw data, I can help identify the likely source of error, but as it stands, these results are not acceptable.
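For reference, one way to express the spread between repeated measurements is the difference between the highest and the lowest value, relative to the lowest. The sketch below uses a hypothetical helper; the reviewer does not state which formula was used, so this is an assumption rather than a reconstruction of the reviewer's calculation. The example inputs are the minimum and maximum genome size values quoted in the abstract, and treating them as the extremes of the repeated flow cytometry measurements is likewise an assumption.

```python
def max_relative_deviation(values):
    """Percent difference between the largest and smallest measurement,
    relative to the smallest one."""
    if len(values) < 2:
        raise ValueError("need at least two measurements")
    lowest, highest = min(values), max(values)
    return (highest - lowest) / lowest * 100


if __name__ == "__main__":
    # Range limits quoted in the abstract (Gb); assumed here to be the
    # extremes of the repeated measurements.
    examples = {
        "E. anisatum": [2.35, 2.80],
        "E. marmoratum": [0.99, 1.24],
    }
    for species, measurements in examples.items():
        dev = max_relative_deviation(measurements)
        print(f"{species}: {dev:.1f}% between extremes")
```

Under that assumption the helper gives roughly 19% and 25%, both well above the 2-5% threshold cited in the comment.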
p. 14 and some parts of Introduction: It may seem unusual, to say the least, to question genome size estimation in orchids using flow cytometry, given that this group is well known for extensive endoreplication. However, what effect does this phenomenon have on genome size analyses based on k-mers, or on the correct interpretation of peaks in k-mer histograms? How can such analyses be reliably interpreted when most nuclei used for DNA extraction and sequencing likely originate from endoreplicated cells? I would have expected a more detailed discussion of this issue in light of your results, particularly regarding the substantial variation in genome size estimates across different k-mer analysis settings. Could endoreplication be a contributing factor?
You repeatedly refer to the experiment on genome size estimation using analyses with maximum k-mer coverage of 10,000 and 2 million, under different k values. However, I would like to see a comparison - such as a correlation analysis - that supports this experiment. The results and discussion sections refer to it extensively, yet no corresponding figure or analysis is presented.
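A comparison of the kind requested could be as simple as a Pearson correlation between the genome size estimates obtained under the two maximum k-mer coverage settings, paired by k value. The following is a minimal sketch with a hypothetical function name; the input numbers are synthetic placeholders chosen only to show the call, not values from the manuscript.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two paired sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and have at least two values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Synthetic placeholders: one estimate per k value for each setting.
    estimates_low_max_cov = [1.0, 2.0, 3.0, 4.0]
    estimates_high_max_cov = [1.5, 2.4, 3.6, 4.4]
    r = pearson_r(estimates_low_max_cov, estimates_high_max_cov)
    print(f"Pearson r = {r:.3f}")
```

A scatter plot of the paired estimates would complement the coefficient, since a high correlation alone does not show whether one setting systematically yields larger estimates.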
Minor comments:
p. 3: You stated: "Flow cytometry is the gold standard for genome size estimation, but whole-genome endoreplication (also known as conventional endoreplication; CE) and strict partial endoreplication (SPE) can confound this method." What do you mean by this? Endopolyploidy is quite common in plants, and flow cytometry is an excellent tool for detecting it and for selecting the proper nuclei fraction for genome size estimation (provided you are aware of the possible misinterpretation caused by using inappropriate tissue for analysis). The same applies to partial endoreplication in orchids (see e.g. Travnicek et al. 2015). Moreover, the term "strict partial endoreplication" is outdated and is only used by Brown et al. In more recent studies, the term partial endoreplication is used (e.g. Chumova et al. 2021, 10.1111/tpj.15306, or Piet et al. 2022, 10.1016/j.xplc.2022.100330).
p. 5: "...both because of its outstanding taxic diversity..." There is no such thing as "taxic" diversity - perhaps you mean taxonomic diversity or species richness.
p. 6: In description of flow cytometry you stated: "Young leaves of Pisum sativum (4.45 pg/1C; Doležel et al. 1998) and peripheral blood mononuclear cells (PBMCs) from healthy individuals...". What does that mean? Did you really use blood cells? For what purpose?
p. 7: What do you mean by this statement "...reference of low-copy nuclear genes for each species..."? As far as I know, the Granados-Mendoza study used the Angiosperm v.1 probe set, so did you use that set of probes as reference?
p. 7: Chromosome counts - there is a paragraph of methodology used for chromosome counting, but no results of this important part of the study.
p. 12: Depth of coverage used in repeatome analysis - why did you use different coverage for the two species? An explanation is needed.
p. 16: The variation in genome size of orchids is even higher, as the highest known DNA amount has been estimated in Liparis purpureoviridis - 56.11 pg (Travnicek et al. 2019 - doi: 10.1111/nph.15996).
Fig. 1 - Where is the standard peak on Fig. 1? You mention it explicitly on page 9 where you are talking about FCM histograms.
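Because the comments above quote DNA amounts in picograms while the manuscript reports genome sizes in Gb, a unit conversion may help when comparing the two. The sketch below assumes the widely used factor of 1 pg = 978 Mbp; the function names are hypothetical.

```python
GBP_PER_PG = 0.978  # assumed conversion factor: 1 pg of DNA = 978 Mbp


def pg_to_gbp(picograms):
    """Convert a DNA amount in picograms to gigabase pairs."""
    return picograms * GBP_PER_PG


def gbp_to_pg(gigabase_pairs):
    """Convert a genome size in gigabase pairs to picograms."""
    return gigabase_pairs / GBP_PER_PG


if __name__ == "__main__":
    # Values quoted in the review: the Pisum sativum standard (4.45 pg/1C)
    # and the Liparis purpureoviridis estimate (56.11 pg).
    for label, pg in [("Pisum sativum", 4.45), ("Liparis purpureoviridis", 56.11)]:
        print(f"{label}: {pg} pg = {pg_to_gbp(pg):.2f} Gbp")
```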
Significance
This study provides a valuable contribution to understanding genome size variation in two Epidendrum species by combining flow cytometry, k-mer analysis, and repetitive element characterization. Its strength lies in the integrative approach and in demonstrating how repetitive elements can explain interspecific differences in DNA content. The work is among the first to directly compare flow cytometric and k-mer-based genome size estimates in orchids, extending current knowledge of genome evolution in this complex plant group. However, the study would benefit from a more critical discussion of the limitations and interpretative pitfalls of k-mer analysis and from addressing methodological inconsistencies in the cytometric data. The research will interest a specialized audience in plant genomics, cytogenetics, and genome evolution, particularly those studying non-model or highly endoreplicated species.
Field of expertise: plant cytogenetics, genome size evolution, orchid genomics.
-
