coelsch: Platform-agnostic single-cell analysis of meiotic recombination events
This article has been Reviewed by the following groups
- Evaluated articles (Review Commons)
Abstract
Background: Meiotic recombination creates genetic diversity through reciprocal exchange of haplotypes between homologous chromosomes. Scalable and robust methods for mapping recombination breakpoints are essential for understanding meiosis and for genetic mapping. Single-cell sequencing of gametes offers a direct approach to recombination mapping, yet the effect of technical differences between single-cell sequencing methods on crossover detection remains unclear. Results: We benchmark single-cell methods for droplet-based chromatin accessibility and RNA sequencing and plate-based whole-genome amplification for mapping meiotic recombination in Arabidopsis thaliana. For this purpose we introduce two novel open-source tools, coelsch_mapping_pipeline and coelsch, for haplotype-aware alignment and per-cell crossover detection, using them to recover known recombination frequencies and quantify the effects of coverage sparsity. We subsequently apply our approach to a panel of 40 recombinant F₁ hybrids derived from crosses of 22 diverse natural accessions, successfully recovering genetic maps for 34 F₁s in a single dataset. This analysis reveals substantial variation in recombination rate and identifies a ~10 Mb pericentric inversion in the accession Zin-9, the largest natural inversion reported in A. thaliana to date. Conclusions: These results demonstrate the applicability and scalability of single-cell gamete sequencing for high-throughput mapping of meiotic recombination, and highlight the strengths and limitations of different single-cell modalities. The accompanying open-source tools provide a framework for haplotyping and crossover detection analysis using sparse single-cell sequencing data. Our methodology enables parallel analysis of large numbers of hybrids in a single dataset, removing a major technical barrier to large-scale studies of natural variation in recombination rate.
Article activity feed
-
Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.
Learn more at Review Commons
Reply to the reviewers
Revision Plan
1. General Statements
We thank the reviewers for their positive and constructive assessment of the manuscript. We are encouraged that all three reviewers recognise the value of coelsch as an open-source framework for haplotyping and crossover detection from single-cell gamete sequencing data, and that they view the study as a useful contribution to the fields of recombination and genetic research. We are particularly grateful that Reviewer 1 described the manuscript as an "interesting and important study" and a "genuinely useful methodological framework that fills a real gap in the recombination biology toolkit", while Reviewer 2 highlighted its "strong innovation, complete technical pipeline, and significant biological implications" and considered it an "important technical breakthrough". We also appreciate Reviewer 3's assessment that the study provides "timely guidance for experimental design", that the results are "important for guiding plant single-cell research" in general, and that the work "has the potential to attract a broad readership".
In our view, the main contribution of the manuscript is the development of a platform-agnostic method for recovering haplotypes and crossover events from single-cell sequencing data. This addresses an important practical gap: single-cell gamete sequencing has strong potential for high-throughput haplotyping and recombination mapping, but its broader use requires tools that can accommodate the very different coverage structures produced by different sequencing modalities and platforms. coelsch was designed to meet this need.
The experimental datasets in the manuscript serve two purposes. First, they demonstrate that coelsch can be applied across multiple single-cell modalities and platforms, including scRNA, scATAC and scWGA sequencing from 10x Genomics, BD, and Takara platforms. Second, they illustrate the kinds of biological and practical questions that can be addressed with single-cell gamete sequencing, including crossover detection in meiotic mutants and large-scale analysis of natural variation in recombination.
While all reviewers strongly supported the publication of the work, they also raised important points about specific aspects, including technical variation and reproducibility, the rationale for using 10x scRNA to generate the diversity panel dataset, and the effects of coverage on crossover localisation, amongst others. We agree that addressing these points will make the manuscript clearer and more useful to readers. Our planned revisions therefore aim to strengthen the experimental and computational support for the framework, clarify the interpretation of the modality comparisons, and provide additional guidance for researchers who may wish to apply coelsch or related single-cell sequencing approaches in future studies.
2. Description of the planned revisions
2.1. Additional technical replicates and clearer treatment of batch/sample-handling effects
Reviewers 1, 2 and 3 all noted that the comparison of different platforms and modalities is based on limited replication, with different nuclei isolation and processing strategies used for different technologies. Reviewer 3 requested a fully controlled benchmark in which the same nuclei preparation is split across all tested platforms. We agree that this would be the ideal design for a dedicated head-to-head benchmarking study. However, the primary aim of the manuscript is to demonstrate the applicability of coelsch across different single-cell sequencing data types, rather than to provide a definitive benchmark of the intrinsic performance of each modality and platform.
In addition, a fully matched and replicated cross-platform experiment for all technologies is not feasible. Isolated nuclei deteriorate rapidly after preparation and must be processed promptly for single-cell library construction; this makes it impractical to distribute the same preparation across multiple time- and labour-intensive workflows. However, this design is feasible for 10x scRNA-seq and 10x scATAC-seq. To address this point directly, we will therefore generate two matched technical replicates each of 10x scRNA-seq and 10x scATAC-seq from nuclei isolated in the same sorting run.
We will also improve our library-level QC summary tables. We will report, where available, the number of nuclei used for loading, recovered barcodes, barcodes retained after QC, inferred high-quality nuclei and artefacts, informative fragments per nucleus, genomic bin coverage, and final nuclei used for crossover calling. This will make the effects of loading, capture efficiency, QC filtering, and modality-specific data loss more transparent.
In the revised text, we will distinguish more clearly between modality-specific effects and possible batch/sample-preparation effects. Where the current manuscript implies that differences are intrinsic properties of sequencing platforms, we will soften the interpretation unless it is supported by the new replicate data, reproducibility analyses, or well-supported properties that have been reported previously in the literature.
2.2. Rationale for using 10x scRNA-seq in the natural variation panel
Reviewers 1 and 3 asked why the natural variation panel was analysed using 10x scRNA-seq, given that Takara scWGA produced higher per-cell crossover localisation accuracy in the modality comparison. We will revise the manuscript to explain this experimental decision more clearly.
The natural variation panel was designed as a high-throughput experiment requiring sufficient numbers of usable nuclei from many pooled F₁ hybrids. In our hands, 10x scRNA-seq has generally produced the largest number of usable nuclei barcodes and the lowest proportion of artefacts. This makes 10x scRNA-seq well suited to experiments where many nuclei are required per genotype. By contrast, applying Takara scWGA to a pooled panel of this scale would be expected to recover only tens of usable nuclei per F₁ hybrid, which would be insufficient for robust recombination-rate or landscape estimation.
We will add this explanation to the relevant Results section and clarify that the choice of 10x scRNA-seq reflects a trade-off between per-cell crossover resolution and the number of informative nuclei recovered per genotype. We will also add genotype-level summaries for the pooled natural variation experiment, including assigned nuclei per genotype and genotype-specific genomic coverage of informative fragments.
2.3. Reproducibility of recombination landscapes across replicates and modalities
Reviewer 1 requested recombination landscape plots for all tested modalities, and several comments raised the need to show within-modality reproducibility. We will add recombination landscape plots for wild-type Col-0 × Ler libraries across the tested modalities, including the newly generated replicate 10x scATAC and scRNA libraries.
We will assess reproducibility by comparing unsmoothed, non-overlapping windowed recombination-rate estimates, both within and between modalities. These comparisons will be quantified using bootstrapped estimates of Spearman's rank correlation coefficient, and visualised using scatterplots and/or recombination landscapes.
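The reproducibility comparison described above could be sketched roughly as follows. This is a minimal illustration and not part of coelsch: the function names (`average_ranks`, `spearman`, `bootstrap_spearman`), the choice to resample genomic windows rather than nuclei, the percentile interval, and the example rates are all assumptions made for illustration.

```python
import random
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when one variable is constant
    return cov / (var_x * var_y) ** 0.5


def bootstrap_spearman(x, y, n_boot=1000, seed=0):
    """Resample windows with replacement; return (rho, 2.5%, 97.5%)."""
    rng = random.Random(seed)
    n = len(x)
    boots = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        rho = spearman([x[i] for i in idx], [y[i] for i in idx])
        if rho == rho:  # drop undefined (NaN) resamples
            boots.append(rho)
    boots.sort()
    lower = boots[int(0.025 * (len(boots) - 1))]
    upper = boots[int(0.975 * (len(boots) - 1))]
    return spearman(x, y), lower, upper


# Hypothetical windowed recombination rates (cM/Mb) from two replicate libraries.
rep1 = [1.2, 3.4, 2.8, 0.4, 0.1, 0.3, 2.9, 4.1, 3.7, 2.2]
rep2 = [1.0, 3.9, 2.5, 0.6, 0.2, 0.2, 3.1, 3.8, 4.0, 1.9]
rho, lower, upper = bootstrap_spearman(rep1, rep2)
```

Resampling windows treats them as independent, which neighbouring genomic windows are not; resampling nuclei before computing the windowed rates would be an alternative design.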
2.4. Sequencing depth, coverage, and crossover localisation resolution
Reviewers 1 and 3 requested clearer quantitative reporting of crossover resolution and a stronger analysis of depth effects. We will revise the manuscript to report practical crossover localisation resolution for each modality, including median and interquartile localisation error or interval size in genomic units.
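As an illustration of the summary statistics mentioned above, the following sketch computes the median and interquartile range of crossover interval sizes. It assumes, hypothetically, that each called crossover is represented as a `(left_bp, right_bp)` pair of flanking positions; the function name and the example intervals are not taken from coelsch.

```python
from statistics import median, quantiles


def summarise_intervals(intervals):
    """Summarise crossover localisation interval sizes in base pairs.

    `intervals` is a list of (left_bp, right_bp) tuples (assumed format).
    """
    sizes = [right - left for left, right in intervals]
    # Quartiles with the "inclusive" method, which treats the data as the
    # full population rather than a sample from a larger one.
    q1, _, q3 = quantiles(sizes, n=4, method="inclusive")
    return {
        "median_bp": median(sizes),
        "q1_bp": q1,
        "q3_bp": q3,
        "iqr_bp": q3 - q1,
    }


# Hypothetical crossover intervals for one library.
example = [(1000, 1100), (5000, 5200), (9000, 9300), (2000, 2400), (7000, 7500)]
summary = summarise_intervals(example)
```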
We will expand the simulation analyses to compare false-positive and false-negative rates and localisation accuracy across modalities, including telomere-proximal error profiles for scWGA and scATAC as well as 10x RNA data. We will perform downsampling analyses to assess how crossover detection accuracy changes as a function of informative-fragment depth. Where feasible, we will compare depth-matched subsets across modalities to distinguish effects of sequencing depth from modality-specific coverage structure.
These analyses will be used to clarify the extent to which each modality is suitable for different applications, such as broad landscape estimation, crossover counting, or fine localisation.
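A downsampling step of the kind described in this section might look like the sketch below. It is an assumption-laden illustration rather than the planned implementation: fragments are represented as hypothetical `(chromosome, position, haplotype)` tuples keyed by barcode, and barcodes with fewer fragments than the target are dropped so that all retained barcodes have matched depth.

```python
import random


def downsample_fragments(fragments_by_barcode, target, seed=0):
    """Subsample each barcode to exactly `target` informative fragments.

    Sampling is without replacement. Barcodes below the target depth are
    excluded so that every retained barcode has the same depth.
    """
    rng = random.Random(seed)
    downsampled = {}
    for barcode, fragments in sorted(fragments_by_barcode.items()):
        if len(fragments) < target:
            continue
        downsampled[barcode] = sorted(rng.sample(fragments, target))
    return downsampled


# Hypothetical input: (chromosome, position, haplotype) per informative fragment.
cells = {
    "AAAC": [("Chr1", p, "Col-0") for p in range(0, 6000, 10)],
    "AAAG": [("Chr1", p, "Ler") for p in range(0, 3000, 10)],
    "AAAT": [("Chr1", p, "Col-0") for p in range(0, 1000, 10)],
}
matched = {depth: downsample_fragments(cells, depth) for depth in (250, 500)}
```

Running the crossover caller on each depth-matched subset would then show how accuracy changes with informative-fragment depth within and between modalities.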
2.5. Artefact detection, high doublet rates, and representativeness after filtering
All three reviewers raised concerns about the high proportion of barcodes excluded by the filtering procedure, particularly in the Takara scWGA dataset. In hindsight, we believe part of this concern stems from the poor choice of terminology ("doublets") we used to describe these excluded barcodes.
While true doublets (i.e. two nuclei entering a single droplet or nanowell) are one likely source of such signals, the filtering procedure more broadly identifies artefactual barcodes that do not exhibit a clear single-gamete haplotype structure. These barcodes may arise from a variety of sources, including doublets, multiplets, high levels of ambient DNA or RNA, or empty droplets containing only ambient material. Although visual examination can be used to make predictions about the source of these artefacts, our detection method does not attempt to distinguish between them, and artefacts in different modalities may stem from different sources in varying proportions. We will therefore revise the terminology throughout the manuscript to clarify that these represent a broader class of low-confidence or noise barcodes, rather than confirmed doublets.
For the Takara scWGA data, we will revise the manuscript to discuss the discrepancy between the CellSelect well classifications (which use proprietary software to label doublets) and the final artefact predictions from coelsch. We can only speculate as to why CellSelect failed to detect many apparent doublet and multiplet artefacts in this experiment, but we agree with the reviewer that the most likely explanation is the small size of Arabidopsis pollen nuclei relative to the expectations of the imaging and classification procedure. To support this interpretation, we will add a supplementary analysis comparing the CellSelect images from individual nanowells with the final artefact predictions inferred from scWGA data. This will allow readers to see examples of wells classified as acceptable by CellSelect but subsequently inferred to contain artefacts based on their haplotype structure.
We will also add sensitivity analyses showing how key results change under different artefact-filtering thresholds. These analyses will include crossover count distributions, recombination landscape estimates, and modality-level comparisons. We will examine the extreme upper tail of crossover counts observed in 10x scATAC-seq and assess whether these barcodes are artefacts that have escaped detection.
Finally, we will assess whether retained singlets are representative of the input data with respect to informative-fragment counts, coverage, and inferred crossover patterns. This will address the concern that filtering could preferentially remove nuclei with particular recombination profiles.
2.6. Biases arising from pollen nuclear biology
Reviewer 2 raised an issue concerning the biases arising from the two different nuclei types present in mature trinuclear Arabidopsis pollen, and Reviewer 3 endorsed this point. While we do not agree with the reviewer that scRNA and scATAC cannot capture sperm nuclei due to their condensed nature (see Parker et al. 2025 PLoS Biology for evidence against this claim), it is true that technical variation in nuclei isolation and sorting may affect the relative representation of nuclei types, although this usually results in the underrepresentation of vegetative nuclei (Parker et al. 2025). We will add text addressing this point to the manuscript.
It is also true that differences in expressed genes between vegetative and sperm nuclei, which have very different transcriptomic profiles, will affect the distribution of informative reads for crossover analysis in scRNA data, and may therefore also affect the recovered recombination landscapes (even though the underlying landscapes are biologically identical). We will address this in the manuscript by adding recombination landscape plots and reproducibility scatterplots (as described in point 2.3) comparing sperm and vegetative nuclei from scRNA-seq.
2.7. Robustness of the pipeline and parameter choices
Reviewer 3 raised the concern that quantitative conclusions depend on a single pipeline with fixed parameter choices. We will address this by adding a parameter-sensitivity analysis for the main computational steps. Specifically, we will test the robustness of crossover calling on simulated data to changes in bin size and rHMM parameters, showing how these affect sensitivity to noise and agreement of predictions with ground truth data.
2.8. Natural variation analysis: genotype-specific coverage and terminal crossover enrichment
Reviewers 1, 2 and 3 raised concerns about whether natural variation in crossover rate and terminality could be influenced by genotype-specific coverage, marker density, pooling imbalance, or dropout. We will add a more detailed description of how pollen from different F₁ hybrids was pooled and how genotype assignment was performed. We will report genotype-level recovery statistics, including the six hybrids excluded from downstream analysis, and discuss how imbalances may arise, e.g. through biological variation in pollen count and fertility, biases in nuclei isolation or sequencing, and biases in genotyping and informative fragments.
Reviewer 1 specifically asked whether the lower terminal crossover index observed in Cvi-0 crosses compared with Col-0 crosses could reflect systematic differences in informative-fragment distributions rather than true biological differences in crossover localisation. We will address this by using the genotype-specific informative-fragment distributions observed in the diversity-panel scRNA-seq dataset to simulate crossover datasets with known ground truth. This will allow us to test whether differences in marker-variant or expressed-gene distributions, which cause variation in informative-fragment distribution, could systematically bias terminal crossover detection in Cvi-0 crosses relative to Col-0 crosses.
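The logic of such a simulation can be illustrated with a deliberately simplified sketch. It is not the coelsch simulation procedure: here a crossover counts as "detectable" only if at least one informative fragment lies on each side of it, the informative-fragment distribution is reduced to hypothetical per-bin weights, and all names and numbers are assumptions chosen for illustration.

```python
import random
from bisect import bisect_left


def flanking_markers(markers, crossover_pos):
    """Return the (left, right) markers around a crossover, or None if
    there is no informative marker on one side of it."""
    i = bisect_left(markers, crossover_pos)
    if i == 0 or i == len(markers):
        return None
    return markers[i - 1], markers[i]


def simulate_detection_rate(bin_weights, bin_size, n_fragments, region,
                            n_cells=500, seed=0):
    """Fraction of simulated cells in which a crossover placed uniformly
    within `region` (start_bp, end_bp) is flanked by informative markers."""
    rng = random.Random(seed)
    bins = list(range(len(bin_weights)))
    detected = 0
    for _ in range(n_cells):
        # Draw fragment positions according to the per-bin weights.
        chosen = rng.choices(bins, weights=bin_weights, k=n_fragments)
        markers = sorted(b * bin_size + rng.random() * bin_size for b in chosen)
        crossover = rng.uniform(*region)  # known ground-truth position
        if flanking_markers(markers, crossover) is not None:
            detected += 1
    return detected / n_cells


# Hypothetical 10 Mb chromosome in 1 Mb bins; crossovers placed in the last bin.
terminal = (9_000_000, 10_000_000)
even = simulate_detection_rate([1] * 10, 1_000_000, 50, terminal)
depleted = simulate_detection_rate([1] * 9 + [0], 1_000_000, 50, terminal)
```

In this toy setting, a genotype with no informative fragments in the terminal bin cannot recover any terminal crossovers, whereas the evenly covered genotype recovers most of them, which is the kind of bias the planned simulations are intended to quantify.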
If feasible within the revision timeframe, we will also perform an orthogonal validation experiment for a selected comparison showing a clear difference in crossover terminality, such as Col-0 × Sah-0 and Cvi-0 × Sah-0. This would use progeny sequencing of backcross populations to estimate recombination landscapes independently of single-cell scRNA-seq, providing a direct test of whether the inferred terminality difference is supported by conventional recombination mapping. If this experiment cannot be completed within the revision timeframe, we will clearly state this limitation and base the revised interpretation on the simulation analyses described above.
2.9. Broader applicability and practical guidance for users
Reviewer 1 requested more discussion of applicability beyond Arabidopsis and to outcrossing or polyploid species. We will expand the Discussion to address the requirements and limitations of applying coelsch in other systems.
2.10. Minor figure, reference, and presentation revisions
We will address the remaining minor comments, including adding missing axis labels and checking duplicated references.
3. Description of the revisions that have already been incorporated in the transferred manuscript
No revisions have yet been incorporated in the transferred manuscript.
4. Description of analyses that authors prefer not to carry out
4.1. Full new benchmark across all modalities from the same nuclei preparation.
As acknowledged in section 2.1, we agree with Reviewer 3 that a fully controlled benchmark in which the same isolated nuclei preparation is split across all tested platforms would be the ideal experimental design for separating intrinsic modality- or platform-specific effects from sample-handling and batch effects. However, this is not feasible for all technologies within the scope of this revision, because isolated nuclei degrade quickly, the single-cell sequencing methods are time- and labour-intensive, and the relevant platforms are not all available to us in the same location.
We will therefore not perform a complete new cross-platform benchmark across all modalities. Instead, we will address this issue in the parts of the experiment where a matched design is feasible: we will generate two additional matched technical replicates each for 10x scRNA-seq and 10x scATAC-seq from nuclei isolated in the same sorting run. We will also revise the manuscript to more clearly acknowledge the limitations imposed by the lack of a fully matched cross-platform design and to ensure that our conclusions are interpreted in that context.
4.2. Profiling the natural variation panel with a second modality
Reviewer 1 suggested profiling at least a subset of the diversity panel with an additional single-cell modality. We agree that this would be useful, but we do not currently plan to generate a second-modality dataset for the natural variation panel. We would like to point out that this dataset introduces 34 genetic maps in a single sequencing experiment, which is not easily repeated.
The natural variation experiment was designed as a high-throughput survey across many F₁ hybrids, and repeating even a subset with scWGA or scATAC would require substantial additional sample preparation and sequencing. Instead, we will strengthen the justification for the use of 10x scRNA-seq by adding genotype-level coverage summaries and simulations to show which conclusions are well supported at the observed data density.
4.3. Orthogonal progeny sequencing from the exact same F₁ plants
Reviewer 3 suggested that progeny sequencing from the same F₁ plants used for single-cell assays would provide a direct ground truth. This experiment would require additional crosses, progeny generation, and matched single-cell and progeny sequencing, an effort that would not be justified by the insights it would deliver. While progeny sequencing can provide an independent validation dataset, we do not agree that it would constitute a substantially better ground truth than the simulations used here. Simulations provide a known ground truth for every individual barcode, whereas progeny sequencing cannot, because pollen grains are destroyed during single-cell sequencing and therefore cannot be used to generate offspring. In addition, progeny-derived recombination landscapes are not a perfect ground truth at the population level, since segregation distortion and post-meiotic selection can alter the observed distribution of recombination events relative to those present in the original pollen population.
4.4. Formal benchmarking of coelsch as a structural-variant detection method
Reviewer 2 asked whether large structural variants were identified in other accessions besides Zin-9, and what sensitivity and specificity can be expected from recombination coldspot-based structural-variant detection. We agree that this is an interesting question, given that the Zin-9 inversion was identified through its strong effect on recombination. However, we do not plan to develop or benchmark coelsch as a comprehensive structural-variant detection method as part of this revision.
The Zin-9 event was identified by visual inspection of the recombination maps, where it appeared as an unusually large and conspicuous recombination coldspot. We did not develop a systematic structural-variant calling procedure, as we do not view recombination suppression alone as a sufficiently specific signal for structural-variant detection. Coldspots can arise for many reasons, including centromere proximity or local recombination modifiers. Therefore, although large rearrangements such as inversions or translocations may sometimes be detectable through their effects on recombination, coelsch should not be considered as a general-purpose structural-variant caller.
In the revised manuscript, we will clarify this limitation and avoid implying that recombination coldspot analysis provides comprehensive structural-variant discovery. We will report that we did not observe other genotype-specific coldspots of comparable scale to the Zin-9 event among the other analysed accessions, although smaller coldspots, such as one corresponding to the previously reported 2.2 Mb inversion on Chromosome 1 of N13, were identifiable. We will not provide formal estimates of sensitivity and specificity for structural-variant detection, as this would require independent benchmark datasets or dedicated simulations that are beyond the scope of the present study.
-
Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.
Learn more at Review Commons
Referee #3
Evidence, reproducibility and clarity
In this study, Parker et al. benchmark three single-cell sequencing modalities (scRNA-seq, scATAC-seq, and scWGA) in Arabidopsis gametes and deliver an open-source, end-to-end framework for data processing that enables high-throughput crossover mapping across hybrids. By systematically comparing these modalities, the work quantifies trade-offs in throughput, genomic coverage, and crossover detection sensitivity, offering timely guidance for experimental design in plant systems where single-cell genomics is still emerging and platform benchmarks are very limited. The pipelines are further supported by the discovery of a previously unrecognized ~10 Mb pericentric inversion in the Zin-9 accession. The experimental design is technically interesting, and the results are important for guiding plant single-cell research. The work has the potential to attract a broad readership. However, several aspects of the experimental design, validation strategy, and parameter robustness require further clarification and, where possible, additional analyses.
Major comments
- The modality comparison is based on one scRNA-seq library and two libraries each for scATAC-seq and scWGA. While the limited replication is acknowledged in the Discussion, the authors also report unexpected and run-specific observations (e.g. unusually high doublet rates in the 10x scRNA-seq library; "unexpected" doublet behavior in scWGA), making it difficult to separate platform-intrinsic properties from sample preparation and run-to-run variation. Differences in nuclei isolation buffers, purification strategies (e.g. density gradients, FACS, centrifugation), and potentially loaded nuclei numbers between platforms (which have not been specified in detail) further confound modality-level conclusions. For example, total usable barcodes vary drastically between the samples (e.g. 15k/20k/33k for 10x scRNA-seq, only 3.8k for BD even though it has the same capture capacity as 10X). Do these differences reflect different capture efficiencies between the platforms, or variation in nuclei quality/quantity, or modality-specific limitations in QC thresholds? It would strengthen the study to provide, for each library, the number of nuclei prior to loading and before/after QC, and to add independent biological replicates under modality-appropriate, optimized handling, ideally including a design where the same nuclei pool is split across all three modalities.
- All quantitative inferences rely on one custom analysis pipeline with multiple interdependent steps and fixed parameter choices (e.g. bin size, HMM transition structure, smoothing settings, background subtraction, doublet filters). The lack of benchmarking against independent crossover callers, or of systematic parameter sweeps, leaves it unclear how robust key patterns are to alternative analytical choices. It would substantially increase confidence to assess sensitivity of the main conclusions to key parameters (for example varying bin size, rigid chain length/transition penalties, enabling/disabling background subtraction and doublet filtering), and/or compare coelsch to other HMM-based crossover callers such as sgcocaller/comapr on at least a subset of the data.
- Accuracy is evaluated by comparisons to prior backcross/progeny datasets generated in different conditions, and by simulations calibrated to those references. While this is informative, systematic biases shared between the new pipeline and the reference datasets could remain undetected. Internal, orthogonal validation (e.g. progeny sequencing performed on the same F₁ plants used for single-cell assays) would provide a more direct ground truth and avoid potential circularity in bias assessment.
- The benchmark does not evaluate the impact of sequencing depth across modalities, which could influence the variation in per-barcode fragment counts and genomic bin coverage between scRNA-seq, scATAC-seq, and scWGA. Down-sampling aligned reads or informative fragments to fixed per-barcode targets (e.g. 250, 500, 1000 informative fragments) within each modality would clarify how much of the observed performance gap is attributable to depth rather than modality-specific biology or library structure. Constructing depth-matched subsets between scWGA and scATAC/scRNA datasets would help to test whether the breadth vs. depth trade-offs persist when sequencing resources are equalized.
- In the pooled 34-hybrid single-nucleus RNA-seq dataset, it would be very informative to present detection sensitivity and resolution across genotypes (e.g. captured nuclei, distributions of informative fragments, covered bins, and expected localization error by genotype). Genotypes will differ in expression patterns, which will alter the number and distribution of informative fragments per nucleus, and thus ultimately influence inferred recombination rates and crossover terminality. Furthermore, the background subtraction filter relies on genotype-level background models. Given that all genotypes were pooled prior to nuclei isolation, can the authors show that estimated ambient/background profiles are comparable across genotypes?
Minor comments
- The manuscript currently attributes more uneven coverage in scRNA-seq primarily to expression-biased sampling of heterozygous sites. Would the choice of using nuclei, rather than whole cells which would also allow the capture of cytosolic RNA, for the scRNA-seq be an additional reason for lower total number and genomic dispersion of informative fragments?
- The sentence "This allows informed experimental and analytical choices ..." could be accompanied with a compact infographic or table (for example as an extension of Fig. 1B) summarizing key trade-offs and recommended use-cases for each modality (throughput, per-cell resolution, coverage breadth, susceptibility to doublets/ambient RNA, recommended applications).
- Related to the point above, the choice to profile the F₁ hybrids using the 10x scRNA-seq modality is understandable from a throughput perspective, but the results presented in Fig. 1 and Table 1 suggest scWGA offers higher crossover accuracy, scATAC superior genomic breadth, compared to 10x scRNA-seq which in addition also showed a high doublet rate. Expanding the rationale for prioritizing scRNA-seq here (e.g. cost, compatibility with downstream expression analyses, or technical constraints for scWGA/scATAC at this scale) would clarify the experimental logic for the reader.
Referee cross-commenting
I strongly agree with the points raised by Reviewers #1 and #2. In particular, including additional replicates (ideally derived from the same pollen pool, processed identically and run across all modalities) would provide robustness to the benchmark. However, repeating these experiments, re-running the benchmark, and updating the interpretation would require substantial additional time, likely exceeding the 1-3 month revision timeframe suggested by the other reviewers. Additional clarification of the analysis and presentation of the requested details (e.g. the recombination landscape plots (Reviewer #1), clarification of balanced pollen representation from each F₁ during pooling (Reviewers #2 and #3), and evaluation of how varying filtering strategies (e.g. doublet detection thresholds) affect the observed recombination patterns (Reviewers #2 and #3)) would also improve evaluation and transparency of the study. From a technical perspective, major point 3 raised by Reviewer #2 (including information on the intrinsic biological characteristics of the material in the modality performance analysis) would provide important context for users and improve interpretation of the benchmark.
Significance
Previous studies have successfully applied single-cell whole-genome amplification and linked-read sequencing to individual gametes to measure recombination rates and distributions, demonstrating the feasibility of this high-throughput alternative to progeny sequencing. This study extends that concept by delivering open-source pipelines for multiple single-cell modalities and by directly comparing the performance of scRNA-seq, scATAC-seq, and scWGA for mapping meiotic recombination in Arabidopsis gametes, offering both a practical resource and a performance evaluation for plant single-cell genomics.
-
Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.
Learn more at Review Commons
Referee #2
Evidence, reproducibility and clarity
This manuscript presents coelsch, a cross-platform computational framework for single-cell gamete recombination analysis. It systematically benchmarks the performance of four mainstream single-cell sequencing modalities in meiotic crossover detection, successfully applies the method to a natural variation panel of Arabidopsis thaliana, and identifies the largest natural inversion reported in this species to date. This work demonstrates strong innovation, a complete technical pipeline, and significant biological implications. I would like to recommend revision. My concerns are listed below for the authors' consideration and revision.
Major concerns
- Biological Replicates and Batch Effect Control
The number of biological replicates per sequencing modality is limited (2 libraries for 10x scATAC and Takara scWGA, 1 library each for 10x scRNA and BD scRNA), and experiments for different modalities were performed in separate batches. Have the authors evaluated the impact of inter-batch technical variation on recombination rate estimates? In particular, for platforms with drastically different doublet rates (e.g., 49.7% for 10x scRNA vs. 26.3% for BD scRNA), how did the authors distinguish or avoid inherent platform differences from batch effects?
The natural variation analysis used a pooled library strategy for 40 F₁ hybrids without biological replicates. How did the authors ensure balanced pollen representation of each F₁ during pooling? For the 6 F₁ hybrids excluded due to insufficient data, was this due to initial pooling bias or sequencing capture preference? Could this introduce systematic bias into the natural variation analysis results?
- Consistency of Pollen Nuclei Isolation Methods
Different nuclei isolation protocols were used for each sequencing modality: Percoll density gradient centrifugation for 10x scATAC, no Percoll purification for Takara scWGA, and flow cytometry sorting combined with 10x/BD scRNA. Have the authors assessed how these different isolation methods affect nuclei integrity, viability, and capture bias for pollen nuclei? For example, could flow cytometry sorting selectively exclude nuclei of specific sizes or densities, thereby compromising the representativeness of recombination rate estimates?
- Systematic impact of the inherent structure of pollen on different sequencing modalities
Mature Arabidopsis thaliana pollen has a canonical trinucleate structure, consisting of one transcriptionally hyperactive vegetative nucleus and two sperm nuclei with highly condensed chromatin and almost complete transcriptional silencing. While all three nuclei share identical genome sequences, they exhibit fundamental differences in chromatin state and molecular features, which will have profoundly distinct effects on different sequencing modalities, an issue not addressed or controlled for in this study.
Differential technical capture bias: scRNA-seq and scATAC-seq rely on mRNA and accessible chromatin signals, respectively, and thus theoretically can only capture valid data from vegetative nuclei; sperm nuclei will be filtered out during quality control due to insufficient signal. In contrast, scWGA is based on whole-genome DNA amplification, independent of transcriptional activity or chromatin state, and can capture both vegetative and sperm nuclei. Have the authors validated the actual nuclear type composition in datasets from each modality through experiments (e.g., nuclear size sorting, DAPI staining quantification, immunofluorescence labeling)? Could this systematic difference in nuclear type composition compromise the fairness of performance comparisons between modalities? The uneven coverage of scRNA/scATAC is primarily determined by gene expression levels and chromatin accessibility (e.g., high coverage at highly expressed genes, extremely low coverage at heterochromatic regions such as centromeres), whereas coverage bias in scWGA mainly stems from technical preferences of whole-genome amplification. When comparing the resolution and accuracy of recombination detection across modalities, did the authors separate the contributions of "intrinsic biological characteristics of nuclear types" from those of "technical characteristics of the sequencing technologies themselves"?
- Accuracy and Validation of Doublet Detection Method
This study reports exceptionally high doublet rates (~49% for 10x scATAC, ~70% for Takara scWGA), and there is a significant discrepancy with the results from Takara's official CellSelect software (80% of wells labeled "Good" by CellSelect were classified as doublets by coelsch). Have the authors validated the false positive and false negative rates of coelsch's doublet detection method through independent experiments (e.g., mixing pollen of known genotypes, manual microscopic validation of selected wells)? Such a high doublet filtering rate leads to a drastic reduction in the number of effective cells (e.g., only 628 singlets remained from a total of 2081 barcodes in the two Takara scWGA libraries). Have the authors assessed the representativeness of the remaining cells after filtering? In particular, for low-coverage scRNA data, could filtering result in the loss of cells with specific recombination patterns?
- Depth and Breadth of Natural Variation Analysis
This study finds significant differences in recombination rate and terminal crossover enrichment among different natural accessions, with Cvi-0 hybrids exhibiting higher overall recombination rates but lower terminal recombination rates. Have the authors further explored the genetic basis underlying these differences? Besides the 10 Mb inversion in Zin-9, did the authors identify similar large structural variations in other natural accessions? What is the sensitivity and specificity of the recombination coldspot-based method for detecting structural variation? For example, what is the minimum size of inversions or translocations that can be reliably detected?
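The singlet counts quoted in the doublet-detection concern above can be checked with a quick calculation. The sketch below uses only the two numbers given in the review (628 singlets, 2081 barcodes). The helper name `retained_fraction` is hypothetical, and the calculation assumes every barcode not kept as a singlet was filtered out, whether as a doublet or for other quality-control reasons.

```python
def retained_fraction(singlets: int, total_barcodes: int) -> float:
    """Fraction of barcodes kept as singlets after filtering."""
    if total_barcodes <= 0:
        raise ValueError("total_barcodes must be positive")
    return singlets / total_barcodes


# Counts quoted in the review for the two Takara scWGA libraries.
retained = retained_fraction(singlets=628, total_barcodes=2081)
removed = 1.0 - retained

print(f"retained: {retained:.1%}, removed: {removed:.1%}")
```

Under that assumption, roughly 30% of barcodes are retained, which is in line with the ~70% doublet rate reported for Takara scWGA.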
Minor concerns
- The mutants used in this study (zyp1, figl1, recq4ab, etc.) were generated by crossing mutant lines in the Col-0 background with corresponding mutant lines in the Ler background, resulting in heterozygous F₁ backgrounds. For example, the zyp1 mutant used Col-0 background zyp1-1 and Ler background zyp1-6. Could this heterozygous mutant background affect the accurate measurement of meiotic processes and recombination rates? Have the authors considered validation using F₁ populations from homozygous mutant lines?
- The Takara scWGA dataset for wild-type Col-0 × Ler contains only 224 high-quality nuclei, while mutant sample sizes range from tens to hundreds. Is this sample size sufficient for fine-scale analysis of recombination rate distributions, especially for the detection of low-frequency recombination events?
- There are also a few minor issues regarding the references: some appear to be duplicates, such as references 11 and 31, which seem to be the same in both the published version and the bioRxiv preprint. Please double check.
- Additionally, have the authors considered the cost implications of these single-cell-based technologies, as well as their previously published linked-read sequencing approach?
Overall, this manuscript represents an important technical breakthrough in the field of meiotic recombination research, providing a unified computational framework for large-scale, cross-platform single-cell gamete recombination analysis. The above questions mainly focus on the rigor of experimental design (especially the omission of the unique biological issue of pollen trinucleate structure), the depth of computational method validation, and the expansion of biological findings, and do not affect the core conclusions of the manuscript. I suggest that the authors address these questions and provide clear responses in the revised manuscript. If these issues are properly resolved, this work will provide a powerful tool for investigating the genetic and molecular mechanisms of plant meiotic recombination.
Referee cross-commenting
I agree with Reviewers 1 and 3. Addressing most of the points we raised would bring this manuscript to publication standard.
Significance
This study develops a unified computational framework for meiotic crossover (CO) mapping using single‑cell sequencing of Arabidopsis pollen, benchmarks four single‑cell modalities, and identifies natural recombination variation and a large novel pericentric inversion. Overall, the work is technically sound, biologically meaningful, and fills a key gap in scalable gamete‑based recombination profiling.
-
Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.
Learn more at Review Commons
Referee #1
Evidence, reproducibility and clarity
Summary:
Parker et al. present coelsch and coelsch_mapping_pipeline, two open-source tools for platform-agnostic haplotyping and crossover detection from single-cell sequencing data, benchmarked across four modalities: 10x scATAC, 10x scRNA, BD scRNA, and Takara scWGA. The study applies these tools to Arabidopsis thaliana F₁ pollen to recover known recombination frequencies, characterise the effects of coverage sparsity via simulation, and profile natural variation in crossover rate and distribution across 34 F₁ hybrids from 22 diverse accessions. As a by-product of the recombination maps, the authors identify a previously unrecognised ~10 Mb pericentric inversion in the accession Zin-9 - the largest natural inversion described to date in A. thaliana.
This is an interesting and important study and is suitable in scope and rigour for publication in a Review Commons affiliate journal. By combining computational and experimental frameworks, the authors address a genuine methodological gap: while single-cell gamete sequencing is a powerful approach for recombination mapping, the consequences of choosing among available sequencing modalities have not been systematically evaluated. The tools are open-source, data are deposited, and the biological conclusions are well-grounded. Importantly, the limitations of the tools are also mentioned, which is appreciated. Therefore, this manuscript presents a genuinely useful methodological framework that fills a real gap in the recombination biology toolkit. The biological discovery (Zin-9 inversion) adds independent value. However, several analytical choices require better justification, some results sections are under-interpreted, and a number of presentation issues should be addressed before acceptance.
Major comments:
- Mismatch between best-performing modality and diversity panel application
The most critical concern is a logical inconsistency in the experimental design. The authors demonstrate convincingly that Takara scWGA achieves higher per-cell resolution and more accurate crossover detection than the droplet-based RNA methods. Yet the diversity panel - the study's key biological application - is analysed exclusively using 10x scRNA. No comparison with other modalities is provided for the panel, and no external recombination data for these accessions are included for validation. The authors should either: (i) include at least a subset of accessions profiled by an additional modality; or (ii) provide a more thorough quantitative justification for why 10x scRNA throughput outweighs the loss of resolution in this specific context, showing that cross-accession comparisons remain interpretable at scRNA coverage levels.
- Could variation in crossover terminality result from analysis artefacts?
The authors demonstrate consistently higher rates of terminal crossovers in Col hybrids than in Cvi hybrids, 'implying genetic background modulation of crossover localisation'. However, their simulation analysis also demonstrates that telomere-proximal crossovers are disproportionately missed in 10x RNA data. Therefore, could the Col vs. Cvi terminality differences result from a higher or lower rate of false negatives in different genotypes using this approach (caused by e.g. differences in telomere-proximal marker density in Col vs. Cvi), rather than from bona fide differences in CO number? If so, this should be explicitly mentioned.
- Doublet rates in Takara scWGA are unexplained
The Takara iCELL8 platform implements microscopy-based automated well selection to prevent doublets, yet coelsch identifies a ~70% doublet rate in these libraries. This is mentioned briefly but not adequately explained in the main text. The authors should provide a more thorough explanation for why the CellSelect imaging software fails to exclude pollen nuclei doublets (likely due to small nuclear size), and they should discuss what this implies for the utility of this platform for future experiments. This is important practical information for readers considering the Takara workflow.
- Recombination landscape figures are incomplete
Figure 2C shows recombination landscapes only for mutant genotypes profiled by Takara scWGA. Equivalent per-chromosome landscape plots should be provided for all modalities tested on wild-type Col-0 × Ler material. This is essential to visually communicate the coverage-driven differences in landscape resolution that the authors describe, and to verify that 10x scATAC and scRNA recover similar gross distributions despite lower per-cell depth.
- Extreme crossover numbers in 10x scATAC are not discussed
The violin plots in Figure 2A show that 10x scATAC produces a wider upper tail of estimated crossover numbers than other modalities, with some barcodes exceeding 20 crossovers per nucleus - values far above the biological expectation for Arabidopsis. This is not acknowledged or explained. Is this an artefact of the high doublet contamination in this dataset (even after filtering), or a property of the HMM applied to fragmented ATAC data? An explicit discussion or supplementary analysis is required.
- Resolution of crossover detection is underreported
Figure 3C shows boxplots of crossover localisation error across modalities, but this analysis is not discussed quantitatively in the main text. Readers need to understand the practical resolution (in kb) achievable by each modality in terms of crossover interval size. This is particularly important because the paper claims applicability for genetic mapping experiments, where localisation precision directly determines utility.
- Telomeric false-negative rate in scWGA is not reported
The simulation analysis of false negatives near telomeres (Figure 3B) is presented only for 10x RNA data. Given that the authors use Takara scWGA for mutant genotyping and claim higher sensitivity, it is critical to also show the telomeric false-negative profile for scWGA. The current text implies that scWGA should avoid this problem, but this is not demonstrated.
- Comparison between libraries from the same modality is absent
Two independent 10x scATAC and two Takara scWGA libraries were generated, but no within-modality reproducibility analysis of crossover rates or landscapes is presented. Crossover rates and landscape correlations between technical replicates should be shown to establish that the observed modality-level differences are not driven by library-preparation variability.
- Applicability to non-Arabidopsis and heterozygous species
The Discussion notes that the approach relies on isogenic founder crosses and high-quality parental assemblies but does not explore the practical barriers to applying coelsch in outcrossing or polyploid species. Given the broad framing of the title ('platform-agnostic'), the authors should discuss what adaptations would be needed for crop species or other organisms where chromosome-scale haplotype-resolved assemblies are not available.
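The concern about crossover terminality raised above can be made concrete with a toy model. This is only a sketch under assumptions that are not taken from the manuscript: informative markers covered in a cell are assumed to fall along the chromosome as a Poisson process with a hypothetical density (`markers_per_mb`), and a crossover is assumed to be detectable only if at least `min_markers` covered markers lie between it and the chromosome end. The function name and all values are illustrative and do not represent coelsch's actual model.

```python
import math


def p_detect_terminal(distance_mb: float, markers_per_mb: float,
                      min_markers: int = 2) -> float:
    """P(at least `min_markers` covered markers lie distal to the crossover),
    assuming Poisson-distributed markers (illustrative assumption only)."""
    lam = distance_mb * markers_per_mb  # expected number of distal markers
    p_fewer = sum(math.exp(-lam) * lam**k / math.factorial(k)
                  for k in range(min_markers))
    return 1.0 - p_fewer


# Same crossover position (0.5 Mb from the end), two hypothetical
# per-cell marker densities standing in for two genetic backgrounds.
for density in (1.0, 2.0):
    p = p_detect_terminal(distance_mb=0.5, markers_per_mb=density)
    print(f"{density:.1f} markers/Mb -> detection probability {p:.3f}")
```

In this toy setting, doubling the marker density roughly triples the chance of detecting the same terminal crossover, so a difference in telomere-proximal marker density between backgrounds could by itself produce an apparent difference in terminal crossover rate.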
Minor comments:
- Figure 5B - Please add axis labels in Mb.
- Figure 2A - library replicates: The two 10x scATAC libraries are not differentiated in Figure 2A. Showing them separately (or indicating per-library medians) would improve transparency.
- Droplet vs. plate combination: The Discussion does not address whether complementary modalities could be combined (e.g., using droplet-based data for landscape estimation and scWGA for localisation refinement within the same experiment). A brief discussion of this possibility would strengthen the practical utility of the framework.
Referee cross-commenting
All points raised by Reviewers 2 and 3 seem reasonable and would substantially improve the quality of the manuscript.
Significance
General assessment: The paper from Parker et al. provides the first systematic evaluation of single-cell sequencing modalities for recombination mapping in Arabidopsis and presents new bioinformatic tools for analysing recombination in single-cell data. The novel utility of the approach is demonstrated for assessing recombination rate across a wide variety of Arabidopsis hybrids. Different platforms provide different benefits and limitations, and these are well presented. However, the manuscript would benefit from a more thorough presentation of all the different analyses that were performed.
Advance: Most recombination mapping studies in Arabidopsis utilise progeny sequencing. Here, the authors present an alternative approach, using single-cell gamete sequencing which will more easily facilitate recombination mapping in large populations, which will be particularly useful for future studies investigating the influence of natural variation on recombination rate and location. The advance is mostly technical, but the study also generates novel biological observations about chromosome structural rearrangements in Arabidopsis.
Audience: The study is likely to be of main interest to individuals studying recombination in plants (particularly using bioinformatic approaches and analysing the influence of natural variation). However, researchers with an interest in single-cell sequencing and broader genomics will also be an audience for this paper.
Describe your expertise:
I am a researcher in plant meiotic recombination and I am well placed to assess the general importance and impact of the study within the context of the field. However, I would not consider myself a specific expert in bioinformatics.
-
