A method for low-coverage single-gamete sequence analysis demonstrates adherence to Mendel’s first law across a large sample of human sperm

Curation statements for this article:
  • Curated by eLife

    Evaluation Summary:

    The authors first develop a new flexible and robust method to detect deviations from Mendelian inheritance in genomic data from gametes. They then apply this method to human sperm data, but find no evidence of such deviations. Even though this is a negative result, and overall the results are expected based on previous studies, the reviewers agreed that the research is rigorous and valuable.

    (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. The reviewers remained anonymous to the authors.)

Abstract

Recently published single-cell sequencing data from individual human sperm (n = 41,189; 969–3377 cells from each of 25 donors) offer an opportunity to investigate questions of inheritance with improved statistical power, but require new methods tailored to these extremely low-coverage data (∼0.01× per cell). To this end, we developed a method, named rhapsodi, that leverages sparse gamete genotype data to phase the diploid genomes of the donor individuals, impute missing gamete genotypes, and discover meiotic recombination breakpoints, benchmarking its performance across a wide range of study designs. We then applied rhapsodi to the sperm sequencing data to investigate adherence to Mendel’s Law of Segregation, which states that the offspring of a diploid, heterozygous parent will inherit either allele with equal probability. While the vast majority of loci adhere to this rule, research in model and non-model organisms has uncovered numerous exceptions whereby ‘selfish’ alleles are disproportionately transmitted to the next generation. Evidence of such ‘transmission distortion’ (TD) in humans remains equivocal in part because scans of human pedigrees have been under-powered to detect small effects. After applying rhapsodi to the sperm data and scanning for evidence of TD, our results exhibited close concordance with binomial expectations under balanced transmission. Together, our work demonstrates that rhapsodi can facilitate novel uses of inferred genotype data and meiotic recombination events, while offering a powerful quantitative framework for testing for TD in other cohorts and study systems.

Article activity feed

  1. Author Response

    Reviewer #1 (Public Review):

    This manuscript reports a new analytical method (rhapsodi) to impute genotypes on human gamete data. The authors characterize the specificity and sensitivity of the approach and benchmark it against the current tool to analyze gamete data. rhapsodi is more efficient and versatile than the current approach, and thus represents an important technical feat. The last analysis of the manuscript is a reanalysis of the SpermSeq dataset, a massive sequencing effort to characterize recombination in human sperm haplotype data. rhapsodi fails to find any deviations from random segregation and challenges the notion that there are distorters in the human genome. In general, the manuscript represents an important technical piece but the results could be better contextualized to provide a perspective of what are the implications of the findings for our understanding of human recombination and segregation distortion.

    Thank you for appreciating the technical importance of our work for improving the analysis of transmission distortion (TD) based on low-coverage single-cell sequencing data from gametes. We agree that the results (in regard to the method performance, statistical power, and implications for human TD) should be better contextualized, which we address in a point-by-point manner below.

    Reviewer #2 (Public Review):

    This paper describes a new and powerful method of inferring gametic haplotypes using low-coverage sperm sequencing data, rhapsodi. It is a highly useful tool, and the authors demonstrate its robustness using simulations and comparisons to the current gold standard, Hapi. The authors also use the results of rhapsodi on a sample of low-coverage human sperm sequencing data to assess the evidence for moderate transmission distortion (TD), a pattern that previous studies using pedigrees have sought to identify without replicable success. The work's main strength lies in the method the authors have developed and their clear and thorough description and validation of its use. The rhapsodi method clearly performs substantially better than Hapi in several relevant use cases, and in some instances it is usable when Hapi would fail to run or require unreasonable resources. This study, then, provides a highly useful tool to researchers wishing to phase donor haplotypes, infer gamete genotypes, and estimate rough locations of recombination breakpoints using Sperm-seq data.

    Thank you for engaging with our method and for noting its use cases and performance.

    A major limitation is the lack of consideration of strong TD. Under this scenario, there may be "allelic dropout" in the low-coverage Sperm-seq data; without information on the parental genotype from somatic cells, over-transmission of one allele would appear to be absence of the alternate allele (i.e., the donor would be erroneously inferred to be homozygous). Some known examples of TD in other species are extremely strong; e.g., the SD locus in Drosophila can cause distortion as strong as k=0.99. Such cases seem highly likely to be missed using Sperm-seq + rhapsodi, and a lack of power to detect them would influence both ability to observe individual cases of TD as well as the authors' test for a global signal of biased transmission. Since the provided simulations only include scenarios up to 70% transmission of one allele, the paper does not address this potential limitation.

    The authors claim that their work conclusively excludes the presence of ongoing TD in their sample of human males, which, if they are from the same populations as former studies, may provide additional evidence against ongoing TD in these human populations. However, whereas earlier studies were only highly powered for extremely strong TD, the current method appears to be highest powered for intermediate levels of TD, strong enough to generate differences from binomial expectations, but not so strong that one allele might be missing in the low coverage pool of sperm serving as input to rhapsodi. This claim, then, may be better framed as a lack of evidence for TD of intermediate strength in current samples, rather than the strict adherence to Mendelian transmission indicated in the title.

    This is an interesting and important point, and we agree that extreme TD would produce apparent tracts of homozygosity across the sample of sperm genomes. Without external knowledge of heterozygous sites in the donor genome, such SNPs would be unobserved within the sperm sequencing data. To address this possibility, we performed additional simulations of very strong TD (transmission rate, k = 0.99; Figure 4-figure supplement 3; lines 416-434; lines 1062-1083). These simulations demonstrate that despite the homozygosity of the causal SNP, recombination in flanking regions recovers heterozygosity but still manifests extreme and detectable TD. Specifically, across 2,200 simulations (100 independent simulations x 22 chromosomes; k = 0.99) with parameters matching a typical Sperm-seq donor, we identified the TD signature in all 2,200 cases (Power = 1) despite homozygosity (and thus filtering) of the causal SNP in 89% of cases (1958 / 2200). This high power also holds for donor samples with higher (Power = 1) and lower (Power = 1) coverages, respectively.

    In summary, even though it is the case that the causal SNP and nearby flanking SNPs “drop out” of the data, recombination occurs as one extends out from these regions in both directions, and very strong signals (well beyond genome-wide significance thresholds) are detectable within these heterozygous regions. While we cannot attribute the signal to the true causal SNP, this limitation is not unique to our study, but is a general limitation of any study design (including pedigree and pooled sequencing studies) that must contend with linkage disequilibrium.
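    As a rough illustration of this argument (a sketch under simplifying assumptions, not the simulation described in the manuscript), the snippet below computes two quantities. The first is the expected transmission ratio at a marker separated from the distorter by recombination fraction r, namely k(1 - r) + (1 - k)r. The second is the chance that the disfavored allele at the causal SNP is never observed, assuming each gamete covers the site independently with probability equal to the per-cell coverage. The function names and the independence assumption are ours, introduced for illustration only.

```python
def linked_transmission(k, r):
    """Expected transmission ratio at a marker linked to a distorter.

    k: transmission rate of the favored allele at the causal locus.
    r: recombination fraction between the marker and the causal locus.
    The marker allele on the favored haplotype is transmitted either with the
    favored allele (no recombination) or with the disfavored one (recombination).
    """
    return k * (1 - r) + (1 - k) * r


def prob_disfavored_allele_unseen(n_gametes, coverage, k):
    """Chance that no sequenced gamete shows the disfavored causal allele.

    Assumption (hypothetical model): each gamete covers the site independently
    with probability `coverage` and carries the disfavored allele with
    probability 1 - k.
    """
    return (1 - coverage * (1 - k)) ** n_gametes


if __name__ == "__main__":
    for r in (0.0, 0.05, 0.1, 0.2, 0.5):
        print(f"r = {r:.2f}: expected transmission = {linked_transmission(0.99, r):.3f}")
    print("P(causal SNP looks homozygous):",
          round(prob_disfavored_allele_unseen(1711, 0.01, 0.99), 3))
```

    Under this toy model, markers at moderate genetic distance from the distorter remain strongly skewed, while only unlinked markers (r = 0.5) return to balanced transmission.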

    Nevertheless, as highlighted by Reviewer 3, the use of the term “strict” in the title may be too subjective. TD of 5% or less could be considered strong from a population genetic perspective, but undetectable based on binomial variance and our stringent multiple testing corrections. We have therefore removed the word “strict” from the title and moderated the adjectives we use when describing the strength of detectable TD throughout the paper. We also enumerate various forms of TD that would be undetectable based on our study design in the Discussion (lines 581-586; lines 603-638).

    Reviewer #3 (Public Review):

    The authors reanalyze an existing dataset of single-cell Sperm-seq data to search for signals of transmission distortion. They develop an improved genotype imputation method and use this approach to phase donors and characterize the landscape of ancestry across each sperm genome. Using these data, the authors determined that there are no regions in any of the male donors' genomes that display a significant excess of TD. The main biological claim of the paper is that there is a strict adherence to Mendelian transmission ratios in human males.

    The computational approach for accurately phasing and reconstructing haplotypes in individual, lightly sequenced gametes is a potentially useful advance that I expect may be valuable for geneticists analyzing similar datasets. The quality of software documentation and usability is high. I have concerns about the appropriateness of the comparisons selected for this approach, and the algorithm does not appear particularly novel.

    I have no doubt about the authors' basic conclusion that there are no strong male TD loci in the male donors examined. However, I find their statements about "strict adherence to Mendelian ratios" and many references to strong statistical power to be oversold. The power of this study is still quite limited relative to the strength of TD that we would expect to find in human populations.

    Thank you for your comments and for engaging with our manuscript so closely. We agree that additional discussion of statistical power, the strength of TD that can be detected, and the uses of our software are necessary, and these changes have substantially strengthened our revised manuscript.

    Major Concerns:

    There are really two distinct papers here. One is about improved imputation and crossover analysis from sperm-seq data and one is about TD. The bulk of the methodological development is a rework of the approach for genotype imputation and haplotype phasing in Sperm-seq. Yet, the major conclusions are focused on a scan for TD. I am left wondering if analyzing these data using the original method in the Bell et al paper would have produced different conclusions about either? If not, is there a systematic bias such that one would find an excess of false detections of TD? Phasing slightly more markers is not a particularly compelling link between these sections because even fairly sparsely distributed markers that are correctly phased would certainly be fine in a scan for TD within a single individual due to linkage. If this cannot be shown I wonder if this work would be better split into two manuscripts with one more technical paper describing the differences in recombination maps associated with rhapsodi and the other as a brief report stating that strong TD is probably uncommon in human males.

    While we agree that there are two important aspects of our study, we feel that the combination of a generalizable method as well as an application to test an important biological hypothesis is a strength of our work.

    For additional context, Dr. Bell is a co-author on our study and collaborated with us in part based on the motivation to build a reproducible software toolkit for similar analyses. Bell et al. (2020) did not implement their method as generalizable software, but rather as a set of analysis scripts tested only with their data and computing environment. Unlike our method (rhapsodi) and the comparison approach (Hapi), those scripts were not written as user-friendly software and are therefore less likely to be used by the research community.

    It is not surprising that rhapsodi outperforms Hapi since Hapi was designed for a very different quantity of samples and sequencing depths. I appreciate the authors' point that Hapi performed better than other methods in comparisons run by the Hapi authors. However, they were looking at very few gametes (10 or so, I believe). For that reason, this comparison is not appropriate to address the application to the datasets used in this paper. The authors should include an analysis comparing rhapsodi against hapcut2, PHMM and other methods that are appropriate for the full scale and sequencing depth of the data. Additionally, the original Bell paper used a phasing + HMM approach of some kind for exactly this data. Why wasn't that approach considered as a point of comparison?

    While your point is well taken, we do not believe that a direct comparison between rhapsodi and PHMM would provide additional insight. In the publication describing PHMM (Hou et al. 2013), the algorithm was designed for datasets containing fewer cells (11-41) sequenced to higher coverage per cell (0.4-0.9) than the data analyzed by rhapsodi. PHMM is therefore, like Hapi, optimized for a narrower range of parameters than rhapsodi. Across this range of parameters, Hapi uniformly performs better than PHMM. Other tools such as hapcut2 may be designed to work with lower coverages and higher cell numbers than PHMM and Hapi, but are designed for use exclusively with diploid genomes. rhapsodi is therefore the first haploid phasing tool that can work with large numbers of low-coverage cells, and no existing software operates in the same niche. Although the parameter spaces of Hapi and rhapsodi only partially overlap, Hapi remains the most appropriate point of comparison.

    In addition to the point about analysis scripts versus a generalizable software package, we note two major differences between the steps employed in Bell et al. 2020 and rhapsodi’s method:

    1. For phasing, Bell et al. (2020) used Hapcut2 in an “off-label” way that required artificial assignment of alleles from the same sperm cell to the same “read” for input. This approach ignores the positional information that was already encoded in the alignment and may not take full advantage of the co-inheritance patterns of the SNP alleles. The phasing method implemented in rhapsodi is a principled approach tailored to the structure of the input data and knowledge of the biological process of meiosis.

    2. For crossover discovery, Bell et al. (2020) handled genotype error by encoding an “error” state in the HMM. In our method, we assign gamete-level genotypes via HMM-based imputation prior to detecting recombination breakpoints. We believe dealing with the error prior to crossover discovery is a simpler approach that better leverages the strengths of HMMs.
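    To make the "impute first, then call crossovers" idea concrete, here is a minimal two-state HMM sketch. It is not rhapsodi's implementation: the function names, the switch probability, and the error rate are assumptions chosen for illustration. Each gamete is modeled as a path through the two donor haplotypes, sparse observations are decoded with the Viterbi algorithm (unobserved sites contribute no emission term), and crossovers are read off as state changes in the decoded path.

```python
import math


def viterbi_haplotype(obs, p_switch=0.01, p_error=0.05):
    """Decode the most likely donor haplotype (0 or 1) at every SNP of one gamete.

    obs: list where 0/1 means the observed allele matches haplotype 0/1 and
         None means the SNP was not covered in this gamete.
    p_switch and p_error are illustrative values, not fitted parameters.
    """
    log_stay, log_switch = math.log(1 - p_switch), math.log(p_switch)

    def log_emit(state, o):
        if o is None:  # missing data: every state is equally compatible
            return 0.0
        return math.log(1 - p_error) if o == state else math.log(p_error)

    score = [math.log(0.5) + log_emit(s, obs[0]) for s in (0, 1)]
    back = []
    for o in obs[1:]:
        new_score, pointers = [], []
        for s in (0, 1):
            cands = [score[prev] + (log_stay if prev == s else log_switch)
                     for prev in (0, 1)]
            best_prev = 0 if cands[0] >= cands[1] else 1
            pointers.append(best_prev)
            new_score.append(cands[best_prev] + log_emit(s, o))
        back.append(pointers)
        score = new_score

    # Trace back from the best final state.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


def crossover_positions(path):
    """Indices at which the decoded haplotype differs from the previous SNP."""
    return [i for i in range(1, len(path)) if path[i] != path[i - 1]]


if __name__ == "__main__":
    sparse = [0, None, 0, 0, None, None, 1, 1, None, 1]
    decoded = viterbi_haplotype(sparse)
    print(decoded, crossover_positions(decoded))
```

    Because a single discordant observation is cheaper to explain as a genotyping error than as two closely spaced crossovers, isolated errors are smoothed out before breakpoints are called.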

    With respect to the method for imputation, no comparison is made to known recombination maps nor do the authors make any comparison across the maps derived from each donor. Reporting an improved method without it motivating novel biological conclusions is not compelling in itself. I suggest the authors expand that analysis to consider these are related questions. E.g., are there males whose recombination maps differ in specific regions? Are those associated with known major chromosomal abnormalities? Is this map consistent with estimates from LD, pedigrees, Bell et al?

    We agree that evaluating the inferred crossover landscape in relation to published maps would be useful as a technical evaluation of our method, though we respectfully disagree with the suggestion to expand the scope of the manuscript to the analysis of inter-individual variability in the crossover landscape—topics that were the main focus of Bell et al. (2020). The distinction between our work and that study was addressed in our responses to previous comments.

    To address the suggestion to compare to existing maps, we counted the number of inferred recombination events for each 1 Mbp genomic bin, pooling across the donors. We compared this result with a published male-specific recombination map inferred from trio sequencing data (Halldorsson et al. 2019) and observed a strong correlation with our map (R = 0.9; Figure 5-figure supplement 5). We have incorporated this in “Results: Application to data from human sperm” (lines 372-377; lines 385-391) and note the potential biological and technical reasons for the observed discrepancies (lines 391-399). One such technical reason for the observed modest discrepancy appears to be related to the sample sequencing depth of coverage. Rather than pooling the number of inferred recombination events for each bin across all donors, we repeated the correlation analysis in a donor specific manner. Then, we fit a linear regression model with the sample-specific sequencing depths of coverage as the predictor and the sample-specific correlations as the response variable. We found that the sample-specific correlation with the deCODE map was positively associated with depth of coverage (lines 391-399).
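    The binning and correlation step described above can be sketched as follows. This is an illustrative outline with toy numbers, not the analysis code used for the manuscript; the helper names, the toy crossover positions, and the toy reference rates are all hypothetical.

```python
import math


def bin_counts(positions_bp, chrom_length_bp, bin_size_bp=1_000_000):
    """Count crossover positions falling in each fixed-width genomic bin."""
    n_bins = math.ceil(chrom_length_bp / bin_size_bp)
    counts = [0] * n_bins
    for pos in positions_bp:
        # Clamp so a position exactly at the chromosome end lands in the last bin.
        counts[min(pos // bin_size_bp, n_bins - 1)] += 1
    return counts


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Toy data: inferred crossover midpoints pooled across donors (bp).
    crossovers = [150_000, 900_000, 1_200_000, 3_400_000, 3_900_000, 3_950_000]
    counts = bin_counts(crossovers, chrom_length_bp=4_000_000)
    # Toy reference rates per 1 Mbp bin (stand-in for a published map).
    reference = [1.8, 1.1, 0.2, 2.9]
    print(counts, round(pearson_r(counts, reference), 3))
```

    Repeating the same calculation per donor, and then regressing the per-donor correlations on sequencing depth, would follow the donor-specific analysis described in the text.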

    Most of the validations presented are based on simulated data. This is fine and has some advantages, but real data imposes challenges that these analyses do not address. My understanding is that the Bell et al. (2020) paper includes a donor with a phased diploid genome. A comparison of rhapsodi's phasing accuracy against that genome should be included.

    Bell et al. 2020 included only sperm donors with previously unknown genomes, and phased their genomes via the sperm sequencing data. They validated their phasing approach in two ways: 1) via simulated data and 2) via comparing to the phase generated by Eagle (Loh et al 2016, Nat Gen) for one donor genome, specifically comparing the phase of neighboring sites phased with both approaches. Importantly, such population-based approaches achieve only local phasing of common variation, as opposed to the chromosome-scale phasing achieved via gamete sequencing. Nevertheless, we acknowledge that real data exhibits features that are not captured by simulated data. We tried to capture the most significant potential contributors from real data (e.g., genotyping errors) in our simulations. Our newly added comparisons to the Halldorsson et al. (2019) map help address this concern (Figure 5-figure supplement 5).

    The main biological conclusion about a "strict adherence to Mendelian expectations across sperm genomes" is an overstatement. Statistical power of this study is still limited relative to the strength of TD that would be expected within human populations. One reason is the multiple testing correction. Another is that 1000-3000 draws from a binomial distribution with expected p = 0.5 is just not sufficient to overcome binomial sampling variance. In light of this concern and the central conclusion of this paper, the authors' discussion of power is inadequate. The main text really should contain explicit discussion of the required genotype ratio skew for TD in each donor to be detected with good power. Given previous pedigree studies, it is not surprising that no significant TD was discovered that exceeded the necessary ~10% effect sizes to be detectable. Recent, much more powerful analyses in mice, Drosophila and plants, indicate that strong TD is probably uncommon and even weak effects can be detected but are uncommon.

    Thank you for these detailed suggestions regarding statistical power. Our manuscript is greatly improved by these updates to the power analysis and our comparison to alternative methods for investigating TD.

    Specifically, we added simulations of TD at different rates (including very strong TD, as also noted in our response to Reviewer 2) to demonstrate the range in which our study would be able to detect TD in this sample, considering the burden of multiple testing (Figure 4-figure supplement 3).

    We added to the section titled “Results: Statistical power to detect moderate and strong TD” a statement about the strength of TD that would be detectable within the Sperm-seq dataset (lines 400-415). Briefly, the 25 donors have an average of 1711 gametes each (range 969-3377). Based on this sample size, we have Power = 0.681 to detect deviations of 0.07 (i.e., 57% transmission of one allele in a single donor) and Power = 0.912 to detect deviations of 0.08, accounting for multiple hypothesis testing across the genome and across donors (p-value threshold = 1.78 × 10⁻⁷). For an individual with 950 gametes, we have Power = 0.637 to detect deviations of 0.09 and Power = 0.84 to detect deviations of 0.1.
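    For readers who want to explore how power depends on sample size and effect size, the sketch below computes the power of an exact two-sided binomial test against p = 0.5 at a fixed significance threshold. It assumes a single test on n independent gametes and is not the procedure used to obtain the figures quoted above, so its output may differ somewhat from those values; the function names are hypothetical.

```python
import math


def log_binom_pmf(k, n, p):
    """Log of P(X = k) for X ~ Binomial(n, p), with 0 < p < 1."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))


def power_two_sided(n, deviation, alpha):
    """Power of an exact two-sided binomial test of H0: p = 0.5.

    The null distribution is symmetric, so the two-sided p-value of a count
    k >= n/2 is taken as 2 * P(X >= k). The test rejects for counts >= k_crit
    or <= n - k_crit, where k_crit is the smallest count with p-value <= alpha.
    """
    tail = 0.0
    k_crit = n + 1  # stays n + 1 if no count is extreme enough to reject
    for k in range(n, -1, -1):
        tail += math.exp(log_binom_pmf(k, n, 0.5))
        if 2 * tail > alpha:
            break
        k_crit = k

    p_true = 0.5 + deviation
    upper = sum(math.exp(log_binom_pmf(i, n, p_true)) for i in range(k_crit, n + 1))
    lower = sum(math.exp(log_binom_pmf(i, n, p_true)) for i in range(0, n - k_crit + 1))
    return upper + lower


if __name__ == "__main__":
    alpha = 1.78e-7  # threshold quoted in the response
    for n, dev in [(1711, 0.07), (1711, 0.08), (950, 0.09), (950, 0.10)]:
        print(f"n = {n}, deviation = {dev:.2f}: power = {power_two_sided(n, dev, alpha):.3f}")
```

    The same function can be evaluated over a grid of n and deviation values to visualize how quickly power falls off for weak distortion under a stringent genome-wide threshold.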

    Based on these calculations, we agree that the term “strict” is subjective and may be considered an over-statement depending on the point of comparison, and we have modified the title accordingly.

    This manuscript would benefit from a much clearer examination of statistical power and a detailed comparison of the power of this approach vs pedigree-based analyses as well as bulk gamete sequencing approaches. Although the authors are correct that all scans for TD in human genomes have been pedigree or single-cell based, more powerful alternatives are known. These are based on sequencing pools of individuals or gametes (e.g., Wei et al. 2017, Corbett-Detig et al. 2019). Each of those studies has been able to identify signatures of segregation distortion below the thresholds required for significance in this study. These and related works should be acknowledged in both the introduction and discussion. Although I appreciate that the ability to phase the genome in a single experiment may be appealing, phasing diploid genomes via Hi-C or Omni-C is straightforward, and the advantages in statistical power suggest that approaches using pools of gametes are preferable for well-powered scans for TD.

    Thank you for your suggestions regarding contextualizing the statistical power of single-gamete sequencing-based approaches. Our steps to address these comments have strengthened our manuscript and made the paper more applicable to future research.

    The single-cell nature of the low-coverage (~0.01x) Sperm-seq data allowed us to augment our sample size 100-fold at each SNP in a way that is not possible with a pooled sequencing approach. Pooled sequencing methods may augment statistical power for detecting TD by 1) combining information from nearby SNPs and 2) assuming different sperm are sampled at each site. This approach has relied on external knowledge of haplotypes (e.g., obtained through sequencing of inbred strains of Drosophila). This permits aggregation of alleles supporting one haplotype or the other across adjacent SNPs, which can increase statistical power. The same statistical test for TD cannot be applied to bulk sequencing data from human sperm (e.g., Bruess et al. 2019, Yang et al. 2021) without external knowledge of the parental haplotypes. One potential approach for circumventing this issue would be local phasing using patterns of LD from a reference panel, but this would limit the analysis to common SNPs within relatively small windows that can be adequately phased with such methods.

    It is not immediately obvious that pooled sequencing studies have greater power for discovering TD than single-cell studies. None of the pooled sequencing studies mentioned by the reviewer performed similarly exhaustive power analyses, and the power analyses that were performed in pooled sequencing studies were done in systems with different levels of heterozygosity, different genome sizes, different sample sizes of donor individuals, etc. All of these factors affect the multiple testing burden, making it impossible to compare directly to a study in humans. Given the above considerations, we believe that an in-depth analysis of the statistical power of pooled sequencing approaches for discovering TD in humans lies outside the scope of our study.

    We have nevertheless updated our manuscript to discuss the strengths of pooled sequencing methods as an approach for investigating TD, citing relevant studies in both the Introduction (lines 37-46) and Discussion (lines 508-529; lines 557-580). We acknowledge that these methods have been successfully applied in other species (e.g., Wei et al. 2017, Corbett-Detig et al. 2019) and their potential to improve statistical power. We note the steps that would be necessary for making these methods applicable for TD scans in humans as new datasets are produced.

    We added a general power analysis of pedigree studies (Figure 4-figure supplement 4A) to illustrate the large sample sizes necessary to detect weak TD. To demonstrate the large sample size required for a pedigree study to achieve strong statistical power, we plot the number of informative transmissions of each SNP in the two pedigrees from Meyer et al. 2012 for which data was publicly accessible (Figure 4-figure supplement 4B).

    Importantly, in a single-gamete sequencing study, the number of informative transmissions is equal to the number of genotyped gametes for all heterozygous SNPs. In a pedigree-based study, the number of informative transmissions varies across SNPs, as not all parent-offspring trios will include one or more parent heterozygous for a given SNP. For example, the Hardy-Weinberg expected proportion of heterozygous parents for a common SNP with an allele frequency of 0.5 is 2pq = 0.5. Meanwhile, variants at lower frequencies will possess smaller proportions of heterozygotes, thus capturing fewer informative transmissions and limiting statistical power. One implication of this distinction is that pedigree-based studies rely on distorter alleles that act across multiple families, effectively restricting such scans to variants that are common in the population. This contrasts with single-gamete sequencing studies, which provide equal power for detecting TD involving common and rare alleles, provided that they are heterozygous in the sampled donor individual. We note this in the Discussion (lines 508-529).
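    The dependence on allele frequency can be made explicit with a short calculation. The sketch below is illustrative only: it assumes Hardy-Weinberg proportions and counts heterozygous parents as a simple upper bound on informative transmissions per child, ignoring trios in which the transmitted allele cannot be resolved. The function names are hypothetical.

```python
def het_fraction(p):
    """Expected heterozygote frequency 2pq under Hardy-Weinberg proportions."""
    return 2 * p * (1 - p)


def expected_het_parents(n_trios, p):
    """Expected number of heterozygous parents across n_trios trios.

    Each trio contributes two parents; a parent is potentially informative
    for a SNP only if heterozygous at that SNP.
    """
    return 2 * n_trios * het_fraction(p)


if __name__ == "__main__":
    n_trios = 1000
    for p in (0.5, 0.2, 0.05, 0.01):
        print(f"allele frequency {p:.2f}: 2pq = {het_fraction(p):.4f}, "
              f"expected heterozygous parents = {expected_het_parents(n_trios, p):.1f}")
    # By contrast, a heterozygous sperm donor contributes one informative
    # transmission per genotyped gamete, regardless of population frequency.
```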

    As noted by the reviewer, single-cell sequencing allows both phasing and examination of TD in a single study, allowing the investigation of meiotic recombination and its potential relationship with TD and fertility profiles. We have added text in the Conclusion (lines 659-693) to address this important point. Because of this study design, we are uniquely positioned to detect TD caused by any rare alleles we do capture; this contrasts with pedigree-based studies, where a distorter would need to be acting across multiple families to be detectable (thus restricting these scans to common variants). We have noted this in the Discussion (lines 521-529).

  2. Evaluation Summary:

    The authors first develop a new flexible and robust method to detect deviations from Mendelian inheritance in genomic data from gametes. They then apply this method to human sperm data, but find no evidence of such deviations. Even though this is a negative result, and overall the results are expected based on previous studies, the reviewers agreed that the research is rigorous and valuable.

    (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. The reviewers remained anonymous to the authors.)

  3. Reviewer #1 (Public Review):

    This manuscript reports a new analytical method (rhapsodi) to impute genotypes on human gamete data. The authors characterize the specificity and sensitivity of the approach and benchmark it against the current tool to analyze gamete data. rhapsodi is more efficient and versatile than the current approach, and thus represents an important technical feat. The last analysis of the manuscript is a reanalysis of the SpermSeq dataset, a massive sequencing effort to characterize recombination in human sperm haplotype data. rhapsodi fails to find any deviations from random segregation and challenges the notion that there are distorters in the human genome. In general, the manuscript represents an important technical piece but the results could be better contextualized to provide a perspective of what are the implications of the findings for our understanding of human recombination and segregation distortion.

  4. Reviewer #2 (Public Review):

    This paper describes a new and powerful method of inferring gametic haplotypes using low-coverage sperm sequencing data, rhapsodi. It is a highly useful tool, and the authors demonstrate its robustness using simulations and comparisons to the current gold standard, Hapi. The authors also use the results of rhapsodi on a sample of low-coverage human sperm sequencing data to assess the evidence for moderate transmission distortion (TD), a pattern that previous studies using pedigrees have sought to identify without replicable success.

    The work's main strength lies in the method the authors have developed and their clear and thorough description and validation of its use. The rhapsodi method clearly performs substantially better than Hapi in several relevant use cases, and in some instances it is usable when Hapi would fail to run or require unreasonable resources. This study, then, provides a highly useful tool to researchers wishing to phase donor haplotypes, infer gamete genotypes, and estimate rough locations of recombination breakpoints using Sperm-seq data.

    A major limitation is the lack of consideration of strong TD. Under this scenario, there may be "allelic dropout" in the low-coverage Sperm-seq data; without information on the parental genotype from somatic cells, over-transmission of one allele would appear to be absence of the alternate allele (i.e., the donor would be erroneously inferred to be homozygous). Some known examples of TD in other species are extremely strong; e.g., the SD locus in Drosophila can cause distortion as strong as k=0.99. Such cases seem highly likely to be missed using Sperm-seq + rhapsodi, and a lack of power to detect them would influence both ability to observe individual cases of TD as well as the authors' test for a global signal of biased transmission. Since the provided simulations only include scenarios up to 70% transmission of one allele, the paper does not address this potential limitation.

    The authors claim that their work conclusively excludes the presence of ongoing TD in their sample of human males, which, if they are from the same populations as former studies, may provide additional evidence against ongoing TD in these human populations. However, whereas earlier studies were only highly powered for extremely strong TD, the current method appears to be highest powered for intermediate levels of TD, strong enough to generate differences from binomial expectations, but not so strong that one allele might be missing in the low coverage pool of sperm serving as input to rhapsodi. This claim, then, may be better framed as a lack of evidence for TD of intermediate strength in current samples, rather than the strict adherence to Mendelian transmission indicated in the title.

  5. Reviewer #3 (Public Review):

    The authors reanalyze an existing dataset of single-cell Sperm-seq data to search for signals of transmission distortion. They develop an improved genotype imputation method and use this approach to phase donors and characterize the landscape of ancestry across each sperm genome. Using these data, the authors determined that there are no regions in any of the male donors' genomes that display a significant excess of TD. The main biological claim of the paper is that there is a strict adherence to Mendelian transmission ratios in human males.

    The computational approach for accurately phasing and reconstructing haplotypes in individual, lightly sequenced gametes is a potentially useful advance that I expect may be valuable for geneticists analyzing similar datasets. The quality of software documentation and usability is high. However, I have concerns about the appropriateness of the comparisons selected for this approach, and the algorithm does not appear particularly novel.

    I have no doubt about the authors' basic conclusion that there are no strong male TD loci in the male donors examined. However, I find their statements about "strict adherence to Mendelian ratios" and many references to strong statistical power to be oversold. The power of this study is still quite limited relative to the strength of TD that we would expect to find in human populations.

    Major Concerns:

    There are really two distinct papers here. One is about improved imputation and crossover analysis from Sperm-seq data and one is about TD. The bulk of the methodological development is a rework of the approach for genotype imputation and haplotype phasing in Sperm-seq. Yet, the major conclusions are focused on a scan for TD. I am left wondering whether analyzing these data using the original method in the Bell et al. paper would have produced different conclusions about either. If not, is there a systematic bias such that one would find an excess of false detections of TD? Phasing slightly more markers is not a particularly compelling link between these sections because even fairly sparsely distributed markers that are correctly phased would certainly be fine in a scan for TD within a single individual due to linkage. If this cannot be shown, I wonder whether this work would be better split into two manuscripts, with one more technical paper describing the differences in recombination maps associated with rhapsodi and the other a brief report stating that strong TD is probably uncommon in human males.

    It is not surprising that rhapsodi outperforms Hapi since Hapi was designed for a very different quantity of samples and sequencing depths. I appreciate the authors' point that Hapi performed better than other methods in comparisons run by the Hapi authors. However, they were looking at very few gametes (10 or so, I believe). For that reason, this comparison is not appropriate to address the application to the datasets used in this paper. The authors should include an analysis comparing rhapsodi against hapcut2, PHMM and other methods that are appropriate for the full scale and sequencing depth of the data. Additionally, the original Bell paper used a phasing + HMM approach of some kind for exactly this data. Why wasn't that approach considered as a point of comparison?

    With respect to the method for imputation, no comparison is made to known recombination maps, nor do the authors make any comparison across the maps derived from each donor. Reporting an improved method without it motivating novel biological conclusions is not compelling in itself. I suggest the authors expand that analysis to consider these related questions. E.g., are there males whose recombination maps differ in specific regions? Are those associated with known major chromosomal abnormalities? Is this map consistent with estimates from LD, pedigrees, and Bell et al.?

    Most of the validations presented are based on simulated data. This is fine and has some advantages, but real data imposes challenges that these analyses do not address. My understanding is that the Bell et al. (2020) paper includes a donor with a phased diploid genome. A comparison of rhapsodi's phasing accuracy against that genome should be included.

    The main biological conclusion about a "strict adherence to Mendelian expectations across sperm genomes" is an overstatement. Statistical power of this study is still limited relative to the strength of TD that would be expected within human populations. One reason is the multiple testing correction. Another is that 1000-3000 draws from a binomial distribution with expected p = 0.5 is just not sufficient to overcome binomial sampling variance. In light of this concern and the central conclusion of this paper, the authors' discussion of power is inadequate. The main text really should contain explicit discussion of the required genotype ratio skew for TD in each donor to be detected with good power. Given previous pedigree studies, it is not surprising that no significant TD was discovered that exceeded the necessary ~10% effect sizes to be detectable. Recent, much more powerful analyses in mice, Drosophila and plants, indicate that strong TD is probably uncommon and even weak effects can be detected but are uncommon.
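
    The kind of explicit power statement requested here can be sketched with a normal approximation to a two-sided binomial test against p = 0.5. Everything in the sketch is an assumption made for illustration: the function names `power` and `min_detectable_k` are hypothetical, the per-test threshold `alpha=1e-7` is a placeholder standing in for a multiple-testing correction rather than the manuscript's actual threshold, and perfectly imputed, error-free genotypes are assumed.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def power(n, k, alpha):
    """Approximate power of a two-sided test of H0: p = 0.5, given true rate k."""
    z = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    half_width = z * 0.5 / sqrt(n)          # rejection bounds under H0 (sd = 0.5/sqrt(n))
    crit_lo, crit_hi = 0.5 - half_width, 0.5 + half_width
    sd_alt = sqrt(k * (1.0 - k) / n)        # sd of the observed proportion under k
    upper_tail = 1.0 - _STD_NORMAL.cdf((crit_hi - k) / sd_alt)
    lower_tail = _STD_NORMAL.cdf((crit_lo - k) / sd_alt)
    return upper_tail + lower_tail


def min_detectable_k(n, alpha, target_power=0.8):
    """Smallest k > 0.5 reaching target_power, found by bisection (None if unreachable)."""
    lo, hi = 0.5, 0.999
    if power(n, hi, alpha) < target_power:
        return None
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if power(n, mid, alpha) >= target_power:
            hi = mid
        else:
            lo = mid
    return hi


for n in (1000, 2000, 3000):
    print(f"n={n}: minimum detectable k at 80% power = {min_detectable_k(n, alpha=1e-7):.3f}")
```

    With the placeholder threshold, the minimum detectable transmission rate at n = 1000 comes out near 0.6 and shrinks only slowly as n grows, which is the sense in which 1000-3000 gametes per donor constrain sensitivity to weak TD.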

    This manuscript would benefit from a much clearer examination of statistical power and a detailed comparison of the power of this approach vs pedigree-based analyses as well as bulk gamete sequencing approaches. Although the authors are correct that all scans for TD in human genomes have been pedigree or single-cell based, more powerful alternatives are known. These are based on sequencing pools of individuals or gametes (e.g., Wei et al. 2017, Corbett-Detig et al. 2019). Each of those studies has been able to identify signatures of segregation distortion below the thresholds required for significance in this study. These and related works should be acknowledged in both the introduction and discussion. Although I appreciate that the ability to phase the genome in a single experiment may be appealing, phasing diploid genomes via Hi-C or Omni-C is straightforward, and the advantages in statistical power suggest that approaches using pools of gametes are preferable for well-powered scans for TD.