Benchmarking and Optimization of Methods for the Detection of Identity-By-Descent in High-Recombining Plasmodium falciparum Genomes

Curation statements for this article:
  • Curated by eLife

    eLife Assessment

    This important study presents an evaluation of several tools used for detecting Identity-By-Descent (IBD) segments in highly recombining genomes, using simulated data to replicate the high recombination and low marker density of Plasmodium falciparum, the parasite responsible for malaria. Most of the evidence presented by the authors is solid, demonstrating that users should be cautious calling IBD when SNP density is low and recombination rate is high. This study will be of interest to scientists working in the field of genome evolution and infectious diseases.

Abstract

Genomic surveillance is crucial for identifying at-risk populations for targeted malaria control and elimination. Identity-by-descent (IBD) is increasingly being used in Plasmodium population genomics to estimate genetic relatedness, effective population size (Ne), population structure, and signals of positive selection. Despite its potential, a thorough evaluation of IBD segment detection tools for species with high recombination rates, such as P. falciparum, remains absent. Here, we perform comprehensive benchmarking of IBD callers – probabilistic (hmmIBD, isoRelate), identity-by-state-based (hap-IBD, phased IBD) and others (Refined IBD) – using population genetic simulations tailored for high recombination, and IBD quality metrics at both the IBD segment level and the IBD-based downstream inference level. Our results demonstrate that low marker density per genetic unit, related to high recombination relative to mutation, significantly compromises the accuracy of detected IBD segments. In genomes with high recombination rates resembling P. falciparum, most IBD callers exhibit high false negative rates for shorter IBD segments, which can be partially mitigated through optimization of IBD caller parameters, especially those related to marker density. Notably, IBD detected with optimized parameters allows for more accurate capture of selection signals and population structure; IBD-based Ne inference is very sensitive to IBD detection errors, with IBD called from hmmIBD uniquely providing less biased estimates of Ne in this context. Validation with empirical data from the MalariaGEN Pf7 database, representing different transmission settings, corroborates these findings.
We conclude that context-specific evaluation and parameter optimization are essential for accurate IBD detection in high-recombining species and recommend hmmIBD for quality-sensitive analysis, such as estimation of Ne in these species. Our optimization and high-level benchmarking methods not only improve IBD segment detection in high-recombining genomes but also enhance overall genomic analysis, paving the way for more accurate genomic surveillance and targeted intervention strategies for malaria.

Article activity feed

  1. eLife Assessment

    This important study presents an evaluation of several tools used for detecting Identity-By-Descent (IBD) segments in highly recombining genomes, using simulated data to replicate the high recombination and low marker density of Plasmodium falciparum, the parasite responsible for malaria. Most of the evidence presented by the authors is solid, demonstrating that users should be cautious calling IBD when SNP density is low and recombination rate is high. This study will be of interest to scientists working in the field of genome evolution and infectious diseases.

  2. Reviewer #1 (Public review):

    Summary:

    The authors benchmarked five IBD detection methods (hmmIBD, isoRelate, hap-IBD, phased IBD, and Refined IBD) in Plasmodium falciparum using simulated and empirical data. Plasmodium falciparum has a mutation rate similar to that of humans but a much higher recombination rate and lower SNP density. Thus, the authors evaluated how recombination rate and marker density affect IBD segment detection. Next, they performed parameter optimization for Plasmodium falciparum and benchmarked the robustness of downstream analyses (selection detection and Ne inference) using IBD detected by each of the methods. They also tracked the computational efficiency of these methods. The authors' work is valuable for the tested species, and the analyses presented appear to support their claim that users should be cautious calling IBD when SNP density is low and recombination rate is high.
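
    The link between recombination rate and marker density per genetic unit can be illustrated with a short calculation. The sketch below is only an illustration: the function name and all input values are hypothetical and are not taken from the study. It shows that, for the same number of SNPs over the same physical length, a 100-fold higher recombination rate gives 100-fold fewer SNPs per centimorgan.

```python
def markers_per_centimorgan(n_snps: int, length_bp: float, recomb_rate_per_bp: float) -> float:
    """Average number of SNPs per centimorgan (cM).

    recomb_rate_per_bp is the crossover probability per base pair per
    generation; 1 cM corresponds to a crossover probability of 0.01.
    """
    genetic_length_cm = length_bp * recomb_rate_per_bp * 100.0
    return n_snps / genetic_length_cm


# Hypothetical inputs chosen for illustration only (not values from the study).
n_snps = 10_000
length_bp = 1_000_000

# Same SNP count and physical length, two different recombination rates.
low_recomb_density = markers_per_centimorgan(n_snps, length_bp, 1e-8)
high_recomb_density = markers_per_centimorgan(n_snps, length_bp, 1e-6)

print(f"low recombination:  {low_recomb_density:.1f} SNPs per cM")
print(f"high recombination: {high_recomb_density:.1f} SNPs per cM")
```

    Because IBD segment lengths are usually measured in centimorgans, a segment of a given genetic length contains fewer informative markers when recombination is high, which is consistent with the reviewers' point about short segments being harder to detect.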

    Strengths:

    The study design was solid. The authors set up their reasoning for using P. falciparum very well. The combination of a high recombination rate and a mutation rate similar to that of humans is indeed an interesting case. Further, they chose methods that were developed explicitly for each species. This was a strength of the work, as was the use of both simulated and empirical data to support their argument that IBD detection should be benchmarked in P. falciparum.

    Weaknesses:

    The scope of the optimization and application of results from the work are narrow, in that everything is fine-tuned for Plasmodium. Some of the results were not entirely unexpected for users of any of the tested software that was developed for humans. For example, it is known that Refined IBD is not going to do well with the combination of short IBD segments and low SNP density. Lastly, it appears the authors only did one large-scale simulation (there are no reported SDs).

  3. Reviewer #2 (Public review):

    Summary:

    Guo et al. benchmarked and optimized methods for detecting Identity-By-Descent (IBD) segments in Plasmodium falciparum (Pf) genomes, which are characterized by high recombination rates and low marker density. Their goal was to address the limitations of existing IBD detection tools, which were primarily developed for human genomes and do not perform well in highly recombining genomes. They first analysed various existing IBD callers: hmmIBD, isoRelate, hap-IBD, phased IBD, and Refined IBD. They focused on the impact of recombination on accuracy, which was assessed using two metrics, the false negative rate and the false positive rate. The results suggest that high recombination rates significantly reduce marker density, leading to higher false negative rates for short IBD segments. This effect compromises the reliability of IBD-based downstream analyses, such as effective population size (Ne) estimation.
    They showed that the best tool for IBD detection in Pf is hmmIBD, because it has relatively low FN/FP error rates and is less biased for relatedness estimates. However, this method is the least computationally efficient.
    Their suggestion is to optimize human-oriented IBD methods and use hmmIBD only for the estimation of Ne.
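
    To make the two accuracy metrics concrete, the following minimal sketch computes overlap-based false negative and false positive rates for one pair of genomes on one chromosome. It assumes one common definition (the fraction of true IBD length missed by the caller, and the fraction of inferred IBD length not supported by the truth); the authors' exact definitions may differ, and all names and coordinates are hypothetical.

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # (start, end), e.g. in centimorgans


def merge(segments: List[Segment]) -> List[Segment]:
    """Sort segments and merge any that overlap or touch."""
    merged: List[Segment] = []
    for start, end in sorted(segments):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def total_length(segments: List[Segment]) -> float:
    return sum(end - start for start, end in segments)


def overlap_length(a: List[Segment], b: List[Segment]) -> float:
    """Total length shared by two segment sets (two-pointer sweep)."""
    a, b = merge(a), merge(b)
    i = j = 0
    shared = 0.0
    while i < len(a) and j < len(b):
        lo, hi = max(a[i][0], b[j][0]), min(a[i][1], b[j][1])
        if hi > lo:
            shared += hi - lo
        # Advance whichever segment ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return shared


def error_rates(true_segs: List[Segment], inferred_segs: List[Segment]) -> Tuple[float, float]:
    """Return (false negative rate, false positive rate) by length."""
    shared = overlap_length(true_segs, inferred_segs)
    true_len = total_length(merge(true_segs))
    inferred_len = total_length(merge(inferred_segs))
    fn = 1.0 - shared / true_len if true_len > 0 else 0.0
    fp = 1.0 - shared / inferred_len if inferred_len > 0 else 0.0
    return fn, fp


# Hypothetical segments: one true segment is partly missed, one short true
# segment is missed entirely, and one inferred segment is spurious.
true_segs = [(0.0, 10.0), (20.0, 24.0)]
inferred_segs = [(2.0, 10.0), (30.0, 32.0)]
fn_rate, fp_rate = error_rates(true_segs, inferred_segs)
print(f"FN rate = {fn_rate:.3f}, FP rate = {fp_rate:.3f}")
```

    In a benchmark, rates like these would typically be computed separately for different bins of true segment length, since the reviews emphasize that errors concentrate in short segments.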

    Strengths:

    Although I am not an expert on Plasmodium falciparum genetics, I believe the authors have developed a valuable benchmarking framework tailored to the unique genomic characteristics of this species. Their framework enables a thorough evaluation of various IBD detection tools on non-human data with features such as high recombination rates and low marker density, addressing a key gap in the field.
    This study provides a comparison of multiple IBD detection methods, including probabilistic approaches (hmmIBD, isoRelate) and IBS-based methods (hap-IBD, Refined IBD, phased IBD). This comprehensive analysis offers researchers valuable guidance on the strengths and limitations of each tool, allowing them to make informed choices based on specific use cases. I think this is important beyond the study of Pf.
    The authors highlight how optimized IBD detection can help identify signals of positive selection, infer effective population size (Ne), and uncover population structure.
    They demonstrate the critical importance of tailoring analytical tools to suit the unique characteristics of a species. Moreover, the authors provide practical recommendations, such as employing hmmIBD for quality-sensitive analyses and fine-tuning parameters for tools originally designed for non-P. falciparum datasets before applying them to malaria research.

    Overall, this study represents a meaningful contribution to both computational biology and malaria genomics, with its findings and recommendations likely to have an impact on the field.

    Weaknesses:

    One weakness of the study is the lack of emphasis on the broader importance of studying Plasmodium falciparum as a critical malaria-causing organism. Malaria remains a significant global health challenge, causing hundreds of thousands of deaths annually. The authors could have introduced the topic better, even though I understand this is a methodological paper. While the study provides a thorough technical evaluation of IBD detection methods and their application to Pf, it does not adequately connect these findings to the broader implications for malaria research and control efforts. Additionally, a discussion of malaria and its global impact could have framed the study in a more accessible and compelling way, making the importance of these technical advances clearer to a broader audience, including researchers and policymakers in the fight against malaria.

  4. Author response:

    Provisional response to Reviewer #1's comments:

    We thank the reviewer for the comments, which highlight both strengths and weaknesses.

    We acknowledge that the optimized parameter values are somewhat specific to Plasmodium, as demography and mutation/recombination rates can vary across species. However, we would like to emphasize that our simulation and benchmarking framework, along with associated tools like the efficient ibdutils, should be broadly applicable to many species, such as Apicomplexan parasites and other high-recombining eukaryotes, especially when their demographic and evolutionary parameters can be provided or estimated. We will update relevant paragraphs in the discussion to highlight this point.

    Results related to Refined IBD may not seem unexpected, but our work demonstrates that its direct application to malaria parasites without species-specific optimization can be suboptimal; such applications have previously occurred in malaria research without formal evaluation of their validity. We believe it is crucial for the research community focusing on non-standard model organisms to validate assumptions made in methods developed for standard models, such as humans, before they are applied to new species.

    Although standard deviations (SDs) are not provided for many analyses, we argue that simulating 14 chromosomes independently serves as repeats (data were shown as means over chromosomes), particularly when assessing the accuracy of IBD segments or scanning for selection signals. For analyses that aggregate information across chromosomes, we are planning to conduct additional repeated simulations or analyses to quantify the uncertainty of estimates. In the upcoming revised version, we will provide SDs where appropriate, and explain where repeated simulations are unnecessary because a large number of data points already capture the empirical distributions well.
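
    As a minimal sketch of how per-chromosome results could be summarized with a mean and SD when the 14 independently simulated chromosomes are treated as replicates, consider the snippet below. The per-chromosome values are hypothetical placeholders, not results from the study, and the function name is an assumption made for illustration.

```python
import math
import statistics
from typing import List, Tuple


def summarize_replicates(values: List[float]) -> Tuple[float, float, float]:
    """Return (mean, sample SD, standard error of the mean)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 in the denominator)
    sem = sd / math.sqrt(len(values))
    return mean, sd, sem


# Hypothetical false negative rates, one per simulated chromosome (14 total).
per_chromosome_fn = [
    0.31, 0.29, 0.33, 0.27, 0.30, 0.32, 0.28,
    0.30, 0.31, 0.29, 0.34, 0.26, 0.30, 0.30,
]

mean_fn, sd_fn, sem_fn = summarize_replicates(per_chromosome_fn)
print(f"mean = {mean_fn:.3f}, SD = {sd_fn:.3f}, SEM = {sem_fn:.3f}")
```

    This only applies to metrics computed per chromosome. For quantities that aggregate across chromosomes, such as a genome-wide Ne estimate, there is a single value per simulation, so repeated simulations would be needed to estimate uncertainty, as the response above notes.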

    Provisional response to Reviewer #2's comments:

    Thank you to the reviewer for the suggestions. We agree with the comments, and addressing the mentioned weakness will improve the manuscript's clarity and impact. We plan to enhance the introduction by highlighting the significance of studying malaria and specifically focusing on P. falciparum in this work. We will also update the discussion to reinforce the connection between our findings and malaria research and control and further emphasize the broader implications for the field.