Evaluation of in silico predictors on short nucleotide variants in HBA1, HBA2, and HBB associated with haemoglobinopathies

Curation statements for this article:
  • Curated by eLife

Abstract

Haemoglobinopathies are the commonest monogenic diseases worldwide and are caused by variants in the globin gene clusters. With over 2400 variants detected to date, their interpretation using the American College of Medical Genetics and Genomics (ACMG)/Association for Molecular Pathology (AMP) guidelines is challenging, and computational evidence can provide valuable input about their functional annotation. While many in silico predictors have already been developed, their performance varies for different genes and diseases. In this study, we evaluate 31 in silico predictors using a dataset of 1627 variants in HBA1, HBA2, and HBB. By varying the decision threshold for each tool, we analyse their performance (a) as binary classifiers of pathogenicity and (b) by using different non-overlapping pathogenic and benign thresholds for their optimal use in the ACMG/AMP framework. Our results show that CADD, Eigen-PC, and REVEL are the overall top performers, with the former reaching the moderate strength level for pathogenic prediction. Eigen-PC and REVEL achieve the highest accuracies for missense variants, while CADD is also a reliable predictor of non-missense variants. Moreover, SpliceAI is the top-performing splicing predictor, reaching the strong level of evidence, while GERP++ and phyloP are the most accurate conservation tools. This study provides evidence about the optimal use of computational tools in globin gene clusters under the ACMG/AMP framework.
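
For readers unfamiliar with the first of these two evaluation modes, the sketch below illustrates how a single tool's continuous scores could be assessed as a binary pathogenicity classifier across candidate decision thresholds, keeping the threshold that maximises the MCC. The toy data and variable names are illustrative placeholders, not the study's actual pipeline.

```python
# Minimal sketch: one predictor's continuous scores evaluated as a binary
# pathogenicity classifier across candidate decision thresholds.
# The toy data and names are illustrative, not the study's actual pipeline.
import numpy as np

def confusion_counts(scores, labels, threshold):
    """Confusion-matrix counts at a threshold; labels: 1 = pathogenic, 0 = benign."""
    pred = scores >= threshold                     # call "pathogenic" above the threshold
    tp = int(np.sum(pred & (labels == 1)))
    fp = int(np.sum(pred & (labels == 0)))
    fn = int(np.sum(~pred & (labels == 1)))
    tn = int(np.sum(~pred & (labels == 0)))
    return tp, fp, fn, tn

def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient; defined as 0 when a marginal is empty."""
    denom = ((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)) ** 0.5
    return ((tp * tn) - (fp * fn)) / denom if denom else 0.0

# Toy stand-ins for one tool's scores and the expert-curated labels.
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=200)
scores = labels * 0.4 + rng.random(200) * 0.6      # weakly informative scores

# Scan thresholds and keep the MCC-maximising one (the binary-classifier analysis).
best = float(max(np.arange(0.05, 1.0, 0.05),
                 key=lambda t: mcc(*confusion_counts(scores, labels, t))))
tp, fp, fn, tn = confusion_counts(scores, labels, best)
print(f"threshold={best:.2f}  sensitivity={tp / (tp + fn):.2f}  "
      f"specificity={tn / (tn + fp):.2f}  MCC={mcc(tp, fp, fn, tn):.2f}")
```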

Article activity feed

  1. Evaluation Summary:

    The increased use of gene and exome sequencing of individuals for diagnostic purposes has led to the identification of numerous single nucleotide variants (SNVs). However, annotating the probable clinical significance of every newly identified variant relies on multiple criteria, and in silico predictions can be used by curation experts to classify variants in databases. Since the reliability of such predictions is of paramount importance, this study compares the performance of 31 computational tools in classifying the pathogenicity of SNVs in the human adult globin genes and proposes an improved approach to achieve balanced predictions. The paper will be of interest to scientists and clinicians in the field of hemoglobinopathies.

    (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #2 agreed to share their name with the authors.)

  2. Reviewer #1 (Public Review):

    Selecting appropriate bioinformatics approaches to arrive at a consensus classification of SNVs can be labor intensive and misleading due to discordance in results from different programs. The authors evaluated 31 bioinformatic or computational tools used for in silico prediction of single nucleotide variants (SNVs). They selected a filtered list of SNVs in the HBA1, HBA2, and HBB genes, and compared in silico prediction results with annotations based on evidence in literature and databases curated by an expert panel comprising co-authors of this study. They found both specificity and concordance among different tools lacking in certain aspects when thresholds are chosen to maximize the Matthews correlation coefficient (MCC) and thus proposed an improved strategy. For this, the authors focused on the top prediction algorithms and varied their decision thresholds separately for pathogenic and benign variant classification and optimized the predictive power of these tools by choosing thresholds that generated at least supporting-strength likelihood ratios (LRs) to achieve balanced classification.
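
    As a point of reference for the likelihood-ratio criterion mentioned above, the sketch below shows how a candidate threshold's sensitivity and specificity might be converted into LRs and checked against evidence strength levels. It assumes the standard definitions of LR+ and 1/LR- and the commonly cited odds cut-offs from the Bayesian adaptation of the ACMG/AMP framework (Tavtigian et al., 2018); whether the study used exactly these cut-off values is an assumption here.

    ```python
    # Sketch: converting a candidate threshold's sensitivity/specificity into
    # likelihood ratios and checking them against ACMG/AMP strength levels.
    # The odds cut-offs are the commonly cited Tavtigian et al. (2018) values;
    # whether the study used exactly these numbers is an assumption.
    STRENGTH_ODDS = {"supporting": 2.08, "moderate": 4.33,
                     "strong": 18.7, "very strong": 350.0}

    def lr_pathogenic(sensitivity, specificity):
        """LR+ : how much a 'pathogenic' call raises the odds of pathogenicity."""
        return sensitivity / (1.0 - specificity) if specificity < 1.0 else float("inf")

    def lr_benign(sensitivity, specificity):
        """1/LR- : how much a 'benign' call raises the odds of being benign."""
        return specificity / (1.0 - sensitivity) if sensitivity < 1.0 else float("inf")

    def strength(lr):
        """Highest strength level whose odds cut-off the likelihood ratio reaches."""
        reached = [name for name, odds in STRENGTH_ODDS.items() if lr >= odds]
        return reached[-1] if reached else "below supporting"

    # Example: a pathogenic threshold giving 90% sensitivity / 70% specificity,
    # and a separate benign threshold giving 60% sensitivity / 98% specificity.
    print(strength(lr_pathogenic(0.90, 0.70)))   # LR+  = 3.00 -> "supporting"
    print(strength(lr_benign(0.60, 0.98)))       # 1/LR- = 2.45 -> "supporting"
    ```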

    The authors have likely spent significant effort annotating the list of pathogenic or benign SNVs in adult globin genes by iteratively evaluating independent annotations submitted by experts and arriving at a consensus. These annotations, when added to the database of SNVs, might improve the breadth of knowledge on the pathogenicity of adult globin SNVs and likely lead to an improved prediction by the existing tools. Further, setting non-overlapping thresholds for pathogenic and benign variants seems to improve the balance in the prediction of some of the tools (with certain tradeoffs) in the context of the gene and the variant class. This is consistent with the findings of Wilcox et al., while at the same time the authors have focused on globin variants and compared many more programs. Thus, while not a novel approach, the scale is expansive, and might guide future studies with the improved ACMG/AMP framework.

    However, there are certain caveats from my perspective and these need to be explained or improved.

    • The authors' approach relies heavily on the revised consensus annotations which, from my understanding, is essentially being considered as a "truth dataset", whereas variants are classified in silico according to existing annotations in the databases. The binary classification metrics compare the in silico predictions to the authors' annotations and these showed low specificity but higher sensitivity and accuracy indicating that many benign variants were misclassified as pathogenic. The authors have not clearly mentioned whether the "observed_pathogenecity" information in the input dataset in supplementary file 2 is from the Ithagenes database or the authors' reannotations. Hence, if a significant number of pathogenic variants were reannotated as benign by the expert panel, that will likely result in the tools misclassifying them as pathogenic since the tools rely on database annotations.

    • The results and measure of success focus on different benchmarks for the two major analyses the authors performed. While they generated a lot of data, they have not attempted to explore and present all facets of the data for each analysis. For instance, to assess the predictive power of the 31 tools initially, the authors focus on benchmark metrics for binary classification such as accuracy, sensitivity, specificity, and MCC. However, in the later improved approach, the focus is on LRs, but the effect of separate thresholds for pathogenic and benign classification on accuracy, sensitivity, specificity, and MCC is not explored in the results; instead, only the PPV is mentioned for certain variant types, tools, and genes.

    • There is a general trade-off when altering thresholds to increase specificity, which leads to reduced accuracy and sensitivity. Thus, in this case, the improved approach suggested by the authors increases specificity, but there is a simultaneous reduction in accuracy and sensitivity, potentially leading to higher misclassification of pathogenic variants as benign. One then has to consider whether this is ideal in the case of globins, where an in silico misclassification of pathogenicity can be easily verified by subsequent diagnostic testing to confirm whether the variant actually affects hemoglobin. Overclassification of pathogenicity in the case of globins is thus not necessarily a major problem, since it will not directly lead to patients receiving treatment before additional confirmatory tests. However, misclassification of pathogenic variants as benign will pose greater harm to individuals at risk of disease.

    • This is a largely descriptive study of the performance of various programs, but the authors did not attempt to explain why, according to them, the various tools performed a certain way in their analysis. Thus, their rationale for proposing the improved approach of separate thresholds for pathogenic and benign variants was unclear. Attempting to understand whether there is a correlation between the type of data a tool uses and its performance could explain the tools' prediction power and how to improve it. For instance, some of the tools are metapredictors that take as input scores from various other tools also tested in this study. Thus, there will be some redundancy in the final classification.

    • Expanding on the previous point, the reason for discordance in HBA genes but concordance in HBB was unclear. It might be a result of the bigger HBB dataset compared to HBA although the authors did not explore or mention whether the size of the dataset correlates with concordance. They also did not test for concordance or discordance after the separate thresholds were applied so it is not clear whether their proposed approach improves concordance for the HBA variant predictions of the top tools.

  3. Reviewer #2 (Public Review):

    The presented manuscript, "Evaluation of in silico predictors on short nucleotide variants in HBA1, HBA2 and HBB associated with haemoglobinopathies", is appropriate and appears to be well executed. The presentation is very clear and easy to comprehend. This paper is of interest to scientists within the field of haemoglobinopathies. This study helps us with the optimal use of in silico predictors in globin gene clusters.

  4. Author Response

    Reviewer 1

    Comment 1

    Selecting appropriate bioinformatics approaches to arrive at a consensus classification of SNVs can be labor intensive and misleading due to discordance in results from different programs. The authors evaluated 31 bioinformatic or computational tools used for in silico prediction of single nucleotide variants (SNVs). They selected a filtered list of SNVs in the HBA1, HBA2, and HBB genes, and compared in silico prediction results with annotations based on evidence in literature and databases curated by an expert panel comprising co-authors of this study. They found both specificity and concordance among different tools lacking in certain aspects when thresholds are chosen to maximize the Matthews correlation coefficient (MCC) and thus proposed an improved strategy. For this, the authors focused on the top prediction algorithms and varied their decision thresholds separately for pathogenic and benign variant classification and optimized the predictive power of these tools by choosing thresholds that generated at least supporting-strength likelihood ratios (LRs) to achieve balanced classification.

    The authors have likely spent significant effort annotating the list of pathogenic or benign SNVs in adult globin genes by iteratively evaluating independent annotations submitted by experts and arriving at a consensus. These annotations, when added to the database of SNVs, might improve the breadth of knowledge on the pathogenicity of adult globin SNVs and likely lead to an improved prediction by the existing tools. Further, setting non-overlapping thresholds for pathogenic and benign variants seems to improve the balance in the prediction of some of the tools (with certain tradeoffs) in the context of the gene and the variant class. This is consistent with the findings of Wilcox et al., while at the same time the authors have focused on globin variants and compared many more programs. Thus, while not a novel approach, the scale is expansive, and might guide future studies with the improved ACMG/AMP framework.

    Response: We would like to thank the reviewer for the positive comments and for appreciating the potential impact of the manuscript.

    Comment 2

    However, there are certain caveats from my perspective and these need to be explained or improved.

    • The authors' approach relies heavily on the revised consensus annotations which, from my understanding, is essentially being considered as a "truth dataset", whereas variants are classified in silico according to existing annotations in the databases. The binary classification metrics compare the in silico predictions to the authors' annotations and these showed low specificity but higher sensitivity and accuracy indicating that many benign variants were misclassified as pathogenic. The authors have not clearly mentioned whether the "observed_pathogenecity" information in the input dataset in supplementary file 2 is from the Ithagenes database or the authors' reannotations. Hence, if a significant number of pathogenic variants were reannotated as benign by the expert panel, that will likely result in the tools misclassifying them as pathogenic since the tools rely on database annotations.

    Response: The IthaGenes database did not provide an annotation of the clinical impact of each globin gene variant before this study. Nevertheless, as part of this study, an initial annotation was provided based on the information already recorded in the database and the pathogenicity criteria defined for this study (pg 7). This initial set was subsequently provided to the experts, who then confirmed or reclassified each variant’s pathogenicity using a Delphi approach. The final classification was used for the study and is now also included in IthaGenes. This process is now clarified in the Methodology (pg. 7-8; lines 145-147 & 155-160), while the initial and final classifications are provided in the dataset file (Supplementary File 2; sheet “Input-Full dataset”). In addition, Supplementary Figure 1 illustrates the changes in pathogenicity annotation after the expert evaluation, demonstrating that reclassification of benign variants as pathogenic, and vice versa, was minimal.

    We consider the final annotation the best available evidence regarding the pathogenicity of globin gene variants, owing to the involvement of experts responsible for the molecular diagnosis of haemoglobinopathies in five countries worldwide and the use of a Delphi approach to determine variant pathogenicity.

    Comment 3

    • The results and measure of success focus on different benchmarks for the two major analyses the authors performed. While they generated a lot of data, they have not attempted to explore and present all facets of the data for each analysis. For instance, to assess the predictive power of the 31 tools initially, the authors focus on benchmark metrics for binary classification such as accuracy, sensitivity, specificity, and MCC. However, in the later improved approach, the focus is on LRs, but the effect of separate thresholds for pathogenic and benign classification on accuracy, sensitivity, specificity, and MCC is not explored in the results; instead, only the PPV is mentioned for certain variant types, tools, and genes.

    Response: In the latter improved approach, we do not use a binary prediction, but instead we trichotomize the problem to define different thresholds for each pathogenicity class. Therefore, benchmark metrics like MCC are not informative as many of the variants are not classified (i.e., they are classified as VUS) due to the grey zone between the benign and pathogenic thresholds. The benign and pathogenic thresholds are derived from two different binary classifiers, with their corresponding binary metrics provided in Supplementary File 3. We have now included the sensitivity and specificity of the pathogenic and benign classifiers, respectively, in Table 2 and we discuss them in pg 15-16 (lines 338-343).

    Moreover, Table 2 presents the analysis on the full dataset, as well as on different subsets of the dataset based on variant type (missense/non-missense) and globin gene.
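
    To make the trichotomised scheme described in this response concrete, the sketch below classifies a predictor score against two non-overlapping thresholds, leaving the grey zone between them as VUS. The threshold values are illustrative placeholders rather than the per-tool thresholds reported in the study.

    ```python
    # Minimal sketch of the trichotomised scheme: two non-overlapping thresholds
    # per tool, with scores falling between them left unclassified (VUS).
    # The threshold values are illustrative placeholders, not the study's.
    def classify(score, benign_threshold, pathogenic_threshold):
        """Return 'benign', 'VUS', or 'pathogenic' for one predictor score."""
        if score <= benign_threshold:
            return "benign"
        if score >= pathogenic_threshold:
            return "pathogenic"
        return "VUS"    # grey zone between the two thresholds

    BENIGN_T, PATHOGENIC_T = 0.20, 0.75   # hypothetical; benign < pathogenic

    for score in (0.05, 0.40, 0.90):
        print(score, classify(score, BENIGN_T, PATHOGENIC_T))
    # 0.05 benign / 0.40 VUS / 0.90 pathogenic
    ```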

    Comment 4

    • There is a general trade-off when altering thresholds to increase specificity, which leads to reduced accuracy and sensitivity. Thus, in this case, the improved approach suggested by the authors increases specificity, but there is a simultaneous reduction in accuracy and sensitivity, potentially leading to higher misclassification of pathogenic variants as benign. One then has to consider whether this is ideal in the case of globins, where an in silico misclassification of pathogenicity can be easily verified by subsequent diagnostic testing to confirm whether the variant actually affects hemoglobin. Overclassification of pathogenicity in the case of globins is thus not necessarily a major problem, since it will not directly lead to patients receiving treatment before additional confirmatory tests. However, misclassification of pathogenic variants as benign will pose greater harm to individuals at risk of disease.

    Response: The reduction of accuracy in the improved approach is not due to the misclassification of pathogenic variants as benign, but instead due to the misclassification of benign or pathogenic variants as VUS. The reviewer is correct that the LR-based approach decreases the accuracy of the method, but at the same time it increases the confidence of a pathogenic or benign classification. This is the basis of the Bayesian ACMG/AMP framework. We further clarify this point on pg 17 (lines 384-386).
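
    For context, the reasoning behind this trade-off can be expressed with the odds form of Bayes' theorem that underlies the Bayesian ACMG/AMP framework: the posterior odds of pathogenicity equal the likelihood ratio multiplied by the prior odds. The prior probability and LR values in the sketch below are purely illustrative.

    ```python
    # Odds form of Bayes' theorem underlying the Bayesian ACMG/AMP framework:
    # posterior odds = likelihood ratio x prior odds.
    # The prior probability and LR values below are purely illustrative.
    def posterior_probability(prior_prob, likelihood_ratio):
        prior_odds = prior_prob / (1.0 - prior_prob)
        post_odds = likelihood_ratio * prior_odds
        return post_odds / (1.0 + post_odds)

    # A supporting-strength result (LR ~ 2) shifts a 10% prior only modestly,
    # whereas strong evidence (LR ~ 18.7) moves it much further.
    print(round(posterior_probability(0.10, 2.0), 3))    # ~0.182
    print(round(posterior_probability(0.10, 18.7), 3))   # ~0.675
    ```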

    Misclassification of benign or pathogenic variants as VUS will not have a significant impact on variant classification. However, we do not agree that overclassification of variants as pathogenic is not a major issue, because many countries provide prevention programmes based on prenatal screening (see IthaMaps: https://ithanet.eu/db/ithamaps?hc=3). Therefore, a benign or VUS variant misclassified as pathogenic can lead to unnecessary abortions and preimplantation genetic diagnosis tests, and can have a devastating psychological and economic impact on the lives of affected individuals.

    Comment 5

    • This is a largely descriptive study of the performance of various programs, but the authors did not attempt to explain why according to them the various tools performed a certain way in their analysis. Thus, their rationale for proposing the improved approach of separate thresholds for pathogenic and benign variants was unclear. Attempting to understand whether there is a correlation between the type of data the tool uses, and its performance could explain the tools' prediction power and how to improve it. For instance, some of the tools are metapredictors that take as input scores from various other tools also tested in this study. Thus, there will be some redundancy in the final classification.

    Response: We thank the reviewer for the suggestion. We have now addressed this point in the revised manuscript (pages 18-19, lines 397-408 and 417-426) by adding two paragraphs discussing the low concordance rate between predictors, in line with previous studies. We also discuss the main reasons for the superior performance observed for metapredictors.

    Comment 6

    • Expanding on the previous point, the reason for discordance in HBA genes but concordance in HBB was unclear. It might be a result of the bigger HBB dataset compared to HBA although the authors did not explore or mention whether the size of the dataset correlates with concordance. They also did not test for concordance or discordance after the separate thresholds were applied so it is not clear whether their proposed approach improves concordance for the HBA variant predictions of the top tools.

    Response: In the latter improved approach, we have assessed the concordance of the top-performing tools, as shown in Figures 3B and 3C, but without grouping by gene. We have now added a heatmap of concordance in Supplementary Figure 4, which still shows higher concordance for pathogenic variants in HBB (also see pg , lines 360-364).
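
    For illustration, pairwise concordance between tools, the quantity such a heatmap displays, can be computed as the fraction of variants on which two tools make the same call. The sketch below uses hypothetical call vectors; the study's exact definition of concordance may differ.

    ```python
    # Sketch of a pairwise concordance matrix between tools: the fraction of
    # variants on which two tools make the same call, as could underlie a
    # concordance heatmap. The call vectors are hypothetical toy data.
    from itertools import combinations

    calls = {
        "CADD":     ["P", "P", "B", "P", "B"],
        "REVEL":    ["P", "P", "B", "B", "B"],
        "Eigen-PC": ["P", "B", "B", "P", "B"],
    }

    def concordance(a, b):
        """Fraction of variants with identical calls from two tools."""
        return sum(x == y for x, y in zip(a, b)) / len(a)

    for t1, t2 in combinations(calls, 2):
        print(f"{t1} vs {t2}: {concordance(calls[t1], calls[t2]):.2f}")
    ```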

    The lower concordance that the tools exhibit for HBA1 and HBA2 can be explained by the fact that the pathogenicity of HBA1 and HBA2 variants can almost never be determined simply from the phenotype in the heterozygous state, as it can for HBB, owing to gene copy number (four alpha alleles versus two beta alleles).

    This exacerbates the frequent confusion between categorising a variant as pathogenic at the gene level (reduced expression) versus at the phenotypic level (see the seminal paper: MacArthur DG, et al. Guidelines for investigating causality of sequence variants in human disease. Nature. 2014 Apr 24;508(7497):469-76. doi: 10.1038/nature13127. PMID: 24759409; PMCID: PMC4180223). We have discussed this point in pg 18-19 (lines 409-416).