A structure and function-based complete mutational map of Human Hemoglobin using AI

Abstract

Hemoglobin (Hb), a well-characterized protein central to oxygen transport and molecular medicine, serves as a model for studying how sequence variations influence protein structure and function. Its precise activity depends on tightly regulated structural dynamics, which can be disrupted by mutations that give rise to structural hemoglobinopathies (including sickle cell disease, unstable hemoglobins, methemoglobins, and hemoglobins with altered oxygen affinity), each associated with distinct functional and clinical consequences.

Among genetic variants, missense mutations are the most widely studied in clinical settings. Accurately predicting their clinical impact remains challenging and requires integrating evolutionary, biochemical, and structural data. While broad deep learning models like AlphaMissense show promise, they often lack interpretability and protein-specific precision. This motivates the development of focused models that leverage detailed knowledge of individual proteins, such as hemoglobin, to improve both predictive power and mechanistic understanding.

In this work, we conducted a comprehensive analysis of all known and potential human adult hemoglobin (HbA) variants, guided by the hypothesis that a deep understanding of the sequence–structure–function relationship in Hb can yield interpretable and predictive insights into the functional and clinical consequences of single amino acid substitutions. We curated an updated dataset of HbA variants annotated with their clinical classifications (Benign, Pathogenic, or of Uncertain Significance (VUS)) and systematically mapped each to a range of features, including structural location and classification, predicted impact on folding stability, and evolutionary conservation. Using these data, we developed a pathogenicity prediction model and benchmarked it against AlphaMissense, demonstrating strong and complementary performance. Additionally, we generated a complete mutational landscape of all possible single amino acid substitutions (SAS) in HbA, providing a resource for future clinical interpretation.
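
To make the idea of a complete mutational landscape concrete, the following minimal sketch enumerates every possible single amino acid substitution for a sequence: each position can be changed to any of the 19 other standard residues. It is an illustration only, not the authors' pipeline. The function name `enumerate_substitutions`, the `start` parameter, and the short `toy_fragment` are assumptions introduced here; a real analysis would use the full alpha- and beta-globin chain sequences and attach structural, stability, and conservation features to each variant.

```python
# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def enumerate_substitutions(sequence, start=1):
    """Return every single amino acid substitution as a label like 'E6V'.

    `start` is the number assigned to the first residue (hypothetical
    parameter; numbering conventions differ between databases).
    """
    variants = []
    for offset, wild_type in enumerate(sequence):
        position = start + offset
        for mutant in AMINO_ACIDS:
            if mutant != wild_type:  # skip the synonymous "substitution"
                variants.append(f"{wild_type}{position}{mutant}")
    return variants


# Toy fragment used purely for illustration (an assumption, not the
# dataset described in the text).
toy_fragment = "VHLTPEEK"
variants = enumerate_substitutions(toy_fragment)

# Each of the 8 positions yields 19 substitutions.
print(len(variants))
```

A sequence of length N yields 19 × N substitutions, so the landscape grows linearly with chain length and stays small enough to score every variant exhaustively.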

Our findings provide insight into the molecular basis for variant effects in HbA and highlight the utility of combining structure-informed features with machine learning (ML) for variant interpretation. Moreover, our results offer a framework for evaluating the portability and interpretability of variant effect predictors across structurally dynamic systems, with implications for improving variant classification in other protein families.
