EpitopeGNN: A Graph Neural Network for Influenza A Virus Hemagglutinin Subtype Classification Based on 3D Structure
Abstract
Background: Hemagglutinin (HA) is the primary surface protein of the influenza A virus and determines its subtype and antigenic properties. Traditional subtype classification methods rely on nucleotide or amino acid sequence analysis, which does not account for the spatial folding of the protein.

Methods: In this work, we propose EpitopeGNN, a graph neural network (GNN) that constructs a residue interaction network (RIN) from the 3D structure of HA and classifies the virus subtype. The model was trained on 249 structures from the Protein Data Bank (PDB) covering H1N1, H3N2, H5N1, and other subtypes.

Results: After rigorous sequence redundancy reduction at a 92% identity threshold, the model maintained 95–100% accuracy on non-redundant data, significantly outperforming sequence-only baselines (the best baseline achieved 85% for multi-class and 92.3% for binary classification). The structural embeddings correlated significantly with phylogenetic distances (r = 0.38, p < 0.001), confirming their biological relevance and opening opportunities for structural monitoring of virus evolution and for rapid searching for analogs of novel strains.

Conclusions: We developed a new graph neural network that classifies influenza A virus subtypes directly from the 3D structure of hemagglutinin using residue interaction networks and physicochemical features. It can serve as a foundation for predicting influenza virus receptor specificity and epitope immunogenicity.
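To make the idea of a residue interaction network concrete, here is a minimal sketch of how such a graph could be built from 3D coordinates. It is not the authors' pipeline: the abstract does not say how edges are defined or which physicochemical features are used. The sketch assumes that two residues are connected when their C-alpha atoms lie within 8 angstroms, uses Kyte-Doolittle hydrophobicity as an example node feature, and runs on made-up toy coordinates. The names `build_rin` and `residues` are hypothetical.

```python
import math

# Hypothetical toy input: (residue name, C-alpha x, y, z) in angstroms.
# The coordinates are invented for illustration, not taken from a PDB entry.
residues = [
    ("GLY", 0.0, 0.0, 0.0),
    ("LYS", 3.8, 0.0, 0.0),
    ("ASP", 7.6, 0.0, 0.0),
    ("PHE", 3.8, 5.0, 0.0),
    ("TRP", 30.0, 30.0, 30.0),
]

# Kyte-Doolittle hydrophobicity for the residues used above. This is an
# assumed example feature; the abstract does not list the actual features.
HYDROPHOBICITY = {"GLY": -0.4, "LYS": -3.9, "ASP": -3.5, "PHE": 2.8, "TRP": -0.9}


def build_rin(residues, cutoff=8.0):
    """Return (adjacency, features) for a distance-based contact graph.

    adjacency maps each residue index to a list of neighbor indices.
    Two residues are linked when their C-alpha atoms are within `cutoff`
    angstroms; the 8 angstrom default is an assumed value.
    """
    coords = [r[1:] for r in residues]
    adjacency = {i: [] for i in range(len(residues))}
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if math.dist(coords[i], coords[j]) <= cutoff:
                adjacency[i].append(j)
                adjacency[j].append(i)
    features = [HYDROPHOBICITY[r[0]] for r in residues]
    return adjacency, features


rin, node_features = build_rin(residues)
edge_count = sum(len(neighbors) for neighbors in rin.values()) // 2
print(edge_count, rin)
```

In a real setting the coordinates would come from a parsed PDB file, and the adjacency and feature lists would be converted to the tensor format expected by a GNN library. The distant fifth residue shows that spatially isolated residues simply end up with no edges.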