SEHI-PPI: An End-to-End Sampling-Enhanced Human-Influenza Protein-Protein Interaction Prediction Framework with Double-View Learning
Abstract
Influenza poses a persistent global health challenge, necessitating innovative approaches to predict host-virus interactions and inform antiviral strategies. Despite advancements, machine-learning-based computational methods typically struggle with limited high-quality negative samples and inadequate modeling of complex host-virus interactions, both of which hinder predictive accuracy and generalization. To address these challenges, we present SEHI-PPI, a novel end-to-end framework for predicting human-influenza protein-protein interactions (PPIs). SEHI-PPI combines a double-view deep learning approach, which extracts global and local sequence features, with a novel adaptive negative sampling strategy that generates high-quality negative samples. SEHI-PPI outperforms various benchmark methods, including state-of-the-art large language models, achieving a sensitivity of 0.986 and an AUROC of 0.987. In a test where both the human and the influenza protein families are unseen in the training data, our model reaches an AUROC of 0.837. We further validate its generalizability by applying it to other human-virus PPI prediction tasks, where it achieves on average 0.929 in sensitivity and 0.928 in AUROC. Combined with structural predictions from AlphaFold3, our case studies show, based on clustering results, that viral proteins predicted to bind the same human protein have similar structures and functions. These findings demonstrate the reliability of the SEHI-PPI framework in uncovering biologically meaningful host-virus interactions and potential therapeutic targets.