Influ-BERT: An Interpretable Model for Enhancing Low-Frequency Influenza A virus Subtype Recognition

Abstract

Influenza A virus (IAV) poses a continuous threat to global public health due to its wide host adaptability, high-frequency antigenic variation, and potential for cross-species transmission. Accurate recognition of IAV subtypes is crucial for early pandemic warning. Here, we propose Influ-BERT, a domain-adaptive pretraining model based on the transformer architecture. Building on DNABERT-2, we constructed a dedicated corpus of approximately 900,000 influenza genome sequences, developed a custom Byte Pair Encoding (BPE) tokenizer, and employed a two-stage training strategy involving domain-adaptive pretraining followed by task-specific fine-tuning. This approach significantly enhanced recognition performance for low-frequency subtypes. Experimental results demonstrate that Influ-BERT outperforms traditional machine learning methods and general genomic language models (DNABERT-2, MegaDNA) in subtype recognition, achieving a substantial improvement in F1-score, particularly for subtypes H5N8, H5N1, H7N9, and H9N2. Furthermore, sliding window perturbation analysis revealed the model’s specific focus on key regions of the IAV genome, providing interpretable evidence supporting the observed performance gains.
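
The sliding window perturbation analysis mentioned above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the window and stride values, the choice of `N` as the masking character, and the toy scoring function are all assumptions made for illustration. In practice, `score_fn` would wrap a fine-tuned classifier and return its confidence for the true subtype.

```python
from typing import Callable, List, Tuple


def sliding_window_importance(
    sequence: str,
    score_fn: Callable[[str], float],
    window: int = 4,
    stride: int = 2,
    mask_char: str = "N",
) -> List[Tuple[int, int, float]]:
    """Mask each window in turn and record how much the score drops.

    Returns a list of (start, end, score_drop) tuples. A large drop
    suggests the masked region matters for the prediction.
    """
    baseline = score_fn(sequence)
    results = []
    last_start = max(len(sequence) - window, 0)
    for start in range(0, last_start + 1, stride):
        end = min(start + window, len(sequence))
        # Replace the window with the masking character (assumed: "N").
        perturbed = sequence[:start] + mask_char * (end - start) + sequence[end:]
        results.append((start, end, baseline - score_fn(perturbed)))
    return results


def toy_score(seq: str) -> float:
    """Hypothetical stand-in for a model: 1.0 if a fixed motif is intact."""
    return 1.0 if "GATTACA" in seq else 0.0


if __name__ == "__main__":
    seq = "CCCCGATTACACCCC"
    for start, end, drop in sliding_window_importance(seq, toy_score):
        print(f"[{start}, {end}) score drop = {drop}")
```

With the toy scorer, only windows that overlap the motif produce a non-zero drop, which is the kind of signal such an analysis uses to localize the regions a model relies on.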

Availability

Source code is written in PyTorch and available at https://github.com/oooo111/Influenza-BERT and https://huggingface.co/rongye1/Influenza_BERT under the MIT license.

Contact

songshh@big.ac.cn (Song S).

Biographical note

Rongye Ye is currently a master’s student at the Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation.

Lun Li is an assistant professor at the Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation.

Shuhui Song is a professor at the Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation.
