BERT-T6: Towards High-accuracy T6SS Bacterial Toxin Identification Using Protein Language Model

Xianwei Mo
Jianxiu Cai
Shirley W. I. Siu

Abstract

Type VI secretion system effectors (T6SEs) target the cell wall, membranes, and nucleic acids, killing bacteria or impairing host cell defense mechanisms. Accurate identification of T6SEs helps in understanding both the virulence of bacteria that use type VI secretion systems and bacterial pathogenesis more broadly. Although several traditional machine learning and deep learning tools have been developed to distinguish T6SEs from non-T6SEs, we believe there is still room for improvement. To obtain robust features for model construction, we successively investigated various classic sequence-based features and embeddings from pre-trained transformer-based protein language models. Building on the model that incorporated ProtBert embeddings, we used transfer learning to fine-tune the ProtBert protein language model on a downstream T6SE classification task. The resulting BERT-T6 model performs significantly better than the baseline models. More importantly, with an accuracy of 0.959, a sensitivity of 0.909, a specificity of 0.973, a precision of 0.905, an F1-score of 0.907, and an MCC of 0.881, our model is competitive with state-of-the-art binary and multi-class predictors. This work highlights the effectiveness of BERT with transfer learning for T6SE prediction. BERT-T6 provides a robust and precise approach for identifying T6SEs and holds promise for advancing studies of bacterial virulence mechanisms.
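For readers less familiar with the six metrics quoted in the abstract, the following is a minimal sketch of how they are conventionally computed for a binary classifier from confusion-matrix counts. The function names `binary_metrics` and `f1_from_precision_recall` are illustrative and not taken from the paper, and the counts in the usage lines are made-up placeholders, not the BERT-T6 test set. The only figures from the abstract that appear here are the reported precision (0.905) and sensitivity (0.909), whose harmonic mean does reproduce the reported F1-score of 0.907.

```python
from math import sqrt


def _safe_div(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def f1_from_precision_recall(precision, recall):
    """F1-score as the harmonic mean of precision and recall (sensitivity)."""
    return _safe_div(2 * precision * recall, precision + recall)


def binary_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts.

    tp/fn: positives (e.g. T6SEs) predicted correctly / missed.
    tn/fp: negatives (e.g. non-T6SEs) predicted correctly / wrongly flagged.
    """
    sensitivity = _safe_div(tp, tp + fn)   # recall, true positive rate
    specificity = _safe_div(tn, tn + fp)   # true negative rate
    precision = _safe_div(tp, tp + fp)     # positive predictive value
    accuracy = _safe_div(tp + tn, tp + fp + tn + fn)
    # Matthews correlation coefficient: balanced even with skewed classes.
    mcc = _safe_div(
        tp * tn - fp * fn,
        sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    )
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1_from_precision_recall(precision, sensitivity),
        "mcc": mcc,
    }


# Hypothetical counts for illustration only (NOT data from the paper).
example = binary_metrics(tp=90, fp=10, tn=190, fn=10)
print(example)

# Consistency check using the precision and sensitivity reported in the abstract.
print(round(f1_from_precision_recall(0.905, 0.909), 3))
```

MCC is worth reporting alongside accuracy because effector datasets typically contain far more non-effectors than effectors, and accuracy alone can look high when a model mostly predicts the majority class.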

Version published to 10.1101/2025.10.17.683028 on bioRxiv
Oct 17, 2025

Related articles

Unlocking the genomic landscape for antimicrobial domain discovery with a two-stage progressive residue-level annotation model

This article has 13 authors:
1. Peilin Xie
2. Xingchen Liu
3. Lantian Yao
4. Zhihao Zhao
5. Anming Yang
6. Jiahui Guan
7. Zijun Jiao
8. Zhihong Liu
9. Junwen Wang
10. Tzong-Yi Lee
11. Zigang Li
12. Bingyu Cui
13. Ying-Chih Chiang
This article has no evaluations. Latest version: Dec 11, 2025

Decrypting viral dark matter through key proteins using an NLP-enhanced framework

This article has 10 authors:
1. Zhihua Du
2. Min Li
3. Kaihuang Lin
4. Bo Xing
5. Yuehua Ou
6. Wenchen Song
7. Jie Chen
8. Junhua Li
9. Jianqiang Li
10. Minfeng Xiao
This article has no evaluations. Latest version: Jan 13, 2026

Protein Language Models Rescue Variant Pathogenicity Prediction in Intrinsically Disordered Regions Through Synergistic Integration with Structure-Based Methods

This article has 1 author:
1. Hayden Farquhar
This article has no evaluations. Latest version: Feb 4, 2026
