MoCETSE: A mixture-of-convolutional experts and transformer-based model for predicting Gram-negative bacterial secreted effectors

Hua Shi
Yihang Lin
Dachen Liu
Quan Zou

Abstract

Identifying effector proteins of Gram-negative bacterial secretion systems is crucial for understanding bacterial pathogenic mechanisms and guiding antimicrobial strategies. However, existing methods often learn directly from the raw outputs of protein language models. This can make it difficult to recognize complex sequence features and long-range dependencies, which limits prediction performance. In this study, we propose MoCETSE, a deep learning model for predicting Gram-negative bacterial secreted effector proteins. MoCETSE first uses the pre-trained protein language model ESM-1b to transform raw amino acid sequences into context-aware vector representations. A target preprocessing network based on a mixture of convolutional experts then processes these representations. Multiple sets of convolutional kernel “experts” operate in parallel and separately learn local motifs and short-range dependencies as well as broader contextual information, which together yield more expressive sequence representations. In the subsequent transformer module, MoCETSE incorporates relative positional encoding to explicitly model the distances between residues. This lets the attention mechanism recognize the sequential relationships and long-range functional dependencies among amino acids, so that secreted effectors are predicted with high accuracy. MoCETSE performed strongly in both 5-fold cross-validation and independent testing, and benchmark tests show that it outperforms existing state-of-the-art binary and multi-class classifiers.
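The abstract does not specify how the convolutional experts are parameterized or combined. The following pure-Python sketch is only a rough illustration of the general technique and is not the authors' implementation. It makes several assumptions: three experts with kernel sizes 3, 7 and 15 (hypothetical values), dense softmax gating computed from the mean-pooled sequence, and a projection from the input embedding dimension down to a smaller one, which mirrors the dimensionality reduction described in the Author Summary. The names `ConvExpertMixture` and `conv1d_same` are illustrative. ESM-1b produces 1280-dimensional per-residue embeddings, and random 16-dimensional vectors stand in for them here.

```python
import math
import random


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def conv1d_same(x, weight, bias):
    """1D convolution along the sequence axis with zero 'same' padding.

    x:      L x D_in   (one row per residue embedding)
    weight: K x D_in x D_out, with K odd so the output keeps length L
    bias:   D_out
    """
    length, d_in, d_out = len(x), len(x[0]), len(bias)
    pad = len(weight) // 2
    out = []
    for t in range(length):
        acc = [float(b) for b in bias]
        for k, w_k in enumerate(weight):
            src = t + k - pad
            if not 0 <= src < length:
                continue  # positions outside the sequence count as zeros
            for i in range(d_in):
                xi = x[src][i]
                for o in range(d_out):
                    acc[o] += xi * w_k[i][o]
        out.append(acc)
    return out


class ConvExpertMixture:
    """Hypothetical mixture of convolutional experts with dense soft gating.

    Each expert is a 1D convolution with its own kernel size. Small kernels
    see local motifs, and large kernels see wider context. A linear gate maps
    the mean-pooled input to one softmax weight per expert (an assumption).
    """

    def __init__(self, d_in, d_out, kernel_sizes=(3, 7, 15), seed=0):
        rng = random.Random(seed)
        self.d_out = d_out
        self.experts = []
        for k in kernel_sizes:
            std = 1.0 / math.sqrt(k * d_in)
            weight = [[[rng.gauss(0.0, std) for _ in range(d_out)]
                       for _ in range(d_in)] for _ in range(k)]
            self.experts.append((weight, [0.0] * d_out))
        self.gate = [[rng.gauss(0.0, 1.0 / math.sqrt(d_in))
                      for _ in kernel_sizes] for _ in range(d_in)]

    def __call__(self, x):
        length, d_in = len(x), len(x[0])
        pooled = [sum(row[i] for row in x) / length for i in range(d_in)]
        logits = [sum(pooled[i] * self.gate[i][e] for i in range(d_in))
                  for e in range(len(self.experts))]
        gates = softmax(logits)
        out = [[0.0] * self.d_out for _ in range(length)]
        # Weighted sum of every expert's output (no top-k sparsity here).
        for g, (weight, bias) in zip(gates, self.experts):
            y = conv1d_same(x, weight, bias)
            for t in range(length):
                for o in range(self.d_out):
                    out[t][o] += g * y[t][o]
        return out, gates


# Usage: 50 residues with 16-dim stand-in embeddings, reduced to 8 dims.
rng = random.Random(1)
x = [[rng.gauss(0.0, 1.0) for _ in range(16)] for _ in range(50)]
moe = ConvExpertMixture(d_in=16, d_out=8)
y, gates = moe(x)
print(len(y), len(y[0]), round(sum(gates), 6))  # 50 8 1.0
```

Because of the "same" padding, every expert keeps one output row per residue, so expert outputs can be summed position by position before they are passed to the transformer. A sparse top-k gate or per-residue gating would be equally plausible alternatives.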

Author Summary

Gram-negative bacteria inject effector proteins into host cells via secretion systems, disrupting normal cellular functions and causing disease. Accurately identifying these virulence proteins is key to understanding bacterial pathogenic mechanisms and developing therapies. However, existing methods suffer from feature redundancy, inadequate capture of long-range dependencies, and low computational efficiency. We developed MoCETSE, a computational method for end-to-end prediction of effector proteins from raw sequences. Position-specific scoring matrix (PSSM) encoding is computationally expensive, so we instead use pre-trained protein language models to extract structural, evolutionary, and functional features from sequences. These features provide biologically meaningful inputs for the downstream deep learning model. Our mixture-of-convolutional-experts network reduces the dimensionality of the high-dimensional embeddings and extracts multi-scale features. This reduces feature redundancy and information loss and improves both model performance and efficiency. When the model learns secretion signal features, relative positional encoding captures the order of amino acids and the critical long-range dependencies among them, which enhances the biological interpretability of predictions. MoCETSE outperforms existing tools such as DeepSecE in cross-category predictions. It offers a high-throughput method for effector protein prediction and provides clues for studying bacterial infections and developing therapies.
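The exact form of relative positional encoding used in MoCETSE's transformer module is not given here. The sketch below shows one common variant and is not the authors' code. It assumes a single attention head and a learned scalar bias for each relative offset j − i, clipped to a window of ±`max_rel`, which is added to the scaled dot-product logits. This is similar in spirit to the relative position biases of Shaw et al. and T5. The class `RelPosSelfAttention`, the helper functions, and all sizes are hypothetical.

```python
import math
import random


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def project(x, w):
    """Multiply each row of x (L x D_in) by the matrix w (D_in x D_out)."""
    return [[sum(row[i] * w[i][o] for i in range(len(row)))
             for o in range(len(w[0]))] for row in x]


def rand_matrix(rng, rows, cols, std):
    return [[rng.gauss(0.0, std) for _ in range(cols)] for _ in range(rows)]


class RelPosSelfAttention:
    """Hypothetical single-head self-attention with a learned scalar bias
    per clipped relative offset (j - i), added to the attention logits."""

    def __init__(self, d_model, max_rel=8, seed=0):
        rng = random.Random(seed)
        std = 1.0 / math.sqrt(d_model)
        self.d_model = d_model
        self.max_rel = max_rel
        self.wq = rand_matrix(rng, d_model, d_model, std)
        self.wk = rand_matrix(rng, d_model, d_model, std)
        self.wv = rand_matrix(rng, d_model, d_model, std)
        # One scalar per offset in [-max_rel, max_rel]; offsets beyond the
        # window share the edge values.
        self.rel_bias = [rng.gauss(0.0, 0.1) for _ in range(2 * max_rel + 1)]

    def bias(self, i, j):
        offset = max(-self.max_rel, min(self.max_rel, j - i))
        return self.rel_bias[offset + self.max_rel]

    def __call__(self, x):
        q, k, v = (project(x, w) for w in (self.wq, self.wk, self.wv))
        length, scale = len(x), math.sqrt(self.d_model)
        outputs, attention = [], []
        for i in range(length):
            # Content term (q_i . k_j / sqrt(d)) plus position term b(j - i).
            logits = [sum(a * b for a, b in zip(q[i], k[j])) / scale
                      + self.bias(i, j) for j in range(length)]
            weights = softmax(logits)
            attention.append(weights)
            outputs.append([sum(weights[j] * v[j][c] for j in range(length))
                            for c in range(self.d_model)])
        return outputs, attention


# Usage: 20 residues with 8-dim features.
rng = random.Random(2)
x = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(20)]
attn = RelPosSelfAttention(d_model=8, max_rel=4)
out, weights = attn(x)
print(attn.bias(2, 5) == attn.bias(10, 13))  # True: depends only on j - i
```

The position term depends only on the distance between two residues and not on their absolute indices. As a result, the same learned preference (for example, "attend three residues downstream") applies anywhere in a sequence of any length.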

Version published on bioRxiv (doi: 10.1101/2025.08.06.668857)
Aug 8, 2025
