MoCETSE: A mixture-of-convolutional experts and transformer-based model for predicting Gram-negative bacterial secreted effectors

Abstract

Identifying effector proteins of Gram-negative bacterial secretion systems is crucial for understanding pathogenic mechanisms and guiding antimicrobial strategies. However, existing methods often learn directly from the outputs of protein language models, which can make it difficult to recognize complex sequence features and long-range dependencies and thereby limits prediction performance. In this study, we propose MoCETSE, a deep learning model for predicting Gram-negative bacterial effector proteins. MoCETSE first uses the pre-trained protein language model ESM-1b to transform raw amino acid sequences into context-aware vector representations. A target preprocessing network based on a mixture of convolutional experts then processes these representations: multiple sets of convolutional kernel “experts” operate in parallel to separately learn local motifs and short-range dependencies as well as broader contextual information, generating more expressive sequence representations. In the transformer module, MoCETSE incorporates relative positional encoding to explicitly model the relative distances between residues, enabling the attention mechanism to capture the sequential relationships and long-range functional dependencies among amino acids and thus to predict secreted effectors accurately. MoCETSE demonstrated strong predictive performance in 5-fold cross-validation and independent testing. Benchmark tests show that MoCETSE outperforms leading existing binary and multi-class classifiers.
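
The mixture-of-convolutional-experts idea described above can be illustrated with a minimal pure-Python sketch. This is not the MoCETSE implementation: the kernel sizes (3, 7, 11), the per-residue softmax gate, the toy dimensions, and all function names are assumptions made for illustration, and random vectors stand in for ESM-1b embeddings. Each expert is a 1D convolution with a different receptive field, and a gate mixes the expert outputs at every residue position.

```python
import math
import random


def softmax(z):
    """Numerically stable softmax over a list of floats."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def conv1d_same(x, weights):
    """Zero-padded 'same' 1D convolution.

    x:       list of L vectors, each of length d_in
    weights: kernel of shape [k][d_out][d_in], with k odd
    """
    k, d_out, half = len(weights), len(weights[0]), len(weights) // 2
    out = []
    for t in range(len(x)):
        vec = [0.0] * d_out
        for j in range(k):
            pos = t + j - half
            if 0 <= pos < len(x):  # positions outside the sequence count as zeros
                for o in range(d_out):
                    vec[o] += sum(w * xi for w, xi in zip(weights[j][o], x[pos]))
        out.append(vec)
    return out


def make_expert(k, d_in, d_out, rng):
    """Randomly initialised kernel for one expert (hypothetical helper)."""
    return [[[rng.gauss(0, 0.1) for _ in range(d_in)]
             for _ in range(d_out)] for _ in range(k)]


def mixture_of_conv_experts(x, experts, gate_w):
    """Run every expert in parallel and mix the outputs with a per-residue gate."""
    expert_outs = [conv1d_same(x, w) for w in experts]
    mixed, gates = [], []
    for t, x_t in enumerate(x):
        # Assumed gating scheme: a linear layer plus softmax over the experts.
        logits = [sum(w * xi for w, xi in zip(row, x_t)) for row in gate_w]
        g = softmax(logits)
        d_out = len(expert_outs[0][t])
        mixed.append([sum(g[e] * expert_outs[e][t][o] for e in range(len(experts)))
                      for o in range(d_out)])
        gates.append(g)
    return mixed, gates


rng = random.Random(0)
d_in, d_out, seq_len = 8, 4, 12  # toy sizes; real ESM-1b embeddings are 1280-dimensional
x = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(seq_len)]
experts = [make_expert(k, d_in, d_out, rng) for k in (3, 7, 11)]  # assumed kernel sizes
gate_w = [[rng.gauss(0, 0.1) for _ in range(d_in)] for _ in experts]
mixed, gates = mixture_of_conv_experts(x, experts, gate_w)
print(len(mixed), len(mixed[0]))
```

In this sketch the small kernel sees only neighbouring residues (local motifs) while the larger kernels cover wider windows, and the output dimension is smaller than the input dimension, which mirrors the dimensionality reduction role the text assigns to this stage.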

Author Summary

Gram-negative bacteria inject effector proteins into host cells via secretion systems, disrupting normal cellular functions and causing disease. Accurately identifying these virulence proteins is key to understanding bacterial pathogenic mechanisms and developing therapies. However, existing methods suffer from feature redundancy, inadequate capture of long-range dependency signals, and low computational efficiency. We developed MoCETSE, a novel computational method that predicts effector proteins end to end from raw sequences. Because position-specific scoring matrix encoding is computationally expensive, we instead use a pre-trained protein language model to extract structural, evolutionary, and functional features from sequences, providing biologically meaningful inputs for the downstream deep learning model. Our hybrid convolutional expert network reduces the dimensionality of the high-dimensional embeddings and extracts multi-scale features, mitigating feature redundancy and information loss and improving model performance and efficiency. When learning secretion signal features, relative positional encoding models amino acid order, capturing critical long-range dependency signals and enhancing the biological interpretability of predictions. MoCETSE outperforms existing tools such as DeepSecE in cross-category predictions, offering a high-throughput method for effector protein prediction and clues for studying bacterial infections and developing therapies.
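
To make the role of relative positional encoding concrete, here is a minimal single-head attention sketch in pure Python. The text does not say which relative-position scheme MoCETSE uses, so this assumes one common variant: a bias added to each attention score, indexed by the clipped offset between residues. The function name `relative_attention`, the clipping distance, and the toy values are all hypothetical.

```python
import math


def softmax(z):
    """Numerically stable softmax over a list of numbers."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def relative_attention(q, k, v, rel_bias, max_dist):
    """Scaled dot-product attention with an additive relative-position bias.

    score[i][j] = q_i . k_j / sqrt(d) + rel_bias[clip(j - i) + max_dist]
    rel_bias holds 2 * max_dist + 1 values, one per clipped offset.
    """
    d = len(q[0])
    outputs, weights = [], []
    for i, q_i in enumerate(q):
        scores = []
        for j, k_j in enumerate(k):
            rel = max(-max_dist, min(max_dist, j - i))  # clip the residue offset
            dot = sum(a * b for a, b in zip(q_i, k_j))
            scores.append(dot / math.sqrt(d) + rel_bias[rel + max_dist])
        w = softmax(scores)
        weights.append(w)
        outputs.append([sum(w[j] * v[j][c] for j in range(len(v)))
                        for c in range(len(v[0]))])
    return outputs, weights


seq_len, d, max_dist = 5, 4, 2
# Zero queries and keys remove the content term, isolating the positional effect.
q = [[0.0] * d for _ in range(seq_len)]
k = [[0.0] * d for _ in range(seq_len)]
v = [[float(i == c) for c in range(seq_len)] for i in range(seq_len)]
# Assumed bias: nearer residues receive a higher score.
rel_bias = [-abs(r) for r in range(-max_dist, max_dist + 1)]
out, attn = relative_attention(q, k, v, rel_bias, max_dist)
print([round(a, 3) for a in attn[2]])
```

Because the bias depends only on the offset `j - i`, the same pattern applies wherever a motif occurs in the sequence. In a trained model the bias values would be learned rather than fixed, so distant residue pairs could also receive high weights.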
