Improved Prediction of Bacterial Type VI Secretion Effector Proteins Using an Integrated Convolutional Neural Network Model Combining N-terminal Signal Sequences, Evolutionary Information and Pre-Trained Protein Language Features
Abstract
Type VI secretion system effectors (T6SEs) are crucial for bacterial pathogenicity, making their accurate identification essential for understanding bacterial virulence mechanisms. This study analyzed the differences in amino acid composition of N-terminal signal sequences between T6SEs and non-T6SEs, uncovering distinct positional amino acid preferences in T6SEs. Using a combination of unsupervised and supervised analysis, we evaluated feature encoding methods and developed T6CNN, an ensemble model that integrates N-terminal signal sequences, evolutionary information, and pre-trained protein language features for T6SE prediction. T6CNN demonstrated outstanding performance in independent testing, outperforming existing tools with a 7.9% accuracy increase (to 0.953), a 13.2% sensitivity improvement (to 0.964), and a 6.6% specificity enhancement (to 0.951). The T6CNN model offers a reliable and accurate solution for T6SE prediction, with significant potential to advance research on bacterial pathogenicity.
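For readers less familiar with the three metrics reported above, the following is a minimal sketch of how accuracy, sensitivity, and specificity are conventionally computed from binary predictions. It is not the authors' evaluation code: the function name `binary_metrics`, the label convention (1 = T6SE, 0 = non-T6SE), and the toy labels are all assumptions made for illustration.

```python
import math


def binary_metrics(y_true, y_pred):
    """Compute accuracy, sensitivity and specificity for 0/1 labels.

    Assumed convention (not taken from the paper): 1 = T6SE, 0 = non-T6SE.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Confusion-matrix counts.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    def ratio(num, den):
        # Return NaN instead of raising when a class is absent.
        return num / den if den else math.nan

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": ratio(tp, tp + fn),  # true positive rate
        "specificity": ratio(tn, tn + fp),  # true negative rate
    }


# Toy labels, purely illustrative (not data from the study).
toy_true = [1, 1, 1, 0, 0, 0, 0, 0]
toy_pred = [1, 1, 0, 0, 0, 0, 0, 1]
toy_metrics = binary_metrics(toy_true, toy_pred)
```

Sensitivity measures how many true effectors are recovered, while specificity measures how many non-effectors are correctly rejected, which is why the abstract reports both alongside overall accuracy.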
Importance
This study introduces T6CNN, a new computational model for identifying Type VI secretion system effectors used by pathogenic bacteria. By analyzing the N-terminal regions of protein sequences, T6CNN uncovers positional amino acid preferences that distinguish effector proteins from non-effectors. By integrating these N-terminal signals with evolutionary information and pre-trained protein language features, the model outperforms existing methods in accuracy, sensitivity, and specificity. This improved prediction tool offers researchers a resource for pinpointing key virulence factors and for studying bacterial infection mechanisms. Ultimately, T6CNN may help drive the development of more targeted antibacterial treatments and strategies to combat bacterial diseases.