Pathogenic potential prediction of Vibrio parahaemolyticus by using pangenome data with high performance machine learning algorithms

Abstract

The presence of Vibrio parahaemolyticus (Vp) at various stages of seafood production has harmed public health and threatened the sustainability of the industry. To address the critical public health threat posed by this prevalent seafood-borne pathogen, this research applied machine learning (ML) and deep learning (DL) algorithms to predict the pathogenic potential of Vp from pangenome data. Using comprehensive pangenomic assemblies and ML/DL models, this study achieved robust and precise prediction of Vp pathogenic potential based on source attribution. This provides a novel, reliable diagnostic tool that complements conventional serotyping and virulence-gene-combination approaches. The results show that non-core regions of the Vp pangenome carry signals that ML models can exploit when determining pathogenic potential. Tree-based ensemble learning methods (Random Forest and Gradient Boosting Trees) performed strongly, reaching an AUC of 0.97 on a selected pangenome matrix. A Convolutional Neural Network (CNN) performed slightly better, reaching an AUC of 0.98 on the full pangenome. Biological insights into genes associated with pathogenic potential were retrieved from the trained ML/DL models. Feature-weight analysis of the Random Forest revealed the importance of accessory genes during Vp evolution, a pattern also highlighted by Grad-CAM analysis of the CNN. These findings offer potential guidance for future research.
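The abstract rests on two ideas: non-core (accessory) regions of a pangenome presence/absence matrix carry predictive signal, and models are compared by AUC. The Python sketch below illustrates only those two ideas under stated assumptions. It builds a binary isolate-by-gene matrix from per-isolate gene sets and keeps genes found in fewer than 99% of isolates as "non-core". The 99% cutoff is a common pangenome convention assumed here, not a value taken from the study. AUC is computed from its rank interpretation. The helper names, isolates, gene sets (apart from the real gene names recA, gyrB and tdh, used as toy entries), source labels, and scores are all hypothetical. Ranking genes by single-gene AUC is a simple univariate stand-in. It is not the Random Forest feature-weight analysis or the Grad-CAM analysis used in the study.

```python
from typing import Dict, List, Sequence, Set, Tuple


def presence_absence_matrix(
    isolates: Dict[str, Set[str]],
) -> Tuple[List[str], List[str], List[List[int]]]:
    """Build a binary isolate-by-gene matrix (1 = gene cluster present)."""
    names = sorted(isolates)
    genes = sorted(set().union(*isolates.values()))
    matrix = [[1 if g in isolates[n] else 0 for g in genes] for n in names]
    return names, genes, matrix


def non_core_columns(matrix: List[List[int]], core_threshold: float = 0.99) -> List[int]:
    """Column indices of genes carried by fewer than core_threshold of isolates.

    The 0.99 default is an assumed convention, not the study's setting.
    """
    n_isolates = len(matrix)
    n_genes = len(matrix[0]) if matrix else 0
    return [
        j for j in range(n_genes)
        if sum(row[j] for row in matrix) / n_isolates < core_threshold
    ]


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as P(score of a positive > score of a negative); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative isolate")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def rank_genes_by_auc(
    genes: List[str], matrix: List[List[int]], labels: Sequence[int], columns: List[int]
) -> List[Tuple[str, float]]:
    """Univariate stand-in for feature weights: sort genes by |AUC - 0.5|, strongest first."""
    scored = [(genes[j], roc_auc(labels, [row[j] for row in matrix])) for j in columns]
    # sorted() stays stable with reverse=True, so tied genes keep column order.
    return sorted(scored, key=lambda item: abs(item[1] - 0.5), reverse=True)


# --- Toy usage (hypothetical isolates, gene sets, and source labels) ---
isolates = {
    "iso1": {"recA", "gyrB", "tdh", "geneX"},
    "iso2": {"recA", "gyrB", "tdh"},
    "iso3": {"recA", "gyrB", "geneY"},
    "iso4": {"recA", "gyrB"},
}
# Labels follow sorted isolate order: 1 = hypothetical clinical source, 0 = environmental.
labels = [1, 1, 0, 0]

names, genes, matrix = presence_absence_matrix(isolates)
cols = non_core_columns(matrix)  # drops recA and gyrB, which every isolate carries
ranking = rank_genes_by_auc(genes, matrix, labels, cols)
print([genes[j] for j in cols])  # ['geneX', 'geneY', 'tdh']
print(ranking)  # [('tdh', 1.0), ('geneX', 0.75), ('geneY', 0.25)]
print(roc_auc(labels, [0.9, 0.6, 0.7, 0.2]))  # 0.75 for hypothetical model scores
```

In the study's setting, the scores passed to `roc_auc` would presumably be held-out predictions from a trained classifier such as Random Forest, Gradient Boosting Trees, or the CNN, not hand-written values.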