MOBFinder: a tool for MOB typing for plasmid metagenomic fragments based on language model



Abstract

Background

MOB typing is a scheme that classifies plasmid genomes according to their relaxase gene. Plasmids of different MOB categories have diverse host ranges, so MOB typing is crucial for investigating plasmid mobilization, especially the transmission of resistance genes and virulence factors. However, MOB typing of plasmid metagenomic data is challenging because metagenomic contigs are highly fragmented.
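On a complete plasmid, MOB typing reduces to a lookup on the plasmid's relaxase annotation. The Python sketch below assumes relaxase hits have already been obtained (for example with a BLAST or HMMER search) and resolves plasmids carrying several relaxases by keeping the best-scoring hit. `RelaxaseHit`, `assign_mob_types` and that tie-breaking rule are hypothetical, not taken from MOBFinder or MOB-suite. The sketch also simplifies by labelling every plasmid without a relaxase hit as non-mobilizable, mirroring the 11-class scheme described in the abstract.

```python
from dataclasses import dataclass
from typing import Iterable

NON_MOBILIZABLE = "non-mobilizable"


@dataclass
class RelaxaseHit:
    """One relaxase annotated on a plasmid (hypothetical record layout)."""
    plasmid_id: str
    mob_family: str  # relaxase family, e.g. "MOBF" or "MOBP"
    bit_score: float


def assign_mob_types(plasmid_ids: Iterable[str], hits: Iterable[RelaxaseHit]) -> dict[str, str]:
    """Give each plasmid the MOB family of its best-scoring relaxase hit."""
    best: dict[str, RelaxaseHit] = {}
    for hit in hits:
        current = best.get(hit.plasmid_id)
        if current is None or hit.bit_score > current.bit_score:
            best[hit.plasmid_id] = hit
    # Plasmids without any relaxase hit are labelled non-mobilizable.
    return {
        pid: best[pid].mob_family if pid in best else NON_MOBILIZABLE
        for pid in plasmid_ids
    }
```

For example, a plasmid with a strong MOBF hit and a weaker MOBP hit is labelled MOBF, and a plasmid with no hit is labelled non-mobilizable.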

Results

We developed MOBFinder, an 11-class classifier that assigns plasmid fragments to 10 MOB categories or a non-mobilizable category. We first performed MOB typing of complete plasmid genomes using their relaxase information, and then constructed artificial benchmark plasmid metagenomic fragments from these complete plasmid genomes, whose MOB types are well annotated. Drawing on natural language models, we used word vectors to characterize the plasmid fragments. Several random forest classification models were trained and integrated to predict plasmid fragments of different lengths. On the benchmark dataset, MOBFinder outperformed the existing tool, achieving an overall accuracy approximately 59% higher than that of MOB-suite. Moreover, the balanced accuracy, harmonic mean and F1-score reached 99% for some MOB types. In an application to a type 2 diabetes (T2D) cohort, MOBFinder's results suggested that the MOBF type might accelerate the transmission of antibiotic resistance in patients with T2D.
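The abstract does not specify the fragment lengths, the k-mer size, the embedding model or how the length-specific random forests are combined, so the Python sketch below only illustrates the general pattern under stated assumptions: benchmark fragments are cut at random positions from a complete plasmid, overlapping k-mers are treated as the "words" of a sequence, a fragment is represented by the mean of its word vectors, and each fragment is routed to a model by its length. The function names, the toy embedding table and the length bins are hypothetical placeholders; in a real pipeline the vectors would come from a trained word-embedding model and each bin would hold a trained random forest.

```python
import random
from typing import Sequence


def simulate_fragments(genome: str, length: int, n: int, seed: int = 0) -> list[str]:
    """Cut n fragments of a fixed length from random positions of a complete plasmid."""
    if not 0 < length <= len(genome):
        raise ValueError("fragment length must be between 1 and the genome length")
    rng = random.Random(seed)  # seeded so the benchmark can be regenerated
    starts = [rng.randrange(len(genome) - length + 1) for _ in range(n)]
    return [genome[s:s + length] for s in starts]


def kmer_words(seq: str, k: int) -> list[str]:
    """Split a DNA sequence into overlapping k-mers, the 'words' of the sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def fragment_vector(seq: str, k: int, embeddings: dict[str, list[float]], dim: int) -> list[float]:
    """Represent a fragment as the mean vector of its k-mer words."""
    total = [0.0] * dim
    count = 0
    for word in kmer_words(seq, k):
        vec = embeddings.get(word)
        if vec is None:
            continue  # ignore k-mers missing from the vocabulary (e.g. containing N)
        total = [t + v for t, v in zip(total, vec)]
        count += 1
    return [t / count for t in total] if count else total


def pick_model(length: int, bins: Sequence[tuple[int, str]]) -> str:
    """Return the model for the largest minimum length that the fragment reaches."""
    ordered = sorted(bins)
    chosen = ordered[0][1]  # fragments shorter than every bin use the shortest model
    for min_len, name in ordered:
        if length >= min_len:
            chosen = name
    return chosen
```

For example, `pick_model(1500, [(0, "short"), (1000, "medium"), (5000, "long")])` returns `"medium"`. Averaging word vectors gives every fragment a fixed-size feature vector regardless of its length, which is what a random forest needs as input; training separate models per length range is one way to let each model specialise on how much sequence signal a fragment of that size carries.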

Conclusions

To the best of our knowledge, MOBFinder is the first tool for MOB typing of plasmid metagenomic fragments. MOBFinder is freely available at https://github.com/FengTaoSMU/MOBFinder.

Article activity feed


    A version of this preprint has been published in the Open Access journal GigaScience (see paper https://doi.org/10.1093/gigascience/giae047), where the paper and peer reviews are published openly under a CC-BY 4.0 license. These peer reviews were as follows:

    Reviewer 2: Dan Wang

    The manuscript provides a comprehensive background on the necessity and challenges of MOB typing in the context of plasmid genomics and its significance in tracking the transmission of resistance genes and virulence factors. The innovation introduced by MOBFinder, which incorporates an 11-class classification system, addresses a critical gap in current research methodologies by enhancing the precision of plasmid fragment classification.

    Key Strengths:

    Innovation: MOBFinder represents a novel approach to the typing of metagenomic plasmid fragments, using word vector characterization combined with machine learning techniques.

    Methodological Rigor: The methodological approach, including the use of random forest models and the construction of a benchmark dataset from annotated complete plasmid genomes, is robust and well-executed.

    Performance: The tool demonstrates superior performance compared to existing tools like MOBscan and MOB-suite, providing a significant improvement in accuracy.

    Impact on Field: The application of MOBFinder in a T2D cohort illustrates the tool's practical utility and its potential to influence antibiotic resistance studies.

    Recommendation: Given the thorough revisions and the contributions this manuscript offers to the field of microbial genomics and antibiotic resistance, I recommend that the manuscript be accepted for publication in GigaScience.



    **Reviewer 1: Haruo Suzuki**

    I recommend that the authors consider revising based on the following points.

    1. "the unpaired Wilcoxon signed-rank two-sided test" -> should be corrected to either "Wilcoxon rank-sum test" or "Mann-Whitney U test", because the signed-rank test applies only to paired samples.

    See https://en.wikipedia.org/wiki/Mann%E2%80%93Whitney_U_test ("'Wilcoxon rank-sum test' redirects here. For Wilcoxon signed-rank test, see Wilcoxon signed-rank test.") and https://en.wikipedia.org/wiki/Wilcoxon_signed-rank_test ("Not to be confused with Wilcoxon rank-sum test.").

    1. "Since MOBscan can only predict the MOB type with plasmid proteins, we annotated the plasmids in the test set with Prokka, then manually submitted them to the MOBscan website for MOB type annotation.

    Given that MOBscan operates as an online tool and cannot be executed locally, the calculation of MOBscan's run time was confined to the duration spent on preprocessing with Prokka locally." (Please refer to Lines 313-319 in the revised manuscript.)

    -> In fact, MOBscan can be executed locally using the scripts included in https://github.com/santirdnd/COPLA/. Running MOBscan locally may not be necessary (manually submitting the proteins to the MOBscan website is probably acceptable), but I mention it for your information.

    1. "In the comparison, it was observed that MOBscan did not perform well, achieving low accuracy and kappa values across sequences of varying lengths, while MOB-suite exhibited marginally better performance than MOBscan when handling sequences of greater length (Figure 3A, 3B)." (Please refer to Lines 418-421 in the revised manuscript.)

    -> Do the authors' results contradict the following general expectation? MOB-typer uses BLAST, whereas MOBscan uses hmmscan (profile hidden Markov models); therefore, MOBscan would be expected to retrieve more distantly related proteins than MOB-typer.

    1. "MOB-suit and MOBscan are represented by blue lines, orange lines and gray lines respectively." -> "MOB-suit" should be "MOB-suite".
    1. I suggest that the manuscript undergo English language editing before publication. For example, "For the MOB typing, MOBscan [18] uses the HMMER model to annotated the relaxases and further perform MOB typing." -> should be "For the MOB typing, MOBscan [18] uses the HMMER model to annotate the relaxases and further perform MOB typing."
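To make the distinction in Reviewer 1's first comment concrete: the rank-sum (Mann-Whitney U) test pools two independent samples, which may differ in size, and ranks them together, whereas the signed-rank test ranks the differences within matched pairs and therefore needs paired data. The self-contained Python sketch below implements the unpaired two-sided test with the large-sample normal approximation. It omits the tie and continuity corrections that library implementations may apply, so its p-values can differ slightly from theirs on small or heavily tied samples; the function names are illustrative.

```python
import math
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # convert 0-based positions to 1-based ranks
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_test(x: Sequence[float], y: Sequence[float]) -> tuple[float, float]:
    """Unpaired two-sided Wilcoxon rank-sum (Mann-Whitney U) test.

    Returns U for sample x and a p-value from the normal approximation,
    without tie or continuity correction.
    """
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))  # pool both samples, rank together
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean_u) / sd_u
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u1, p_two_sided
```

In practice, SciPy provides `scipy.stats.mannwhitneyu(x, y, alternative="two-sided")` and `scipy.stats.ranksums(x, y)` for the unpaired case, while `scipy.stats.wilcoxon` performs the signed-rank test and is appropriate only for paired samples.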
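Reviewer 1's comment on Figure 3A and 3B compares the tools by accuracy and kappa. Kappa is useful alongside accuracy because it discounts the agreement expected by chance, which matters when some classes are much more common than others. The Python sketch below computes both from true and predicted labels using the standard unweighted Cohen's kappa; that the manuscript used exactly this definition is an assumption, and the function name is illustrative.

```python
import math
from collections import Counter
from typing import Sequence


def accuracy_and_kappa(truth: Sequence[str], pred: Sequence[str]) -> tuple[float, float]:
    """Overall accuracy and (unweighted) Cohen's kappa for multi-class labels."""
    if len(truth) != len(pred) or not truth:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(truth)
    p_observed = sum(t == p for t, p in zip(truth, pred)) / n
    truth_counts, pred_counts = Counter(truth), Counter(pred)
    # Agreement expected by chance if predictions were independent of the truth.
    p_expected = sum(truth_counts[c] * pred_counts[c] for c in truth_counts) / (n * n)
    if p_expected == 1.0:
        return p_observed, math.nan  # kappa is undefined when only one class occurs
    return p_observed, (p_observed - p_expected) / (1 - p_expected)
```

For a toy set of four fragments labelled MOBF, MOBF, MOBP and non-mobilizable, with one MOBF fragment predicted as MOBP, accuracy is 0.75 but kappa is 7/11 (about 0.64), because part of the observed agreement would be expected by chance.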