MiGenPro: A linked data workflow for phenotype-genotype prediction of microbial traits using machine learning.
Abstract
The availability of microbial genomic data and the development of machine learning methods create a unique opportunity to establish associations between genetic information and phenotypes. Here we present MiGenPro (Microbial Genome Prospecting), a computational workflow that combines phenotypic and genomic information to generate machine learning models that predict microbial traits from genome sequences. Microbial genomes were consistently annotated, and the resulting features were stored in a semantic framework. These data were used to train machine learning models that successfully predicted microbial traits such as motility, Gram stain, optimal temperature range, and sporulation capability. To assess robustness, five-fold cross-validation was performed; model performance was consistent across iterations, with no indication of overfitting. Effectiveness was further validated by comparison with existing models, which showed comparable accuracy; the modest variations were attributed to differences in datasets rather than methodology. The classifications can be explored further through feature importance characterisation to identify biologically relevant genomic features. MiGenPro provides an interoperable workflow for building models that predict microbial phenotypes from genomes.
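The five-fold cross-validation mentioned in the abstract can be illustrated with a minimal, standard-library sketch. Everything here is an assumption made for illustration: the data are synthetic presence/absence features, the nearest-centroid classifier is a toy stand-in rather than the models MiGenPro trains, and the function names (`k_fold_indices`, `train_centroids`, `predict`, `cross_validate`) are hypothetical.

```python
import random
from statistics import mean


def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices and split them into k near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def train_centroids(X, y):
    """Per-class mean of each feature (toy stand-in for a real model)."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids


def predict(centroids, x):
    """Assign the class whose centroid is nearest in squared Euclidean distance."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: dist(centroids[label]))


def cross_validate(X, y, k=5, seed=0):
    """Return the held-out accuracy for each of the k folds."""
    scores = []
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        model = train_centroids([X[i] for i in train_idx],
                                [y[i] for i in train_idx])
        correct = sum(predict(model, X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return scores


# Synthetic data (assumption): 100 "genomes", 4 binary features each.
# Feature 0 tracks the trait with 10% noise; features 1-3 are random.
rng = random.Random(42)
X, y = [], []
for i in range(100):
    trait = i % 2  # hypothetical binary trait, e.g. motile vs non-motile
    f0 = trait if rng.random() > 0.1 else 1 - trait
    X.append([f0] + [rng.randint(0, 1) for _ in range(3)])
    y.append(trait)

scores = cross_validate(X, y)
print("per-fold accuracy:", scores)
print("mean accuracy:", mean(scores))
```

Similar accuracy across the five folds is the kind of consistency the abstract reports. A large gap between training accuracy and held-out accuracy would instead suggest overfitting.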