Predicting bacterial phenotypic traits through improved machine learning using high-quality, curated datasets

Abstract

Predicting prokaryotic phenotypes, the observable traits that govern functionality, adaptability, and interactions, holds significant potential for fields such as biotechnology, environmental sciences, and evolutionary biology. This study uses machine learning to explore the relationship between prokaryotic genotypes and phenotypes. Taking advantage of the highly standardized datasets in the BacDive database, we modeled eight physiological properties based on protein family inventories; we also discuss the evaluation metrics and explore the biological implications of our models. The high confidence values of our predictions highlight the importance of data quality and quantity for reliable inference of bacterial phenotypes. Our approach yielded nearly 55,000 new data points for approximately 20,000 strains, which are published openly in the BacDive database, enriching existing phenotypic datasets and paving the way for future research and analysis. The open-source software we developed can readily be applied to other datasets, for example those in the IMG/M system for metagenomics, and to other applications, such as assessing the potential of soil bacteria for bioremediation projects.
