Machine learning for optimal growth temperature prediction of prokaryotes using amino acid descriptors

Abstract

The optimal growth temperature (OGT) of organisms is valuable in bioprospecting for enzymes that work under extreme temperature conditions. Existing prediction models achieve high accuracy, but they mainly capture the trends of mesophiles (OGT = 15–45°C) and of taxa that are abundant in the training set. In this study, we investigated the use of a weighted root mean square error (RMSE) and phylogenetic splits to improve the generalizability of prediction models trained on amino acid descriptors. To do this, we first built a new OGT database of more than 10,000 species distributed over 51 phyla of Bacteria, Archaea and Eukaryota, with particular attention to including data from extreme temperatures. Then, we trained machine learning models on the 6,401 observations with available genomes. The multi-layer perceptron performed best, with an RMSE of 5.05°C and an R² of 0.81. The most important model descriptors were related to charged residues, as well as bulky, hydrophobic residues.
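
To illustrate why a weighted RMSE can shift attention toward extreme temperatures, the following is a minimal sketch and not the authors' implementation. The abstract does not say how the weights are defined, so the inverse-frequency weighting over 10°C temperature bins used here is an assumption, as are the function names `inverse_frequency_weights` and `weighted_rmse` and the toy data.

```python
import math
from collections import Counter


def inverse_frequency_weights(y_true, bin_width=10.0):
    """Weight each observation inversely to how crowded its temperature bin is.

    Assumption: bins of fixed width (10 degrees C by default). The weights are
    scaled so that they sum to the number of observations.
    """
    bins = [math.floor(t / bin_width) for t in y_true]
    counts = Counter(bins)
    n_obs, n_bins = len(y_true), len(counts)
    return [n_obs / (n_bins * counts[b]) for b in bins]


def weighted_rmse(y_true, y_pred, weights):
    """Square root of the weighted mean of squared errors."""
    if not (len(y_true) == len(y_pred) == len(weights)):
        raise ValueError("inputs must have the same length")
    weighted_sq = sum(w * (t - p) ** 2 for t, p, w in zip(y_true, y_pred, weights))
    return math.sqrt(weighted_sq / sum(weights))


# Toy data (hypothetical): four mesophiles and one thermophile, in degrees C.
y_true = [30.0, 32.0, 35.0, 37.0, 85.0]
y_pred = [31.0, 33.0, 34.0, 36.0, 75.0]

weights = inverse_frequency_weights(y_true)
plain = weighted_rmse(y_true, y_pred, [1.0] * len(y_true))  # ordinary RMSE
weighted = weighted_rmse(y_true, y_pred, weights)

print(f"unweighted RMSE: {plain:.2f}")
print(f"weighted RMSE:   {weighted:.2f}")
```

In this toy case the single thermophile is predicted poorly, and the weighted score penalizes that error more heavily than the ordinary RMSE does, because the thermophile's bin is sparsely populated. Other weighting schemes, such as continuous density-based weights, would fit the same function signature.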
