Machine learning for optimal growth temperature prediction of prokaryotes using amino acid descriptors

Abstract

Motivation

The optimal growth temperature (OGT) of organisms is valuable for bioprospecting enzymes that work under extreme conditions. Existing OGT prediction models achieve high accuracy, but they mainly capture trends of groups that are overrepresented in the training set, such as organisms that thrive at moderate temperatures and those from well-described taxa.

Results

In this study, we incorporated weighted scoring and phylogenetic splits to improve the generalizability of the prediction models. We first built a new growth temperature dataset comprising more than 21,000 species distributed over all three domains of life, with particular attention to including OGT and extreme temperature data. We then trained machine learning models on the OGT data of 6,401 prokaryotes using proteome-averaged amino acid descriptors. The best-performing model was the multilayer perceptron (MLP), with a cross-validated RMSE of 5.07 ± 0.24 °C and an R² of 0.89 ± 0.04. The most important proteome features were related to backbone flexibility, charged residues, and surface accessibility.
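As a rough illustration of two ideas named above, the sketch below computes a proteome-averaged amino acid descriptor and a weighted RMSE. It is not taken from OGTFinder. The function names are hypothetical, and the Kyte-Doolittle hydropathy scale is only a stand-in for whichever descriptors the study used. The charged set D/E/K/R and the weighting scheme (each sample weighted by the inverse of the number of samples in its 10 °C bin) are assumptions, since the abstract does not say how the weighted scoring is defined.

```python
import math
from collections import Counter

# Kyte-Doolittle hydropathy scale, used here only as a stand-in descriptor.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def proteome_average(proteins, scale):
    """Mean of a per-residue scale over every residue in the proteome."""
    counts = Counter()
    for seq in proteins:
        # Residues missing from the scale (e.g. X) are skipped.
        counts.update(aa for aa in seq.upper() if aa in scale)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no scorable residues in proteome")
    return sum(scale[aa] * n for aa, n in counts.items()) / total


def charged_fraction(proteins):
    """Fraction of residues that are D, E, K or R (assumed charged set)."""
    residues = "".join(proteins).upper()
    if not residues:
        raise ValueError("empty proteome")
    return sum(aa in "DEKR" for aa in residues) / len(residues)


def weighted_rmse(y_true, y_pred, bin_width=10.0):
    """RMSE with each sample weighted by 1 / (size of its temperature bin).

    Assumed scheme: rare temperature ranges count as much as common ones.
    """
    bins = [math.floor(t / bin_width) for t in y_true]
    bin_sizes = Counter(bins)
    weights = [1.0 / bin_sizes[b] for b in bins]
    squared = sum(w * (t - p) ** 2 for w, t, p in zip(weights, y_true, y_pred))
    return math.sqrt(squared / sum(weights))
```

With such a weighting, a 10 °C error on a single thermophile among three well-predicted mesophiles is penalized more heavily than under a plain RMSE, which is the kind of correction for overrepresented groups that the abstract describes.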

Availability and Implementation

The MLP model is integrated into the command-line tool OGTFinder and is available under the MIT license at: https://github.com/SC-Git1/OGTFinder.
