Machine learning inference of natural product chemistry across biosynthetic gene cluster types

Abstract

With ever-increasing volumes of sequencing data for biosynthetic gene clusters (BGCs), computational methods for predicting the resulting secondary metabolites are critically needed. Here, we present CHAMOIS, a machine learning tool that infers metabolite properties from the protein domains found in BGCs. Out of 539 relevant chemical properties from the ChemOnt ontology, CHAMOIS predicts 120 with an area under the precision-recall curve (AUPRC) > 0.5. Although entirely data-driven, CHAMOIS infers many protein-metabolite links that are consistent with the scientific literature and suggests interesting novel biosynthetic functions of uncharacterized proteins. Finally, to guide experimental BGC characterisation, CHAMOIS can pinpoint which BGC within a given genome produces a pre-specified metabolite.
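
To make the evaluation criterion concrete, the sketch below shows one way a count such as "properties predicted with AUPRC > 0.5" could be computed. It is an illustration only, not the CHAMOIS implementation: it uses average precision, a common step-wise estimate of AUPRC, and the function names, property names, and toy labels and scores are all hypothetical.

```python
from typing import Dict, Sequence


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Average precision, used here as a step-wise estimate of AUPRC (assumption)."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("at least one positive label is required")
    # Rank examples from highest to lowest score; ties keep their input order.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank  # precision at the rank of this positive
    return total / n_pos


def count_well_predicted(
    y_true: Dict[str, Sequence[int]],
    y_score: Dict[str, Sequence[float]],
    threshold: float = 0.5,
) -> int:
    """Count properties whose estimated AUPRC exceeds the threshold."""
    return sum(
        1
        for name in y_true
        if average_precision(y_true[name], y_score[name]) > threshold
    )


# Hypothetical toy data: one row of labels and scores per chemical property,
# one column per BGC. These values are made up for illustration.
y_true = {"property_a": [1, 0, 1, 0], "property_b": [0, 1, 0, 0]}
y_score = {"property_a": [0.9, 0.8, 0.7, 0.1], "property_b": [0.6, 0.2, 0.5, 0.4]}

print(count_well_predicted(y_true, y_score))
```

Each property is scored independently, so a rare property with few positive BGCs is not masked by frequent ones. That is a typical reason to report precision-recall metrics per class in imbalanced multi-label settings.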
