Accelerating Natural Product Discovery with Linked MS-Genomics and Language/Transformer-Based Models
Abstract
An integrated multi-modal characterization of a microbial strain library streamlines natural product discovery. Using language- and transformer-based models to cross-validate mass spectrometry (MS)-genome datasets, we rapidly identify microbial producers of diverse natural products with high precision (75-100%). Our findings demonstrate the transformative potential of strain-level linked MS-genome datasets to significantly accelerate discovery and to extend our understanding of microbes beyond currently known and curated knowledge.