Generating antimicrobial peptides via genomic transfer learning
Abstract
We present a generative machine learning pipeline for the design of linear antimicrobial peptides (AMPs). To extend diversity beyond synthetically validated peptide datasets (∼7,000 entries), we apply transfer learning by training a Generative Pre-trained Transformer (GPT) on the genomically derived AMPSphere dataset (∼863,000 entries), before fine-tuning on the Database of Antimicrobial Activity and Structure of Peptides (DBAASP). We assess the filtered sequences with a committee of Minimum Inhibitory Concentration (MIC) predictive models built with a Bi-LSTM architecture, and ESM-2 and QSAR feature vectors.
The fine-tuned GPT model achieved a 28% lower test loss than a model trained on DBAASP alone, and it generated peptides that are simultaneously more novel and more physicochemically plausible. Our top-ranked candidates are predicted to possess antimicrobial activity comparable to that of polymyxin B.
We anticipate that this transfer-learning approach is broadly applicable to leveraging massive, unlabelled genomic datasets to enrich targeted peptide discovery. Our identified sequences have been submitted to the 2027 AMP Challenge 1 (team name VINCI) for experimental validation, and the complete codebase and workflow are open source 2.
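The committee-based assessment mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how the committee's predictions are combined, so the code below assumes one common choice: average each member's predicted log10(MIC) and use the spread across members as a disagreement measure. The function `committee_rank`, the `MicPredictor` type, and the two toy predictors are hypothetical stand-ins, not the authors' Bi-LSTM, ESM-2, or QSAR models.

```python
from statistics import mean, stdev
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical predictor type: maps a peptide sequence to a predicted log10(MIC).
MicPredictor = Callable[[str], float]


def committee_rank(
    peptides: Sequence[str],
    committee: Dict[str, MicPredictor],
) -> List[Tuple[str, float, float]]:
    """Rank peptides by mean predicted log10(MIC); lower means more potent.

    Returns (sequence, mean prediction, standard deviation across members).
    Averaging is an assumption; the source does not specify the aggregation.
    """
    if len(committee) < 2:
        raise ValueError("a committee needs at least two members")
    scored = []
    for seq in peptides:
        preds = [predict(seq) for predict in committee.values()]
        scored.append((seq, mean(preds), stdev(preds)))
    # Sort by consensus potency, breaking ties in favour of lower disagreement.
    return sorted(scored, key=lambda row: (row[1], row[2]))


# Toy stand-ins for trained models (assumptions for illustration only).
def toy_charge_model(seq: str) -> float:
    # More cationic residues (K, R) give a lower predicted log10(MIC).
    return 2.0 - 0.1 * sum(seq.count(aa) for aa in "KR")


def toy_length_model(seq: str) -> float:
    # Longer sequences give a lower predicted log10(MIC) in this toy rule.
    return 2.0 - 0.05 * len(seq)


committee = {"charge": toy_charge_model, "length": toy_length_model}
ranking = committee_rank(["KKLLKKLLKK", "GAGAGA", "KRWWKWWRR"], committee)
for seq, mu, sigma in ranking:
    print(f"{seq}\t{mu:.2f}\t{sigma:.2f}")
```

Reporting the standard deviation alongside the mean lets candidates on which the members disagree strongly be flagged or down-weighted before experimental follow-up; whether the authors do this is not stated in the abstract.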