AI-guided discovery and optimization of antimicrobial peptides through species-aware language model

Abstract

The rise of antibiotic-resistant bacteria drives an urgent need for novel antimicrobial agents. Antimicrobial peptides (AMPs) are promising candidates owing to their multiple mechanisms of action and reduced propensity for resistance development. This study introduces LLAMP (Large Language model for AMP activity prediction), a target species-aware AI model that leverages pre-trained language models to predict minimum inhibitory concentration (MIC) values of AMPs. Using LLAMP, we screened approximately 5.5 million peptide sequences, identifying peptides 13 and 16 as the most selective and most potent candidates, respectively. Analysis of attention values allowed us to pinpoint critical amino acid residues (e.g., Trp, Lys, and Phe). Guided by these critical residues, the sequence of the most selective peptide, 13, was engineered through targeted modifications to increase amphipathicity, yielding peptide 13–5 with an overall enhancement in antimicrobial activity but a reduction in selectivity. Notably, peptides 13–5 and 16 demonstrated antimicrobial potency and selectivity comparable to those of the clinically investigated AMP pexiganan. Our work demonstrates the potential of AI to expedite the discovery of peptide-based antibiotics to combat antibiotic resistance.

Article activity feed

  1. Very cool study! One suggestion: the pseudo-perplexity values still seem pretty high even after fine-tuning, which may indicate some degree of underfitting. This could be due to the relatively small size of the 35M-parameter ESM2 model. Have you considered trying a larger model (150M or 650M)? If fine-tuning a larger ESM2 model is computationally prohibitive, it might still be informative to compare against the zero-shot performance of a larger model to assess whether fine-tuning is necessary, or whether a larger baseline alone achieves comparable predictive results.
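For readers unfamiliar with the metric the comment refers to: pseudo-perplexity for a masked language model such as ESM2 is typically obtained by masking each position in turn, collecting the model's log-probability of the true residue at that position, and exponentiating the negative mean. The sketch below illustrates only that final aggregation step, assuming the per-position masked log-probabilities have already been extracted from the model; the function name `pseudo_perplexity` is illustrative, not from the study.

```python
import math

def pseudo_perplexity(token_log_probs):
    """Pseudo-perplexity from per-position masked log-probabilities.

    Each entry is log p(x_i | x_masked_at_i): the model's
    log-probability of the true residue at position i when
    that position is masked.
    """
    n = len(token_log_probs)
    return math.exp(-sum(token_log_probs) / n)

# A confident model assigns high probability to each masked residue,
# giving a pseudo-perplexity near 1; uniform guessing over the 20
# standard amino acids gives a pseudo-perplexity of exactly 20.
confident = [math.log(0.9)] * 10
uniform = [math.log(1 / 20)] * 10
print(round(pseudo_perplexity(confident), 3))  # → 1.111
print(round(pseudo_perplexity(uniform), 3))    # → 20.0
```

On this scale, a fine-tuned peptide model would be expected to score well below the uniform baseline of 20; values that remain close to it are what the comment interprets as a possible sign of underfitting.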