PepSeek: Universal Functional Peptide Discovery with Cooperation Between Specialized Deep Learning Models and Large Language Model
Abstract
Recent computational foundation models have revitalized the scientific discovery pipeline. However, developing foundation models for functional peptide discovery is costly because wet-lab validated data are scarce. Meanwhile, conventional deep learning models generalize poorly to unseen tasks or data distributions. Here, we introduce PepSeek, a universal approach for peptide discovery that synergistically integrates a state-of-the-art large language model (LLM) with specialized small models. PepSeek harnesses the robust reasoning and generalization capabilities of the LLM while leveraging the high predictive accuracy of specialized models trained for tasks such as antimicrobial activity regression and functional peptide generation. We devised multiple collaborative strategies and task-specific modules that demonstrate leading performance in peptide identification and generation. Notably, PepSeek achieves remarkable zero-shot prediction accuracy for peptides with diverse functionalities. We used PepSeek to identify a group of broad-spectrum antimicrobial peptides that exhibit low toxicity and high activity against drug-resistant bacteria, with the best surpassing all peptides currently undergoing clinical trials. Our framework establishes a new pipeline for scientific discovery that combines an LLM with specialized models.