Abstract

Generative pre-trained transformer (GPT) models provide powerful tools for de novo protein design (DNPD). GPT-based DNPD involves three steps: (a) fine-tuning the model on proteins of interest; (b) generating sequence candidates with the fine-tuned model; and (c) prioritizing the sequence candidates. Existing prioritization strategies rely heavily on sequence identity, which undermines diversity. Here, we coupled a protein GPT model with a custom discriminator, which enabled the selection of candidates that have low identity to natural sequences yet are highly likely to have the desired functions. We applied this framework to create novel antimicrobial peptides (AMPs) and malate dehydrogenases (MDHs). Experimental verification identified four broad-spectrum AMPs among 24 candidates. Comprehensive computational analyses of the prioritized MDH candidates provided compelling evidence for the anticipated function. During experimental validation, 4/10 natural MDHs and 3/10 generated and prioritized novel candidates were expressed and soluble. All the soluble candidates (3/3) were functional in vitro. More broadly, our generator-discriminator framework superficially resembles a generative adversarial network (GAN), but the two are fundamentally different. Our results suggest that our framework is more data- and time-efficient than GAN-based methods in DNPD and may therefore considerably expedite the DNPD process.
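The prioritization step (c) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the thresholds, the toy sequences, and the discriminator scores are all hypothetical, and `difflib.SequenceMatcher` is used only as a crude stand-in for a proper alignment-based sequence identity. The idea shown is simply to keep candidates that a discriminator scores highly while their identity to every natural sequence stays low.

```python
from difflib import SequenceMatcher


def max_identity(candidate, natural_seqs):
    """Highest similarity ratio between the candidate and any natural sequence.

    Assumption: difflib's ratio is a rough proxy for sequence identity;
    a real pipeline would use an alignment tool instead.
    """
    return max(
        SequenceMatcher(None, candidate, nat, autojunk=False).ratio()
        for nat in natural_seqs
    )


def prioritize(candidates, natural_seqs, score_fn, min_score=0.9, max_ident=0.6):
    """Keep candidates with a high discriminator score and low identity.

    score_fn is a hypothetical discriminator returning a value in [0, 1];
    min_score and max_ident are illustrative thresholds, not published ones.
    """
    kept = []
    for seq in candidates:
        score = score_fn(seq)
        ident = max_identity(seq, natural_seqs)
        if score >= min_score and ident <= max_ident:
            kept.append((seq, score, ident))
    # Best discriminator score first; ties broken by lower identity.
    kept.sort(key=lambda item: (-item[1], item[2]))
    return kept


# Toy data: one "natural" sequence and three generated candidates.
natural = ["GIGKFLHSAKKFGKAFVGEIMNS"]
toy_scores = {  # made-up discriminator outputs, for illustration only
    "GIGKFLHSAKKFGKAFVGEIMNS": 0.99,  # identical to a natural sequence
    "KRKRWWKRKRWW": 0.95,             # dissimilar and high-scoring
    "AAAAAAAAAA": 0.20,               # dissimilar but low-scoring
}

ranked = prioritize(list(toy_scores), natural, toy_scores.get)
for seq, score, ident in ranked:
    print(f"{seq}\tscore={score:.2f}\tidentity={ident:.2f}")
```

In this toy run only the second candidate survives: the first is rejected for being identical to a natural sequence and the third for its low discriminator score.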
