Generation of antigen-specific paired heavy-light chain antibody sequences using large language models


Abstract

Monoclonal antibodies are a diverse class of proteins that, owing to their exquisite specificity for their targets, are widely used as therapeutics, diagnostics, and research reagents. Despite the importance of antibodies across many areas of biomedicine, the traditional process of antibody discovery is limited by inefficiency, high costs and failure rates, fundamental logistical hurdles, limited scalability, and long turnaround times. Recent computational biology efforts, including machine learning/artificial intelligence (AI) approaches, have shown critical capabilities and advantages over purely experimental techniques, disrupting a number of areas that have traditionally met with limited success, such as protein structure modeling, enzyme engineering, and drug design (1–3). In the context of antibodies, AI-based approaches have been developed to optimize existing antibodies and to generate novel antibody sequences in a target-agnostic manner (4–6). However, the ability of AI approaches to generate novel paired antibody sequences against a specific target (antigen) of interest has not been validated. In this work, we present MAGE (Monoclonal Antibody GEnerator), a sequence-based protein Large Language Model (LLM) fine-tuned for the task of generating paired variable heavy and light chain antibody sequences against antigens of interest. We show that MAGE efficiently generates diverse antibody sequences that are distinct from the antibody sequences found in the training datasets, with experimentally validated binding specificity against SARS-CoV-2 receptor-binding domain (RBD), an emerging avian influenza H5N1 viral hemagglutinin (H5), and respiratory syncytial virus A (RSV-A) prefusion F. MAGE is trained on protein sequence alone and requires only an antigen sequence as input for antibody design, with no need for a preexisting antibody template.
MAGE represents a first-in-class model capable of designing human antibodies with demonstrated functionality against multiple targets. AI models for the generation of human antibodies will provide unique and disruptive capabilities in the far-reaching field of antibody science.
