Lab-in-the-loop therapeutic antibody design with deep learning


Abstract

Therapeutic antibody design is a complex multi-property optimization problem that traditionally relies on expensive search through sequence space. Here, we introduce “Lab-in-the-loop,” a new approach to antibody design that orchestrates generative machine learning models, multi-task property predictors, active learning ranking and selection, and in vitro experimentation in a semi-autonomous, iterative optimization loop. By automating the design of antibody variants, property prediction, ranking and selection of designs to assay in the lab, and ingestion of in vitro data, we enable a holistic, end-to-end approach to antibody optimization. We apply lab-in-the-loop to four clinically relevant antigen targets: EGFR, IL-6, HER2, and OSM. Over 1,800 unique antibody variants are designed and tested, derived from lead molecule candidates obtained via animal immunization and state-of-the-art immune repertoire mining techniques. Four lead candidate and four design crystal structures are solved to reveal mechanistic insights into the effects of mutations. We perform four rounds of iterative optimization and report 3–100× better binding variants for every target and ten candidate lead molecules, with the best binders in a therapeutically relevant 100 pM range.

Article activity feed

  1. In the first round of design, a maximum edit distance cap of 6 edits from the lead is enforced. In the second round, this cap is increased to 8, and in the third round, it is increased to 12.

    This still seems quite close in sequence space to the original sequences. How do the results from this method compare to other protein engineering efforts on the same targets? Are there cases where the same residues were mutated?
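
    For concreteness, the round-wise cap quoted above could be enforced with a simple Levenshtein filter. This is only a sketch under assumed names — the paper does not specify how the cap is implemented:

    ```python
    def edit_distance(a: str, b: str) -> int:
        """Levenshtein distance via the standard dynamic program."""
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            curr = [i]
            for j, cb in enumerate(b, 1):
                curr.append(min(prev[j] + 1,                 # deletion
                                curr[j - 1] + 1,             # insertion
                                prev[j - 1] + (ca != cb)))   # substitution
            prev = curr
        return prev[-1]

    def within_cap(lead: str, variant: str, max_edits: int) -> bool:
        """Keep only designed variants within `max_edits` of the lead."""
        return edit_distance(lead, variant) <= max_edits

    # Round-wise caps as described in the quoted passage: 6, then 8, then 12.
    ROUND_CAPS = {1: 6, 2: 8, 3: 12}
    ```

    A design pool for round `r` would then be filtered with `[v for v in pool if within_cap(lead, v, ROUND_CAPS[r])]`.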

  2. DCS uses a likelihood under a joint density of statistical properties, including log-probability under a protein language model, and sequence-based properties like hydrophobicity and molecular weight, calculated with BioPython

    It seems, on a first pass, a little circular to use OOD detection on PLM-generated sequences. Presumably PLMs have "learned" various aspects of what makes a good (in-distribution) protein and already incorporate, implicitly, some concept of hydrophobicity, molecular weight, etc. In other words, for cases of major disagreement between the generative model and the OOD-detection model, it would be interesting to see which one is correct.
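
    The density check being discussed could be sketched roughly as follows. The feature set, the diagonal-Gaussian density, and the thresholding are all assumptions for illustration; the paper names BioPython (e.g. `Bio.SeqUtils.ProtParam.ProteinAnalysis` exposes `gravy()` and `molecular_weight()`), but pure-Python equivalents are inlined here to keep the example self-contained:

    ```python
    import math

    # Kyte-Doolittle hydropathy scale (the standard GRAVY scale).
    KD = {'A': 1.8, 'R': -4.5, 'N': -3.5, 'D': -3.5, 'C': 2.5,
          'Q': -3.5, 'E': -3.5, 'G': -0.4, 'H': -3.2, 'I': 4.5,
          'L': 3.8, 'K': -3.9, 'M': 1.9, 'F': 2.8, 'P': -1.6,
          'S': -0.8, 'T': -0.7, 'W': -0.9, 'Y': -1.3, 'V': 4.2}

    # Approximate average residue masses in Daltons.
    MW = {'A': 71.08, 'R': 156.19, 'N': 114.10, 'D': 115.09, 'C': 103.14,
          'Q': 128.13, 'E': 129.12, 'G': 57.05, 'H': 137.14, 'I': 113.16,
          'L': 113.16, 'K': 128.17, 'M': 131.19, 'F': 147.18, 'P': 97.12,
          'S': 87.08, 'T': 101.10, 'W': 186.21, 'Y': 163.18, 'V': 99.13}

    def features(seq: str, plm_logprob: float) -> list:
        """Joint feature vector: PLM log-probability + sequence properties."""
        gravy = sum(KD[a] for a in seq) / len(seq)      # mean hydropathy
        mw = sum(MW[a] for a in seq) + 18.02            # + one water for termini
        return [plm_logprob, gravy, mw]

    def fit_diag_gaussian(rows):
        """Fit an independent (diagonal-covariance) Gaussian to reference designs."""
        n, d = len(rows), len(rows[0])
        mu = [sum(r[k] for r in rows) / n for k in range(d)]
        var = [max(sum((r[k] - mu[k]) ** 2 for r in rows) / n, 1e-8)
               for k in range(d)]
        return mu, var

    def loglik(x, mu, var) -> float:
        """Log-likelihood under the fitted density; low values flag OOD designs."""
        return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
                   for xi, m, v in zip(x, mu, var))
    ```

    A design would then be rejected when `loglik(features(seq, lp), mu, var)` falls below some percentile of the reference set's own log-likelihoods — which is exactly where the circularity question bites, since the PLM log-probability is both a generation criterion and a density feature.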

  3. Our results demonstrate the powerful generalization capabilities of LitL to perform antibody design across diverse antigen targets and epitopes, without human intervention, while producing real therapeutic antibodies that are viable candidates to progress in the drug discovery pipeline.

    How does this connect to ultimate success or failure, and profitability, in the clinic? That is, if we apply this (powerful!) approach across all drug targets from now on, will we see a significant boost in clinical performance? Or are failures in the clinic caused by other factors not addressed by this method, e.g., incomplete understanding of disease states, selecting the wrong targets, or choosing the wrong patient population?