Lab-in-the-loop therapeutic antibody design with deep learning
This article has been reviewed by the following groups:
Listed in:
- Evaluated articles (Arcadia Science)
Abstract
Therapeutic antibody design is a complex multi-property optimization problem with substantial promise for improvement with the application of machine-learning methods. Towards realizing that promise, we introduce “Lab-in-the-loop,” a new approach that orchestrates state-of-the-art repertoire mining methods, generative machine learning models, multi-task property predictors, active learning ranking and selection, and in vitro experimentation in a semi-autonomous, iterative optimization loop. By automating the design of antibody variants, property prediction, ranking and selection of designs to assay in the lab, and ingestion of in vitro data, we enable an end-to-end approach to developing computationally-informed therapeutic antibody design pipelines. We apply lab-in-the-loop to eleven seed antibodies obtained via animal immunization with four clinically relevant antigen targets: EGFR, IL-6, HER2, and OSM. Over 1,800 unique antibody variants are tested throughout four rounds of iterative optimization, identifying 3–100× better binding variants for all targets and 10/11 seeds, with the best binders exceeding 100 pM affinity, demonstrating a process by which end-to-end machine learning can be developed for therapeutic antibody development.
Article activity feed
-
In the first round of design, a maximum edit distance cap of 6 edits from the lead is enforced. In the second round, this cap is increased to 8, and in the third round, it is increased to 12.
This still seems quite close in sequence space to the original sequences. How do the results from this method compare to other protein engineering efforts on the same targets? Are there cases where the same residues were mutated?
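The per-round cap quoted above amounts to a simple filter on candidate variants. A minimal sketch of that schedule is below; the cap values (6, 8, 12) come from the quoted text, but the substitution-count distance and all function names are illustrative assumptions, not the authors' implementation (which may use a true edit/Levenshtein distance allowing indels).

```python
# Per-round edit-distance caps quoted from the paper.
EDIT_CAP_BY_ROUND = {1: 6, 2: 8, 3: 12}

def mutation_count(seed: str, variant: str) -> int:
    """Count substitutions between equal-length sequences (Hamming distance)."""
    if len(seed) != len(variant):
        raise ValueError("this sketch assumes equal-length sequences")
    return sum(a != b for a, b in zip(seed, variant))

def within_cap(seed: str, variant: str, design_round: int) -> bool:
    """Keep a variant only if it stays within the round's edit cap."""
    return mutation_count(seed, variant) <= EDIT_CAP_BY_ROUND[design_round]

# Hypothetical fragment of a heavy-chain framework region.
seed = "QVQLVESGGG"
print(within_cap(seed, "QVQLVESGGA", design_round=1))  # 1 edit -> True
```

Under this schedule a variant with 7 substitutions would be rejected in round 1 but accepted in round 3, which matches the reviewer's observation that designs stay close to the seed in sequence space.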
-
DCS uses a likelihood under a joint density of statistical properties, including log-probability under a protein language model, and sequence-based properties like hydrophobicity and molecular weight, calculated with BioPython
It seems, from a first pass, a little bit circular to use OOD detection on PLM-generated sequences. Presumably PLMs have "learned" various aspects of what makes a good (in-distribution) protein and already incorporate (implicitly) some concept of hydrophobicity, MW, etc. In other words, it would be interesting to see, for cases of major disagreement between the generative model and the OOD-detection model, which one is correct.
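To make the quoted DCS idea concrete, here is a minimal sketch of a joint-density score over sequence properties: each sequence is summarized by a few statistics (here GRAVY hydrophobicity and length; the paper also uses PLM log-probability and molecular weight via BioPython), and scored by its log-likelihood under a density fit to in-distribution sequences. The independent-Gaussian density, the reference statistics, and all names are illustrative assumptions, not the authors' model.

```python
import math

# Kyte-Doolittle hydropathy values (standard scale).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def gravy(seq: str) -> float:
    """Mean Kyte-Doolittle hydropathy (what BioPython's ProtParam calls GRAVY)."""
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)

def log_density(x: float, mean: float, std: float) -> float:
    """Log-pdf of a univariate Gaussian."""
    return -0.5 * ((x - mean) / std) ** 2 - math.log(std * math.sqrt(2 * math.pi))

def dcs_score(seq: str, ref_stats: dict) -> float:
    """Joint log-density of the sequence's properties under reference Gaussians.
    Lower scores flag a candidate as out-of-distribution."""
    props = {"gravy": gravy(seq), "length": float(len(seq))}
    return sum(log_density(props[k], m, s) for k, (m, s) in ref_stats.items())

# Reference statistics would be fit on trusted antibody sequences;
# these (mean, std) values are made up for the example.
ref = {"gravy": (-0.4, 0.3), "length": (120.0, 10.0)}
print(dcs_score("QVQLVESGGGLVQPGGSLRLSCAAS", ref))
```

The reviewer's circularity point maps cleanly onto this sketch: if the PLM log-probability were added as another property, the OOD score would partly restate what the generator already optimizes, so the disagreement cases are exactly where the extra sequence-based properties carry independent signal.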
-
3
How was this metric chosen?
-
Our results demonstrate the powerful generalization capabilities of LitL to perform antibody design across diverse antigen targets and epitopes, without human intervention, while producing real therapeutic antibodies that are viable candidates to progress in the drug discovery pipeline.
How does this connect to ultimate success or failure and profitability in the clinic? I.e., if we apply this (powerful!) approach across all drug targets from now on, will we get a significant boost in clinical performance? Or are failures in the clinic caused by other factors not addressed by this method, e.g. incomplete understanding of disease states, selecting the wrong targets, or choosing the wrong patient population?