Active learning for improving out-of-distribution lab-in-the-loop experimental design

Abstract

The accurate prediction of antibody-antigen binding is crucial for developing antibody-based therapeutics and advancing immunological research. Library-on-library approaches, where many antigens are probed against many antibodies, can identify specific interacting pairs. Machine learning models can predict target binding by analyzing many-to-many relationships between antibodies and antigens. However, these models struggle to predict interactions when the test antibodies and antigens are not represented in the training data, a scenario known as out-of-distribution prediction. Generating experimental binding data is costly, limiting the availability of comprehensive datasets. Active learning can reduce costs by starting with a small labeled subset of data and iteratively expanding the labeled dataset. Few active learning approaches can handle data with many-to-many relationships, such as those obtained from library-on-library screens. In this study, we developed fourteen novel active learning strategies for antibody-antigen binding prediction in a library-on-library setting and evaluated their out-of-distribution performance using the Absolut! simulation framework. We found that three of the fourteen algorithms tested significantly outperformed the baseline where random data are iteratively labeled. The best algorithm reduced the number of required antigen mutant variants by up to 35%, and sped up the learning process by 28 steps compared to the random baseline. These findings demonstrate that active learning can improve experimental efficiency in a library-on-library setting and advance antibody-antigen binding prediction.
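The iterative labeling loop described in the abstract can be sketched as pool-based active learning with uncertainty sampling. The code below is a minimal, generic illustration, not one of the paper's fourteen strategies: the synthetic pair embeddings, the logistic-regression surrogate, and the closest-to-0.5 query rule are all illustrative assumptions, and no Absolut! data is involved.

```python
# Sketch of a pool-based active-learning loop for binder/non-binder prediction.
# All data here is synthetic; in a lab-in-the-loop setting, revealing a label
# corresponds to running a binding experiment for the queried pair.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy pool: each row stands in for an antibody-antigen pair embedding.
X_pool = rng.normal(size=(500, 16))
true_w = rng.normal(size=16)
y_pool = (X_pool @ true_w > 0).astype(int)  # synthetic binding labels

# Seed set: a few labeled pairs from each class to start training.
pos = np.flatnonzero(y_pool == 1)[:5]
neg = np.flatnonzero(y_pool == 0)[:5]
labeled = list(np.concatenate([pos, neg]))
unlabeled = [i for i in range(len(X_pool)) if i not in labeled]

model = LogisticRegression(max_iter=1000)
for step in range(20):
    model.fit(X_pool[labeled], y_pool[labeled])
    # Uncertainty sampling: query the pair whose predicted binding
    # probability is closest to 0.5 (the model is least sure about it).
    probs = model.predict_proba(X_pool[unlabeled])[:, 1]
    query = unlabeled[int(np.argmin(np.abs(probs - 0.5)))]
    labeled.append(query)     # "run the experiment" and reveal its label
    unlabeled.remove(query)

accuracy = model.score(X_pool, y_pool)
```

The paper's contribution lies in replacing the query rule above with strategies suited to many-to-many (library-on-library) data and evaluating them out-of-distribution, where the queried antibodies and antigens are absent from the training set.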
