A high-throughput platform for biophysical antibody developability assessment to enable AI/ML model training

Abstract

Antibodies must bind their targets with high affinity and specificity to achieve useful therapeutic activity. They must also possess suitable developability properties (e.g., thermostability, solubility, viscosity, polyreactivity) to ensure favorable manufacturing, formulation, and in vivo performance. Both binding and developability properties are inherent to an antibody's amino acid sequence. Identifying or selecting antibodies with suitable binding characteristics is now routine, and de novo computational design models, trained on extensive complementarity-determining region (CDR) sequence and structural data, are rapidly improving.

Developability properties, however, remain difficult to predict, largely because of insufficient training data, so developers rely heavily on empirical testing to avoid problems in late-stage antibody development. To fill this gap, we built a high-throughput antibody developability assay platform designed to generate the large datasets needed to train improved machine learning (ML) models. We optimized and automated previously described developability assays [Jain et al., 2017] and developed a robust, integrated data analytics pipeline. Here we report data on 246 antibodies (106 approved, 135 clinical-stage, and 5 preregistration or withdrawn molecules) across a panel of 10 developability assays, in a "tidy data" format suitable for AI/ML modeling. We used these data to develop an XGBoost [Chen et al., 2016] ML model that predicts similarity to approved antibodies better than the conventional approach of applying developability warning thresholds. We also show that preliminary predictive models improve as more training data become available. Our high-throughput PROPHET-Ab platform enables data generation at the scale needed to develop improved ML models to predict antibody developability.
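
To make two ideas from the paragraph above concrete (a "tidy" table with one row per antibody-assay measurement, and the conventional warning-threshold approach), the following is a minimal Python sketch. It assumes the convention associated with Jain et al. [2017], in which a value is flagged when it falls in the worst 10% of values observed for approved antibodies, and then counts flags per antibody. The assay names (`HIC_RT`, `Tm`), the direction in which each assay gets "worse", the antibody identifiers, and every number are hypothetical toy values, not data from this study; the sketch does not reproduce the PROPHET-Ab pipeline or the XGBoost model.

```python
from collections import defaultdict
from statistics import quantiles

# Hypothetical assays; True means a higher value is less developable
# (e.g. longer hydrophobic-interaction retention), False means lower is worse (e.g. melting temperature).
HIGHER_IS_WORSE = {"HIC_RT": True, "Tm": False}

# Toy tidy table: one row per (antibody, status, assay, value). All values are made up.
records = [
    ("ab1", "approved", "HIC_RT", 9.0), ("ab1", "approved", "Tm", 72.0),
    ("ab2", "approved", "HIC_RT", 9.5), ("ab2", "approved", "Tm", 70.0),
    ("ab3", "approved", "HIC_RT", 10.0), ("ab3", "approved", "Tm", 68.0),
    ("ab4", "approved", "HIC_RT", 10.5), ("ab4", "approved", "Tm", 66.0),
    ("ab5", "approved", "HIC_RT", 11.0), ("ab5", "approved", "Tm", 64.0),
    ("ab6", "clinical", "HIC_RT", 12.0), ("ab6", "clinical", "Tm", 69.0),
]


def flag_thresholds(rows):
    """Per-assay cutoff marking the worst 10% of approved-antibody values."""
    approved = defaultdict(list)
    for _name, status, assay, value in rows:
        if status == "approved":
            approved[assay].append(value)
    cutoffs = {}
    for assay, values in approved.items():
        # Nine decile cut points: deciles[0] is the 10th percentile, deciles[-1] the 90th.
        deciles = quantiles(values, n=10, method="inclusive")
        cutoffs[assay] = deciles[-1] if HIGHER_IS_WORSE[assay] else deciles[0]
    return cutoffs


def count_flags(rows, cutoffs):
    """Number of assays in which each antibody falls beyond the cutoff."""
    flags = defaultdict(int)
    for name, _status, assay, value in rows:
        if HIGHER_IS_WORSE[assay]:
            worse = value > cutoffs[assay]
        else:
            worse = value < cutoffs[assay]
        flags[name] += int(worse)
    return dict(flags)
```

With these toy values, `count_flags(records, flag_thresholds(records))` assigns two flags to `ab5`, one to `ab6`, and none to the others. The long layout keeps this simple: adding an assay or an antibody adds rows rather than columns, and the same table can be pivoted into a feature matrix for a learned model instead of hand-set cutoffs.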

Significance

Beyond tight and specific binding to their targets, successful antibody drugs exhibit important "developability" properties, including high expressibility, high stability and solubility, low aggregation propensity, low viscosity, low polyreactivity, and long in vivo half-life. Collectively, these properties predict favorable manufacturing, storage, administration, and safety, and deficiencies in them increase the risk of clinical failure. Despite progress in machine learning models that predict structure and binding, antibody developability models lag behind, largely because sufficiently large training datasets are lacking. We have built a high-throughput platform, PROPHET-Ab, that enables data generation at the scale needed to train improved AI/ML models to predict antibody developability.

Article activity feed