Protein Language Model Based Structure-guided Antibody Screening for Disordered Protein Targets

Abstract

A crucial step in the pathogenesis of Parkinson’s disease involves cell-to-cell transmission of α-Synuclein proto-fibrils via endocytosis, driven primarily by the interaction of its disordered C-terminal peptide with domain 1 of Lymphocyte Activation Gene 3 (LAG3) neuronal receptors. High-affinity antibodies have been proposed as therapeutic modalities to delay this progression and subsequent amyloid formation. In our work, we develop an end-to-end computational pipeline that enables rapid screening of antibody sequences with high affinity for the disordered C-terminal peptide of α-Synuclein, without using any information about known binders. This de novo screening was enabled by a structural-bioinformatics-based in silico data generation pipeline combined with a deep learning framework. Our simple feed-forward network model, built upon sequence embeddings from a protein language model, ranked the binding affinities (ΔG) of antibodies to α-Synuclein with high accuracy (Spearman ρ = 0.86) when the training and evaluation datasets contained sequences with some overlap in the complementarity-determining regions (CDRs). However, for vastly different CDR sequences, a transformer encoder model trained on the antibody sequence embeddings showed a low Spearman rank correlation of ρ = 0.18. The two models have a mean Precision@100 of 38 and 12, respectively, significantly outperforming a random process. Overall, our work demonstrates a computational protocol for generating a high-quality dataset of antibody-antigen complexes spanning a very large diversity of antibody sequences, followed by training of a deep learning model to predict high-affinity antibody sequences for a specific protein target with no known binders.
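The two evaluation metrics named in the abstract, Spearman rank correlation and Precision@k, can be illustrated with a minimal sketch. This is not the authors' code. The function names, the toy ΔG values, and the definition of Precision@k used here (the fraction of the k antibodies predicted to bind most strongly that are also among the k strongest true binders, with more negative ΔG meaning stronger binding) are assumptions made for illustration only.

```python
def average_ranks(values):
    """Return 1-based ranks in ascending order; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def precision_at_k(true_dg, pred_dg, k):
    """Fraction of the predicted top-k binders that are also true top-k binders.

    Assumes a lower (more negative) delta-G indicates stronger binding.
    """
    true_top = set(sorted(range(len(true_dg)), key=lambda i: true_dg[i])[:k])
    pred_top = sorted(range(len(pred_dg)), key=lambda i: pred_dg[i])[:k]
    return sum(1 for i in pred_top if i in true_top) / k


# Hypothetical reference and predicted binding free energies for five antibodies.
true_dg = [-12.0, -9.5, -11.0, -7.2, -10.1]
pred_dg = [-11.5, -9.0, -9.8, -7.0, -10.5]

print(round(spearman_rho(true_dg, pred_dg), 3))  # 0.9
print(precision_at_k(true_dg, pred_dg, 2))       # 0.5
```

In this toy example the predicted ordering swaps two of the five antibodies, so the rank correlation stays high while only one of the two strongest true binders appears in the predicted top two. The same contrast between overall ranking quality and top-of-list retrieval is what the two metrics capture separately.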
