GatorAffinity: Boosting Protein-Ligand Binding Affinity Prediction with Large-Scale Synthetic Structural Data

Abstract

Protein-ligand binding affinity prediction is a fundamental task in computational drug discovery. Although substantial efforts have been made to enhance prediction accuracy using data-driven approaches, progress remains limited by persistent data scarcity. The widely used PDBbind dataset, for example, contains fewer than 20,000 experimental structures with annotated binding affinities, while a vast number of affinity measurements remain underutilized due to missing structural data. Here, we investigate this untapped potential by curating more than 450,000 synthetic protein-ligand complexes annotated with Kd and Ki values using the Boltz-1 structure prediction model. Building on this unprecedented scale of synthetic data, further augmented with over 1 million synthetic complexes from the recently released SAIR database annotated with IC50 values, we develop GatorAffinity, a geometric deep learning-based scoring function pre-trained on large-scale synthetic data and fine-tuned using high-quality experimental structures from PDBbind. Extensive evaluation on a leak-proof benchmark demonstrates that GatorAffinity significantly outperforms state-of-the-art affinity prediction methods, offering superior accuracy and generalizability. Our findings show that augmenting available experimental data with synthetic complexes can effectively address the data scarcity challenge while maintaining strong predictive reliability. By releasing the pre-trained GatorAffinity model and the large-scale synthetic dataset GatorAffinity-DB, we provide a scalable and reproducible foundation for affinity prediction, virtual screening, and broader structure-based drug design applications (https://github.com/AIDD-LiLab/GatorAffinity).
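The two-stage schedule described above (pre-train on a large, noisier synthetic set, then fine-tune on a small, high-quality experimental set) can be illustrated with a toy analogue. This is a minimal sketch under loudly stated assumptions, not GatorAffinity's geometric deep learning model: a plain linear regressor stands in for the scoring function, Gaussian vectors stand in for complex features, and the dataset sizes, the "bias" of the synthetic labels (`SYNTH_W`), and the learning rate are all hypothetical values chosen only to show the transfer effect.

```python
import random

def predict(w, x):
    # Linear model: predicted "affinity" is a dot product of weights and features.
    return sum(wi * xi for wi, xi in zip(w, x))

def mse(w, data):
    # Mean squared error over (features, label) pairs.
    return sum((predict(w, x) - y) ** 2 for x, y in data) / len(data)

def sgd_fit(w, data, lr, epochs, seed=0):
    # Plain stochastic gradient descent on squared error, starting from weights w.
    rng = random.Random(seed)
    w, data = list(w), list(data)
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            err = predict(w, x) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w

def make_data(n, weights, noise, seed):
    # Hypothetical toy "complexes": Gaussian features and a noisy linear label.
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in weights]
        data.append((x, predict(weights, x) + rng.gauss(0.0, noise)))
    return data

TRUE_W = [1.0, -2.0, 0.5]    # assumed ground-truth relationship
SYNTH_W = [1.2, -1.8, 0.5]   # assumed: synthetic labels are slightly biased

synthetic = make_data(5000, SYNTH_W, noise=0.5, seed=1)  # large, noisier set
experimental = make_data(20, TRUE_W, noise=0.3, seed=2)  # small, clean set
test = make_data(500, TRUE_W, noise=0.3, seed=3)         # held-out benchmark

zero = [0.0] * len(TRUE_W)
# Stage 1: pre-train on the large synthetic set.
pretrained = sgd_fit(zero, synthetic, lr=0.01, epochs=1)
# Stage 2: fine-tune on the small experimental set with a short budget.
finetuned = sgd_fit(pretrained, experimental, lr=0.01, epochs=5)
# Baseline: the same short budget on experimental data only, from scratch.
scratch = sgd_fit(zero, experimental, lr=0.01, epochs=5)

print(f"scratch MSE:    {mse(scratch, test):.3f}")
print(f"fine-tuned MSE: {mse(finetuned, test):.3f}")
```

In this toy setting, the pre-trained weights start close to the true relationship despite the synthetic bias, so a short fine-tuning pass on only 20 clean examples reaches a lower held-out error than training from scratch on those same examples. Whether and how much this carries over to real structure-based models is what the article's benchmark evaluates; the sketch only shows the mechanics of the schedule.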