SNAC-DB: An ML-Ready Database for Antibody and NANOBODY® VHH–Antigen Complexes with Expanded Structural Diversity and Real-World Benchmarking

Abhinav Gupta
Bryan Munoz Rivero
Ruijiang Li
Jorge Roel-Touris
Yves Fomekong Nanfack
Maria Wendt
Yu Qiu
Norbert Furtmann

Abstract

Predicting antibody and NANOBODY® VHH–antigen complexes remains a critical challenge for state-of-the-art structure prediction models, limiting their impact in therapeutic discovery pipelines. We introduce SNAC-DB, an ML-ready database and curation pipeline enriched with structural biology expertise, designed to accelerate model accuracy and generalization by providing 31–37% expanded structural diversity over existing resources like SAbDab through comprehensive re-curation that extracts maximum value from available experimental structures. SNAC-DB expands coverage by capturing often-overlooked complexes and accurately identifying complete multi-chain epitopes through improved biological-assembly-based logic. Built for ML practitioners, SNAC-DB provides standardized formats with multi-threshold structure-based clustering to enable principled sample weighting during training. Using a rigorous benchmark of public PDB entries deposited post-May 2024 plus confidential therapeutic structures, we evaluate seven leading models (Protenix-v1, OpenFold-3p2, RosettaFold-3, Boltz-2, Boltz-1x, Chai-1, and AlphaFold2.3-multimer) with evaluation methodology tailored to antibody/NANOBODY® VHH–antigen complexes to ensure correct handling of multi-chain epitopes, revealing systematic performance gaps: success rates rarely exceed 25%, confidence-based ranking fails to identify best predictions even when accurate structures exist in ensembles, and all models consistently struggle with therapeutically relevant NANOBODY® VHHs. Systematic evaluation of sampling strategies demonstrates that while generating 1000 samples per target substantially increases the likelihood of producing accurate structures (oracle selection improves from 11.9% to 50.5%), confidence-based ranking remains nearly flat (between 10.9% and 14.9%), revealing that improved ranking mechanisms represent a more tractable path to performance gains. Finally, fine-tuning GeoDock on SNAC-DB yields higher success rates than training on SAbDab (11.0% vs. 7.1% for antibodies; 7.0% vs. 4.0% for NANOBODY® VHHs), suggesting that SNAC-DB’s expanded structural diversity translates to improved model generalization.
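
The gap the abstract describes between oracle selection and confidence-based ranking can be illustrated with a minimal sketch. It is not the authors' evaluation code: the `success_rates` function, the `(confidence, quality)` sample layout, the toy values, and the 0.23 success cutoff (borrowed from the conventional DockQ "acceptable" threshold) are all assumptions made for illustration. For each target, the top-ranked rate checks only the sample with the highest model confidence, while the oracle rate asks whether any sample in the ensemble is accurate.

```python
from typing import Dict, List, Tuple

# One sampled prediction: (model confidence, quality score vs. the experimental structure).
# Both fields are hypothetical; the source does not specify its data layout.
Sample = Tuple[float, float]


def success_rates(
    predictions: Dict[str, List[Sample]], threshold: float
) -> Tuple[float, float]:
    """Return (top_ranked_rate, oracle_rate) across all targets.

    top_ranked_rate: fraction of targets whose highest-confidence sample is accurate.
    oracle_rate: fraction of targets with at least one accurate sample in the ensemble.
    """
    if not predictions:
        raise ValueError("at least one target is required")
    top_hits = 0
    oracle_hits = 0
    for target, samples in predictions.items():
        if not samples:
            raise ValueError(f"target {target!r} has no samples")
        # Confidence-based ranking: keep the sample the model trusts most.
        _, top_quality = max(samples, key=lambda s: s[0])
        top_hits += top_quality >= threshold
        # Oracle selection: success if any sample clears the threshold.
        oracle_hits += any(quality >= threshold for _, quality in samples)
    n_targets = len(predictions)
    return top_hits / n_targets, oracle_hits / n_targets


# Toy ensemble (made-up numbers, not results from the paper).
predictions = {
    "target_a": [(0.9, 0.10), (0.4, 0.60)],  # accurate sample exists but is ranked low
    "target_b": [(0.8, 0.70), (0.3, 0.20)],  # top-ranked sample is accurate
    "target_c": [(0.7, 0.05), (0.6, 0.10)],  # no accurate sample at all
    "target_d": [(0.5, 0.15), (0.2, 0.45)],  # accurate sample exists but is ranked low
}

top_rate, oracle_rate = success_rates(predictions, threshold=0.23)
print(f"top-ranked: {top_rate:.2f}, oracle: {oracle_rate:.2f}")
```

In this toy ensemble the oracle rate is three times the top-ranked rate, which is the same kind of gap the abstract reports at 1000 samples per target: more sampling raises the oracle ceiling, but only a better ranking signal converts that ceiling into usable predictions.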

Significance Statement

Computational antibody/NANOBODY® VHH design shows promise but remains unreliable for therapeutic development. SNAC-DB provides 31–37% expanded structural diversity through comprehensive data curation, immediately accelerating model development. Benchmarking seven leading AI models reveals accuracy rarely exceeds 25% on therapeutic targets, with confidence-based ranking failing to identify correct structures even when they exist in model outputs. Training on SNAC-DB increases prediction accuracy, validating that high-quality, diverse training data is critical for advancing computational methods toward clinical impact.

Version published to 10.64898/2026.04.22.720253 on bioRxiv
Apr 26, 2026

Related articles

NativeReady: an open benchmark and sequence-based triage model for native mass spectrometry suitability

This article has 2 authors:
1. Brhanu F. Znabu
2. Zohaib Atif
This article has no evaluations. Latest version May 6, 2026

Benchmarking Boltz-2 for Screening of Therapeutic Antibody-Antigen Interactions

This article has 10 authors:
1. Alexandra Fieux-Castagnet
2. Julian Waton
3. Alina Glukhonemykh
4. Eric Snow
5. Roshini Ashokkumar
6. Jess Fleming
7. David Champagne
8. Thomas Devenyns
9. Alex Peluffo
10. Chris Anagnostopoulos
This article has no evaluations. Latest version May 14, 2026

Computational Design and Atomistic Validation of a High-Affinity VHH Nanobody Targeting the PI/RuvC Interface of Streptococcus pyogenes Cas9: A Bivalent Hub Strategy for CRISPR-Cas9 Enhancement

This article has 3 authors:
1. Nitanshu Kumar
2. Dinky Dalal
3. Vishakha Sharma
This article has no evaluations. Latest version Mar 25, 2026
