Precise CDR Position Control in Antibody Sequence Generation Using Conditional Deep Generative Models

Abstract

Controllable full-length antibody sequence generation requires explicit localization of complementarity-determining regions (CDRs), but most autoregressive pipelines optimize global likelihood without machine-readable boundary guarantees. We formulate CDR position control as a sequence modeling objective by inserting explicit CDR1/2/3 boundary tokens and optional property-conditioning tokens. On top of this representation, we introduce CDBO (CDR Boundary-Order constrained decoding), which enforces legal boundary-token progression during decoding, and an auxiliary training objective, L = L_lm + λ1 L_boundary + λ2 L_property, for boundary and condition supervision. The full workflow is recomputed from source repertoire data with deterministic quality control, marker validation, generated-sequence composition analysis, and leakage auditing. From 11,228,600 raw records, 11,078,824 pass quality control (98.67% retention), and marker insertion validation reaches 100.00% success on 5,000 samples. In a three-seed, four-way ablation, the Full model (Aux + CDBO) achieves the highest CDR boundary-order fidelity (0.9333 ± 0.0764) versus Base (0.3500 ± 0.0000), while maintaining strong sequence validity. These results support explicit boundary-aware control as a practical route to reproducible and biologically aligned antibody generation.
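
As an illustration of what "enforcing legal boundary-token progression during decoding" could look like, the following is a minimal sketch, not the authors' implementation. It assumes six hypothetical marker tokens (`<CDR1>`, `</CDR1>`, ..., `</CDR3>`) that must each appear once in a fixed order, and it represents per-step logits as a plain dictionary from token to score. The token spellings, the function names, and the `boundary_order_ok` check are all assumptions introduced here; the abstract does not specify how markers are written or how boundary-order fidelity is scored.

```python
import math

# Hypothetical marker vocabulary; the source does not give the token spellings.
BOUNDARY_ORDER = ["<CDR1>", "</CDR1>", "<CDR2>", "</CDR2>", "<CDR3>", "</CDR3>"]
BOUNDARY_SET = set(BOUNDARY_ORDER)


def allowed_boundary(generated):
    """Return the only boundary token that may come next, or None if all are used.

    Assumes the prefix `generated` is itself legal, which holds when every
    earlier step was decoded through `mask_logits`.
    """
    seen = sum(1 for tok in generated if tok in BOUNDARY_SET)
    return BOUNDARY_ORDER[seen] if seen < len(BOUNDARY_ORDER) else None


def mask_logits(logits, generated):
    """Set the score of every out-of-order boundary token to -inf."""
    legal = allowed_boundary(generated)
    return {
        tok: (-math.inf if tok in BOUNDARY_SET and tok != legal else score)
        for tok, score in logits.items()
    }


def greedy_decode(step_logits):
    """Greedy decoding over a list of per-step logit dicts, with masking applied."""
    out = []
    for logits in step_logits:
        masked = mask_logits(logits, out)
        out.append(max(masked, key=masked.get))
    return out


def boundary_order_ok(tokens):
    """True if each boundary token appears exactly once and in the legal order."""
    return [t for t in tokens if t in BOUNDARY_SET] == BOUNDARY_ORDER
```

In this sketch, a step whose raw logits favor `<CDR2>` before any `<CDR1>` has been emitted will have `<CDR2>` masked out, so greedy decoding falls back to the best remaining token. In a real model the same mask would be applied to the logit vector before sampling; a per-sequence check like `boundary_order_ok`, averaged over generated sequences, is one plausible way to compute an order-fidelity score, though the paper's exact metric may differ.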
