Precise CDR Position Control in Antibody Sequence Generation Using Conditional Deep Generative Models

Pan Jiang

Abstract

Controllable full-length antibody sequence generation requires explicit localization of complementarity-determining regions (CDRs), but most autoregressive pipelines optimize global likelihood without machine-readable boundary guarantees. We formulate CDR position control as a sequence modeling objective by inserting explicit CDR1/2/3 boundary tokens and optional property-conditioning tokens. On top of this representation, we introduce CDBO (CDR Boundary-Order constrained decoding), which enforces legal boundary-token progression during decoding, and an auxiliary training objective, L = L_lm + λ1·L_boundary + λ2·L_property, that supervises boundary placement and property conditioning. The full workflow is recomputed from source repertoire data with deterministic quality control, marker validation, generated-sequence composition analysis, and leakage auditing. Of 11,228,600 raw records, 11,078,824 pass quality control (98.67% retention), and marker insertion validation succeeds on 100.00% of 5,000 samples. In a four-way ablation over three seeds, the Full model (Aux + CDBO) achieves the highest CDR boundary-order fidelity (0.9333 ± 0.0764) versus Base (0.3500 ± 0.0000), while maintaining strong sequence validity. These results support explicit boundary-aware control as a practical route to reproducible and biologically aligned antibody generation.
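The abstract does not show how boundary tokens are laid out or how CDBO rules out illegal progressions. The following is a minimal Python sketch under stated assumptions, not the authors' implementation: the token names `<CDR1>` through `</CDR3>` and `<EOS>`, the helper functions, and the definition of boundary-order fidelity (fraction of samples whose six boundary tokens appear exactly once, in the fixed order) are all hypothetical. It wraps annotated CDR spans in open/close tokens, masks out-of-order boundary tokens (and end-of-sequence before CDR3 is closed) at a decoding step, and scores a batch of token sequences.

```python
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical boundary-token vocabulary; the preprint does not name its tokens.
BOUNDARY_ORDER: Tuple[str, ...] = (
    "<CDR1>", "</CDR1>", "<CDR2>", "</CDR2>", "<CDR3>", "</CDR3>",
)
BOUNDARY_SET = frozenset(BOUNDARY_ORDER)
EOS = "<EOS>"  # hypothetical end-of-sequence token


def insert_markers(seq: str, cdr_spans: Sequence[Tuple[int, int]]) -> List[str]:
    """Split a residue string into tokens and wrap each CDR span
    (0-based, end-exclusive, sorted, non-overlapping) in open/close tokens."""
    tokens: List[str] = []
    prev = 0
    for i, (start, end) in enumerate(cdr_spans):
        tokens.extend(seq[prev:start])             # framework residues before this CDR
        tokens.append(BOUNDARY_ORDER[2 * i])       # open token for CDR i+1
        tokens.extend(seq[start:end])              # CDR residues
        tokens.append(BOUNDARY_ORDER[2 * i + 1])   # close token for CDR i+1
        prev = end
    tokens.extend(seq[prev:])                      # trailing framework residues
    return tokens


def allowed_boundary(n_seen: int) -> Set[str]:
    """Assumed order constraint: after n_seen boundary tokens, only the
    next boundary token in the fixed order is legal."""
    if n_seen < len(BOUNDARY_ORDER):
        return {BOUNDARY_ORDER[n_seen]}
    return set()


def mask_logits(logits: Dict[str, float], n_seen: int) -> Dict[str, float]:
    """Set scores of illegal tokens to -inf; residue tokens pass through."""
    legal = allowed_boundary(n_seen)
    done = n_seen == len(BOUNDARY_ORDER)
    masked: Dict[str, float] = {}
    for tok, score in logits.items():
        if tok in BOUNDARY_SET and tok not in legal:
            masked[tok] = float("-inf")   # out-of-order boundary token
        elif tok == EOS and not done:
            masked[tok] = float("-inf")   # cannot stop before CDR3 is closed
        else:
            masked[tok] = score
    return masked


def boundary_order_ok(tokens: Sequence[str]) -> bool:
    """True if every boundary token appears exactly once, in the legal order."""
    return tuple(t for t in tokens if t in BOUNDARY_SET) == BOUNDARY_ORDER


def boundary_order_fidelity(samples: Sequence[Sequence[str]]) -> float:
    """Fraction of samples with a fully legal boundary order (assumed metric)."""
    if not samples:
        return 0.0
    return sum(boundary_order_ok(s) for s in samples) / len(samples)


if __name__ == "__main__":
    toks = insert_markers("AAGGGAACCCAADDDAA", [(2, 5), (7, 10), (12, 15)])
    print(" ".join(toks))
    print(boundary_order_fidelity([toks, ["A", "<CDR2>", "A", "</CDR2>"]]))
```

Under these assumptions, `mask_logits` would be applied to the model's next-token scores at every step, with `n_seen` counting the boundary tokens emitted so far. The auxiliary terms L_boundary and L_property act at training time and are not modeled in this sketch.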

Version published to 10.21203/rs.3.rs-9260205/v1 on Research Square
Mar 31, 2026

Related articles

Beyond Random Splits: A Critical Evaluation of Graph Learning Models in Predicting Mutation-Induced Drug Resistance

This article has 3 authors:
1. Zongrui Cheng
2. Haoxin Wu
3. Dengming Ming
This article has no evaluations. Latest version: Apr 2, 2026.

DeepCas12a: A hybrid deep learning framework for accurate Cas12a efficiency prediction from sequence and epigenetic information

This article has 6 authors:
1. Yiming Shi
2. Junkai Yin
3. Shurui Ning
4. Jinling Yuan
5. Degang Yang
6. Guohui Chuai
This article has no evaluations. Latest version: Feb 9, 2026.

FASA: Feature-Agnostic Stacked Autoencoders for Accurate Adverse Drug Reaction Prediction

This article has 3 authors:
1. Martin Gustavo Perez Bonany Torrealva
2. Edward Jorge Yuri Cayllahua Cahuina
3. Rensso Victor Hugo Mora Colque
This article has no evaluations. Latest version: Apr 1, 2026.
