Benchmarking inverse folding models for antibody CDR sequence design

Yifan Li
Yuxiang Lang
Chenrui Xu
Yi Zhou
Ziwei Pang
Per Jr. Greisen

Abstract

Antibody-based therapies are at the forefront of modern medicine, addressing diverse challenges across oncology, autoimmune diseases, infectious diseases, and beyond. The ability to design antibodies with enhanced functionality and specificity is critical for advancing next-generation therapeutics. Recent advances in artificial intelligence (AI) have propelled the field of antibody engineering, particularly through inverse folding models for Complementarity-Determining Region (CDR) sequence design. These models aim to generate novel antibody sequences that fold into desired structures with high antigen-binding affinity. However, current evaluation metrics, such as amino acid recovery rates, are limited in their ability to assess the structural and functional accuracy of designed sequences. This study benchmarks state-of-the-art inverse folding models—ProteinMPNN, ESM-IF, LM-Design, and AntiFold—using comprehensive datasets and alternative evaluation metrics like sequence similarity. By systematically analyzing recovery rates, mutation prediction capabilities, and amino acid composition biases, we identify strengths and limitations across models. AntiFold exhibits superior performance in Fab antibody design, whereas LM-Design demonstrates adaptability across diverse antibody types, including VHH antibodies. In contrast, models trained on general protein datasets (e.g., ProteinMPNN and ESM-IF) struggle with antibody-specific nuances. Key insights include the models’ varying reliance on antigen structure and their distinct capabilities in capturing critical residues for antigen binding. Our findings highlight the need for enhanced training datasets, integration of functional data, and refined evaluation metrics to advance antibody design tools. By addressing these challenges, future models can unlock the full potential of AI-driven antibody engineering, paving the way for innovative therapeutic applications.
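The two kinds of metric named in the abstract, amino acid recovery and a similarity-based alternative, can be illustrated with a minimal sketch. It is not the authors' evaluation code. The function names, the coarse physicochemical grouping used as a stand-in for a similarity score, and the example CDR strings are all assumptions made for illustration; a real benchmark would more likely use a substitution matrix such as BLOSUM62.

```python
from collections import Counter

def recovery_rate(native: str, designed: str) -> float:
    """Fraction of positions where the designed residue equals the native one."""
    if len(native) != len(designed) or not native:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(a == b for a, b in zip(native, designed)) / len(native)

# Hypothetical coarse residue classes (an assumption, not taken from the paper).
GROUPS = ["AVLIMC", "FWY", "ST", "NQ", "DE", "KRH", "G", "P"]
GROUP_OF = {aa: idx for idx, group in enumerate(GROUPS) for aa in group}

def group_similarity(native: str, designed: str) -> float:
    """Fraction of positions where both residues fall in the same class."""
    if len(native) != len(designed) or not native:
        raise ValueError("sequences must be non-empty and of equal length")
    same = sum(GROUP_OF[a] == GROUP_OF[b] for a, b in zip(native, designed))
    return same / len(native)

def composition(seq: str) -> dict:
    """Relative frequency of each amino acid, for spotting composition bias."""
    counts = Counter(seq)
    return {aa: n / len(seq) for aa, n in counts.items()}

# Illustrative CDR-like strings (made up for this example).
native_cdr = "ARDYYGSSYFDY"
designed_cdr = "ARDWYGSTYFDV"
print(recovery_rate(native_cdr, designed_cdr))     # exact-match score
print(group_similarity(native_cdr, designed_cdr))  # class-level score
```

In this toy pair, the Y-to-W and S-to-T substitutions count as failures under exact recovery but as matches under the class-level score, which is the kind of gap between metrics the abstract points to.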

Version published to 10.1371/journal.pone.0324566
Jun 4, 2025
Version published to 10.1101/2024.12.16.628614 on bioRxiv
Dec 19, 2024

