Evaluating Expert Specialization in Mixture-of-Experts Antibody Language Models
Abstract
Antibody language models (AbLMs) show an impressive aptitude for learning antibody features, but tend to struggle to learn the highly diverse, non-templated regions of antibodies. Existing AbLMs use dense architectures, in which all model parameters are active for every amino acid token. We hypothesized that the modular nature of antibodies could benefit from a sparse mixture-of-experts (MoE) architecture, allowing specific parameters (referred to as ‘experts’) to specialize in distinct antibody features. While MoE architectures are widely adopted and optimized in natural language processing domains, they are less common in biological modeling. We therefore assess existing MoE routing strategies and find that token-choice routing strategies outperform expert-choice routing, presumably due to their specialization in CDRH3 residues. We further optimize the token-choice router for AbLMs by minimizing the routing of padding tokens, which enables pre-training with varying sequence lengths. Finally, we show that a large-scale baseline antibody language model with a Top-2 MoE architecture (BALM-MoE), trained on a mixture of unpaired and paired antibody sequences, outperforms its dense counterpart with the same number of active parameters.
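To make the routing terminology concrete, the following is a minimal sketch of Top-2 token-choice routing, in which each token selects its own two highest-scoring experts (in expert-choice routing, by contrast, each expert selects the tokens it will process). It is not the BALM-MoE implementation: the function name `top2_token_choice`, the toy logits, the renormalization of the two gate weights, and the choice to skip padding tokens entirely are all illustrative assumptions.

```python
import math

def top2_token_choice(router_logits, pad_mask):
    """Assign each non-padding token to its two highest-scoring experts.

    router_logits: list of per-token lists, one logit per expert.
    pad_mask: list of bools, True where the token is padding.
    Returns one list of (expert_index, gate_weight) pairs per token;
    padding tokens get an empty list, so no expert processes them.
    """
    assignments = []
    for logits, is_pad in zip(router_logits, pad_mask):
        if is_pad:
            # Assumption: padding is excluded from routing altogether.
            assignments.append([])
            continue
        # Softmax over experts, shifted by the max for numerical stability.
        peak = max(logits)
        exps = [math.exp(x - peak) for x in logits]
        total = sum(exps)
        probs = [e / total for e in exps]
        # Token-choice: the token picks its own top-2 experts.
        top2 = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:2]
        # Assumption: renormalize so the two gate weights sum to 1.
        norm = sum(probs[i] for i in top2)
        assignments.append([(i, probs[i] / norm) for i in top2])
    return assignments

# Toy input (hypothetical values): 4 tokens, 4 experts; the last token is padding.
logits = [
    [2.0, 0.5, 0.1, -1.0],
    [0.0, 3.0, 1.0, 0.2],
    [0.3, 0.1, 2.5, 2.4],
    [0.0, 0.0, 0.0, 0.0],
]
pad = [False, False, False, True]
routes = top2_token_choice(logits, pad)
for tok, route in enumerate(routes):
    print(tok, [(expert, round(weight, 3)) for expert, weight in route])
```

Because padded positions never reach an expert in this sketch, batches that mix short and long sequences do not spend expert capacity on padding, which is the general motivation the abstract gives for limiting padding-token routing.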