Can Large Language Models “Read” Biological Sequences? A Systematic Evaluation of In-Context Learning for Antibody Characterization
Abstract
Large language models (LLMs) can learn new tasks through in-context learning (ICL), but it is unknown whether this ability reliably transfers to biological sequence classification. Here, we systematically evaluate how demonstration selection, shot count, and prompting strategy affect performance across 20 general-purpose LLMs. Using antibody characterization as a representative test case, we compare zero-shot, few-shot, and chain-of-thought (CoT) ICL on three classification tasks: humanness, antigen specificity, and isotype. Our results reveal a clear performance hierarchy: while zero-shot prompting performs near chance, few-shot prompting with randomly selected demonstrations improves performance, showing that LLMs can perform ICL on biological sequences with minimal supervision. However, LLMs match the accuracy of protein language model (pLM)-based classifiers only when given label-diverse demonstrations drawn from antibodies similar to the query sequence. To leverage this insight, we introduce Sim-ICL, a framework that automatically retrieves such demonstrations. Using only 32-shot prompting, Sim-ICL achieves performance competitive with pLM-based classifiers, matching or outperforming them on two of the three tasks. Furthermore, reasoning-oriented prompts yield marginal gains and often produce fluent but biologically incorrect rationales, suggesting that current CoT explanations function as after-the-fact rationalizations rather than capturing mechanistic determinants of antibody properties. From these experiments, we derive practical design principles for ICL on biological sequences: use similarity-based, label-diverse demonstrations and modest shot counts, and treat reasoning prompts primarily as post hoc narratives rather than drivers of performance. Sim-ICL implements these principles in a streamlined, prompt-based framework for antibody sequence classification and, in principle, could be adapted to other biological sequence tasks.
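The retrieval strategy behind Sim-ICL can be illustrated with a minimal sketch. The abstract specifies only the core idea: select demonstrations that are similar to the query sequence while remaining diverse in their labels, then assemble them into a few-shot prompt. The similarity metric below (Jaccard overlap of 3-mers), the per-label balancing scheme, and all function names are illustrative assumptions, not the paper's actual implementation.

```python
def kmer_set(seq, k=3):
    """All overlapping k-mers of an amino-acid sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def similarity(a, b, k=3):
    # Jaccard similarity over k-mers -- a simple alignment-free stand-in;
    # the paper's actual similarity measure is not given in the abstract.
    sa, sb = kmer_set(a, k), kmer_set(b, k)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0

def select_demonstrations(query, pool, shots=32):
    """Pick the `shots` most query-similar examples, balanced across labels.

    `pool` is a list of (sequence, label) pairs.
    """
    by_label = {}
    for seq, label in pool:
        by_label.setdefault(label, []).append(seq)
    per_label = max(1, shots // len(by_label))  # enforce label diversity
    demos = []
    for label, seqs in by_label.items():
        ranked = sorted(seqs, key=lambda s: similarity(query, s), reverse=True)
        demos.extend((s, label) for s in ranked[:per_label])
    return demos[:shots]

def build_prompt(query, demos):
    """Format retrieved demonstrations as a few-shot classification prompt."""
    lines = ["Classify the antibody sequence.\n"]
    for seq, label in demos:
        lines.append(f"Sequence: {seq}\nLabel: {label}\n")
    lines.append(f"Sequence: {query}\nLabel:")
    return "\n".join(lines)
```

Under this reading, the only task-specific choices are the similarity function and the labeled pool; the same retrieval-then-prompt loop would apply to the humanness, antigen-specificity, and isotype tasks alike.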