Bayesian Optimization of ASCII Structural Anchors for Improving Large Language Model Performance in Biomedical Knowledge Mining

Abstract

Prompt engineering for large language models (LLMs) in biomedical knowledge discovery requires extensive manual optimization and domain expertise. We present the Bayesian Optimization-based Prompt (BOP) framework, an automated approach for optimizing discrete ASCII structural anchor-based prompts using tree-structured Parzen estimator sampling. BOP systematically explores short ASCII string configurations as soft prompt tokens across role, task, instruction, and question dimensions without requiring model fine-tuning or gradient access. We evaluated BOP on gene-gene interaction extraction across five LLMs (GPT-4, GPT-4o, GPT-3.5, Cohere Command, and LLaMA-3 8B) using datasets from eleven KEGG signaling pathways encompassing activation, inhibition, and phosphorylation relationships. BOP-optimized prompts converged within 35-45 iterations across all tested architectures and achieved macro-average F1 scores of 0.80 for GPT-4 and GPT-4o, 0.66 for LLaMA-3 8B, and 0.62 for Cohere Command, representing substantial improvements over manual prompt engineering baselines. Internal representation analysis demonstrated enhanced separability of the three biological interaction classes, as evidenced by the Calinski–Harabasz index increasing from approximately 68.9 under baseline prompting to 86.9 with structural anchor prompting. Meanwhile, cosine similarity between hidden-state embeddings generated with and without structural anchor prompting remained above 0.98 across transformer layers, confirming representational stability under prompt variation. These results demonstrate that automated Bayesian optimization can significantly improve knowledge extraction accuracy in biomedical natural language processing (NLP) tasks, providing a scalable alternative to manual prompt engineering for specialized domain applications.
While demonstrated on gene interactions, the architecture-agnostic design suggests broader applicability across NLP tasks.
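To make the optimization loop concrete, the sketch below implements a simplified tree-structured Parzen estimator over a discrete space of ASCII structural anchors, in the spirit of the BOP framework described above. The anchor strings, the four-slot search space, and the `mock_f1` scoring function are all illustrative stand-ins (the paper's actual anchors and evaluation pipeline are not specified here); in real use, `mock_f1` would be replaced by an LLM call scored against a labeled interaction-extraction set.

```python
import random
from collections import Counter

# Hypothetical discrete search space: short ASCII "structural anchors" for
# four prompt slots (role, task, instruction, question). Illustrative only.
SPACE = {
    "role": ["##", "**", ">>", "::"],
    "task": ["[T]", "<T>", "{T}", "=T="],
    "instruction": ["->", "=>", "--", ".."],
    "question": ["?", "Q:", "[Q]", "<Q>"],
}

def mock_f1(config):
    # Stand-in for an expensive LLM evaluation returning macro-F1. Certain
    # anchors are secretly "good" so the optimizer has signal to exploit;
    # real use would prompt the model and score its extractions instead.
    target = {"role": "##", "task": "[T]", "instruction": "->", "question": "Q:"}
    hits = sum(config[k] == target[k] for k in SPACE)
    return 0.5 + 0.075 * hits + random.gauss(0, 0.01)

def tpe_suggest(history, gamma=0.25, n_candidates=24):
    """Pick a config via a simplified tree-structured Parzen estimator:
    split past trials into good/bad groups by score quantile, build
    per-slot categorical densities l(x) (good) and g(x) (bad), and keep
    the sampled candidate maximizing the ratio l(x)/g(x)."""
    if len(history) < 8:  # warm-up phase: uniform random search
        return {k: random.choice(v) for k, v in SPACE.items()}
    ranked = sorted(history, key=lambda h: -h[1])
    n_good = max(1, int(gamma * len(ranked)))
    good, bad = ranked[:n_good], ranked[n_good:]

    def density(group, slot, value):
        counts = Counter(cfg[slot] for cfg, _ in group)
        # Laplace smoothing keeps unseen anchors reachable
        return (counts[value] + 1) / (len(group) + len(SPACE[slot]))

    best, best_ratio = None, -1.0
    for _ in range(n_candidates):
        cand = {k: random.choice(v) for k, v in SPACE.items()}
        ratio = 1.0
        for slot, val in cand.items():
            ratio *= density(good, slot, val) / density(bad, slot, val)
        if ratio > best_ratio:
            best, best_ratio = cand, ratio
    return best

random.seed(0)
history = []
for _ in range(45):  # iteration budget matching the reported 35-45 range
    cfg = tpe_suggest(history)
    history.append((cfg, mock_f1(cfg)))

best_cfg, best_score = max(history, key=lambda h: h[1])
print(best_cfg, round(best_score, 3))
```

Because the search space is purely categorical, the Parzen "densities" reduce to smoothed frequency tables per slot, which is why no gradient access or model fine-tuning is needed; only black-box scores are consumed.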
