Optimizing Large Language Models for Ontology-Based Annotation: A Study on Gene Ontology in Biomedical Texts

Abstract

Automated ontology annotation of scientific literature plays a critical role in knowledge management, particularly in fields such as biology and biomedicine, where accurate concept tagging can enhance information retrieval, semantic search, and knowledge integration. Traditional models for ontology annotation, such as Recurrent Neural Networks (RNNs) and Bidirectional Gated Recurrent Units (Bi-GRUs), have been effective but are limited in handling complex biomedical terminology and semantic nuance. This study explores the potential of large language models (LLMs), including MPT-7B, Phi, BiomedLM, and Meditron, for improving ontology annotation, specifically with Gene Ontology (GO) concepts. We fine-tuned these models on the CRAFT dataset, assessing their performance in terms of F1 score, semantic similarity, memory usage, and inference speed. Experimental results show that fine-tuned LLMs can achieve higher semantic accuracy than traditional RNN models, capturing nuanced relationships within biomedical text. However, the resource requirements of LLMs are notably high, raising concerns about computational efficiency. Techniques such as parameter-efficient fine-tuning (PEFT) and advanced prompting were explored to address these challenges, demonstrating the potential to reduce computational demands while maintaining performance. Our findings suggest that although LLMs offer advantages in annotation accuracy, practical deployment must balance these benefits against resource costs. This research highlights the need for further optimization and domain-specific training to make LLMs a feasible choice for real-world biomedical ontology annotation tasks.
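To illustrate the parameter-efficient fine-tuning approach the abstract mentions, below is a minimal sketch of LoRA-based PEFT applied to one of the studied models using the Hugging Face `transformers` and `peft` libraries. The hyperparameter values (rank, alpha, dropout) and the choice of target modules are illustrative assumptions, not the paper's reported configuration.

```python
# Minimal LoRA fine-tuning sketch, assuming the Hugging Face peft library.
# Hyperparameters and target modules below are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_name = "mosaicml/mpt-7b"  # one of the LLMs studied in the paper
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    trust_remote_code=True,  # MPT ships custom model code on the Hub
)

# LoRA trains small low-rank update matrices instead of all model weights,
# reducing trainable parameters (and optimizer memory) by orders of magnitude.
lora_config = LoraConfig(
    r=8,                      # rank of the low-rank update (assumed value)
    lora_alpha=16,            # scaling factor (assumed value)
    target_modules=["Wqkv"],  # MPT's fused attention projection; names vary by model
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# Typically only a fraction of a percent of parameters remain trainable.
model.print_trainable_parameters()
```

The wrapped model can then be trained with a standard `transformers` `Trainer` on prompt/annotation pairs derived from the CRAFT dataset; only the LoRA adapter weights are updated, which is what makes the memory and compute savings discussed above possible.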
