Database-Augmented Transformer-Based Large Language Models Achieve High Accuracy in Mapping Gene-Phenotype Relationships
Abstract
Transformer-based large language models (LLMs) have demonstrated significant potential in the biological and medical fields due to their ability to learn effectively from large-scale, diverse datasets and perform a wide range of downstream tasks. However, LLMs are limited by issues such as information-processing inaccuracies and data confabulation, which hinder their utility for literature searches and other tasks requiring accurate and comprehensive extraction of information from the extensive scientific literature. In this study, we evaluated the performance of various LLMs in accurately retrieving peer-reviewed literature and mapping correlations between 102 genes and four phenotypes: bone formation, cartilage formation, fibrosis, and cell proliferation. Our analysis included standard transformer-based LLMs (ChatGPT-4o and Gemini 1.5 Pro), fine-tuned LLMs with dedicated custom databases of peer-reviewed articles (SciSpace and ScholarAI), and fine-tuned LLMs without dedicated databases (PubMedGPT and ScholarGPT). Using human-curated gene-to-phenotype mappings as the ground truth, we found that the fine-tuned LLMs with dedicated databases (SciSpace and ScholarAI) achieved high accuracy (>80%) in gene-to-phenotype mapping. These models were also able to provide relevant peer-reviewed publications supporting each gene-to-phenotype correlation. These findings underscore the importance of database augmentation and fine-tuning in enhancing the reliability and utility of LLMs for biomedical research applications.
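To make the evaluation criterion concrete, the sketch below shows one way an accuracy score of this kind could be computed: model yes/no calls for each gene-phenotype pair are compared against a human-curated ground truth, and the fraction of matching calls is reported per phenotype. This is a minimal illustrative sketch, not the study's actual pipeline; the gene names, labels, and data structures are assumptions introduced here for clarity.

```python
# Illustrative sketch: score model-predicted gene-to-phenotype calls
# against a human-curated ground truth. All data below are hypothetical.

PHENOTYPES = ["bone formation", "cartilage formation",
              "fibrosis", "cell proliferation"]

def accuracy_per_phenotype(ground_truth: dict, predictions: dict) -> dict:
    """For each phenotype, return the fraction of genes where the model's
    yes/no call matches the curated mapping.

    Both arguments map gene -> {phenotype: bool}.
    """
    scores = {}
    for phenotype in PHENOTYPES:
        correct = sum(
            predictions.get(gene, {}).get(phenotype) == labels[phenotype]
            for gene, labels in ground_truth.items()
        )
        scores[phenotype] = correct / len(ground_truth)
    return scores

# Hypothetical example with two genes and one model's calls:
curated = {
    "BMP2":   {"bone formation": True,  "cartilage formation": True,
               "fibrosis": False, "cell proliferation": True},
    "COL2A1": {"bone formation": False, "cartilage formation": True,
               "fibrosis": False, "cell proliferation": False},
}
model_calls = {
    "BMP2":   {"bone formation": True,  "cartilage formation": True,
               "fibrosis": False, "cell proliferation": False},
    "COL2A1": {"bone formation": False, "cartilage formation": True,
               "fibrosis": False, "cell proliferation": False},
}
print(accuracy_per_phenotype(curated, model_calls))
# -> 1.0 for the first three phenotypes, 0.5 for cell proliferation
```

Under this scheme, an overall accuracy above 80% means the model's calls agree with the curated mapping for more than 80% of the 102 genes, averaged across the four phenotypes.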