A Comparative Survey on Large Language Models for Biological Data

Abstract

The development of large language models (LLMs) has accelerated rapidly since the release of ChatGPT, and these models have gained attention for their robust performance across a wide range of tasks. The ability of LLMs to understand and produce general-purpose language is achieved by training billions of parameters. They have emerged as a transformative force in natural language understanding and represent an important step toward general artificial intelligence (AI). LLMs have become powerful tools for many tasks, including natural language processing (NLP), machine translation (MT), vision applications, and question answering (QA). Their reach now extends beyond conventional linguistic bounds to the specialized languages developed in different scientific disciplines. Growing interest in this subclass of scientifically oriented models has given rise to scientific LLMs, which are steadily establishing themselves as an exciting research area. Theoretically, they share a common architecture with general LLMs; in practice, however, they differ in their inputs and usage. This paper undertakes a comprehensive study of scientific LLMs, covering their architectures, datasets, parameter counts, and contexts of use. Our analysis focuses on the biological and chemical domains, enabling an in-depth examination of LLMs for textual knowledge, small molecules, macromolecules, proteins, genomic sequences, and their combinations. By providing an overview of the technical advances in the field, this survey serves as a valuable resource for researchers navigating the complex landscape of scientific LLMs.