On Using Large Language Models to Understand the Language of Life

Abstract

The application of machine learning to large datasets has ushered in a new era, exemplified by large language models like ChatGPT, which are accurate statistical representations of human languages. With advances in high-throughput methods for assaying living systems, such as DNA sequencing, machine learning and AI are finding a growing number of applications in biology. Despite this progress, our understanding of biological systems remains limited, and we are still far from being able to predict human biology. Here, we argue that decoding the function of the human genome and its gene products on a large scale would enable the creation of predictive models of human biology. By modeling how gene products interact over time and space to produce cell functions, which in turn collectively give rise to tissues and a functional organism, we could potentially predict human biological functions, processes, and phenotypes. This approach has the potential to revolutionize biology and biomedical research, offering computational models of development, human physiology, and disease. Biological time is a key variable in understanding human biology and disease, however, and we discuss the need to decode the principles of cellular transitions. A predictive model of the language of life, with temporal and spatial resolution, is ambitious yet, in theory, technologically feasible, and it would have profound implications for comprehending human biology in health and disease.
