Bioinformatics Copilot 1.0: A Large Language Model-powered Software for the Analysis of Transcriptomic Data

Yongheng Wang
Weidi Zhang
Siyu Lin
Matthew S. Farruggio
Aijun Wang

Abstract

The field of single-cell transcriptomics has been producing extensive datasets, advancing our understanding of cellular functions in various tissues and supporting diagnosis, prognosis, and drug development. However, analyzing these data is a monumental task that often takes weeks to months. This bottleneck arises from the sheer volume of data generated (from hundreds of gigabytes to tens of terabytes), which demands extensive analysis time. Moreover, the analysis involves an intricate series of steps that rely on various software packages, creating a steep learning curve for biologists. Additionally, because data analysis in this domain is iterative, it requires deep biological insight to formulate relevant questions, conduct analyses, interpret results, and refine hypotheses. This iterative loop has required close collaboration between biologists and bioinformaticians, a collaboration often slowed by protracted communication cycles. To address these challenges, we present Bioinformatics Copilot 1.0, a large language model-powered software. It allows users to analyze data through an intuitive natural language interface, without requiring proficiency in programming languages such as Python or R. It is designed to run cross-platform, with support for Mac, Windows, and Linux. Importantly, it supports local data analysis, ensuring adherence to the stringent data management regulations that govern the use of patient samples in medical and research institutions. We anticipate that this tool will expedite the data analysis workflow in numerous research endeavors, thereby accelerating advancements in the biomedical sciences.

Version published on bioRxiv (10.1101/2024.04.11.588958), Apr 15, 2024