Large Language Models for Accessible Reporting of Bioinformatics Analyses in Interdisciplinary Contexts

Abstract

Health and life scientists routinely collaborate with quantitative scientists for data analysis and interpretation, yet miscommunication often obscures the interpretation of complex results. Large Language Models (LLMs) offer a promising way to bridge this gap, but their cross-discipline interpretative skill on real-world bioinformatics analyses remains limited. We therefore benchmarked four state-of-the-art LLMs (GPT-4o, o1, Claude 3.7 Sonnet, and Gemini 2.0 Flash) using automated and human evaluation frameworks to ensure a holistic assessment. Automated assessment employed multiple-choice questions designed using Bloom’s taxonomy to probe multiple levels of understanding, while human evaluation tasked scientists with scoring summaries for factual consistency, lack of harmfulness, comprehensiveness, and coherence. All four models generally produced readable and largely safe summaries, confirming their value for first-pass translation of technical analyses; however, they frequently misinterpreted visualisations, produced verbose summaries, and rarely offered novel insights beyond what was already contained in the analyses. Our findings suggest that LLMs are best suited to easing interdisciplinary communication rather than replacing domain expertise, and that human oversight remains essential to guarantee accuracy, interpretative depth, and the generation of genuinely novel scientific insights.