Large Language Models for Accessible Reporting of Bioinformatics Analyses in Interdisciplinary Contexts

Abstract

Health and life scientists routinely collaborate with quantitative scientists for data analysis and interpretation, yet miscommunication often obscures the interpretation of complex results. Large Language Models (LLMs) offer a promising way to bridge this gap, but their cross-discipline interpretative skill on real-world bioinformatics analyses remains limited. We therefore benchmarked four state-of-the-art LLMs (GPT-4o, o1, Claude 3.7 Sonnet, and Gemini 2.0 Flash) using automated and human evaluation frameworks to ensure a holistic assessment. Automated assessment employed multiple-choice questions designed using Bloom’s taxonomy to probe multiple levels of understanding, while human evaluation tasked scientists with scoring summaries for factual consistency, lack of harmfulness, comprehensiveness, and coherence. All four models generally produced readable and largely safe summaries, confirming their value for first-pass translation of technical analyses; however, they frequently misinterpreted visualisations, produced verbose summaries, and rarely offered novel insights beyond what was already contained in the analyses. Our findings suggest that LLMs are best suited to easing interdisciplinary communication rather than replacing domain expertise, and that human oversight remains essential to guarantee accuracy, interpretative depth, and the generation of genuinely novel scientific insights.