Evaluating large language models in biomedical data science challenges through a classroom experiment

Abstract

Large language models (LLMs) have shown remarkable capabilities in algorithm design, but their effectiveness in solving data science challenges remains poorly understood. We conducted a classroom experiment in which graduate students used LLMs to solve biomedical data science challenges on Kaggle. While their submissions did not top the leaderboards, their prediction scores were often close to those of leading human participants. LLMs frequently recommended gradient boosting methods, which were associated with better performance. Among prompting strategies, self-refinement, where the LLM improves its own initial solution, was the most effective, a result validated using additional LLMs. These findings demonstrate that LLMs can design competitive machine learning solutions, even when used by non-experts.
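To make the gradient boosting finding concrete, the sketch below shows the kind of baseline such a recommendation typically produces. It is a minimal illustration assuming a generic Kaggle-style tabular task with train.csv/test.csv files and id/target columns; the column names, model choice, and hyperparameters are assumptions for illustration, not the paper's actual pipelines.

```python
# Minimal gradient boosting baseline of the kind the abstract says LLMs
# frequently recommended. File and column names are illustrative assumptions.
import pandas as pd
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.model_selection import cross_val_score

train = pd.read_csv("train.csv")   # hypothetical Kaggle-style data files
test = pd.read_csv("test.csv")

X = train.drop(columns=["id", "target"])  # assumed column names
y = train["target"]

# Histogram-based gradient boosting handles missing values natively, a common
# convenience on messy biomedical tabular data.
model = HistGradientBoostingClassifier(max_iter=300, learning_rate=0.05)
print("CV accuracy:", cross_val_score(model, X, y, cv=5).mean())

model.fit(X, y)
pd.DataFrame(
    {"id": test["id"], "target": model.predict(test.drop(columns=["id"]))}
).to_csv("submission.csv", index=False)
```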
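The self-refinement strategy can likewise be sketched as a simple loop: the LLM drafts a solution, is shown its own draft, and is asked to critique and improve it. The snippet below is an assumed implementation using the OpenAI chat API as one concrete backend; the model name, prompt wording, and two refinement rounds are illustrative choices, not the study's protocol.

```python
# Sketch of a self-refinement prompting loop: draft, self-critique, revise.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def ask(prompt):
    """Send a single-turn chat request and return the reply text."""
    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative choice; the study evaluated several LLMs
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

task = (
    "Write a complete Python script for a Kaggle tabular biomedical "
    "prediction task: load train.csv, engineer features, train a model, "
    "and write submission.csv with predictions for test.csv."
)

# Round 0: initial draft.
draft = ask(task)

# Self-refinement: feed the draft back and ask the model to improve it.
for _ in range(2):  # two rounds, an arbitrary choice for this sketch
    draft = ask(
        "Here is your previous solution:\n\n" + draft +
        "\n\nCritique it for correctness, data leakage, and validation "
        "strategy, then output an improved complete script."
    )

print(draft)
```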
