Evaluating large language models in biomedical data science challenges through a classroom experiment

Abstract

Large language models (LLMs) have shown remarkable capabilities in algorithm design, but their effectiveness in solving data science challenges remains poorly understood. We conducted a classroom experiment in which graduate students used LLMs to solve biomedical data science challenges on Kaggle. While their submissions did not top the leaderboards, their prediction scores were often close to those of leading human participants. LLMs frequently recommended gradient boosting methods, which were associated with better performance. Among prompting strategies, self-refinement, where the LLM improves its own initial solution, was the most effective, a result validated using additional LLMs. These findings demonstrate that LLMs can design competitive machine learning solutions, even when used by non-experts.
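To make the gradient boosting finding concrete, the sketch below shows the kind of baseline such a recommendation typically produces. It is a minimal illustration assuming a generic Kaggle-style tabular task with train.csv/test.csv files and id/target columns; the column names, model choice, and hyperparameters are assumptions for illustration, not the paper's actual pipelines.

```python
# Minimal gradient boosting baseline of the kind the abstract says LLMs
# frequently recommended. File and column names are illustrative assumptions.
import pandas as pd
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.model_selection import cross_val_score

train = pd.read_csv("train.csv")   # hypothetical Kaggle-style data files
test = pd.read_csv("test.csv")

X = train.drop(columns=["id", "target"])  # assumed column names
y = train["target"]

# Histogram-based gradient boosting handles missing values natively, a common
# convenience on messy biomedical tabular data.
model = HistGradientBoostingClassifier(max_iter=300, learning_rate=0.05)
print("CV accuracy:", cross_val_score(model, X, y, cv=5).mean())

model.fit(X, y)
pd.DataFrame(
    {"id": test["id"], "target": model.predict(test.drop(columns=["id"]))}
).to_csv("submission.csv", index=False)
```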
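The self-refinement strategy can likewise be sketched as a simple loop: the LLM drafts a solution, is shown its own draft, and is asked to critique and improve it. The snippet below is an assumed implementation using the OpenAI chat API as one concrete backend; the model name, prompt wording, and two refinement rounds are illustrative choices, not the study's protocol.

```python
# Sketch of a self-refinement prompting loop: draft, self-critique, revise.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def ask(prompt):
    """Send a single-turn chat request and return the reply text."""
    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative choice; the study evaluated several LLMs
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

task = (
    "Write a complete Python script for a Kaggle tabular biomedical "
    "prediction task: load train.csv, engineer features, train a model, "
    "and write submission.csv with predictions for test.csv."
)

# Round 0: initial draft.
draft = ask(task)

# Self-refinement: feed the draft back and ask the model to improve it.
for _ in range(2):  # two rounds, an arbitrary choice for this sketch
    draft = ask(
        "Here is your previous solution:\n\n" + draft +
        "\n\nCritique it for correctness, data leakage, and validation "
        "strategy, then output an improved complete script."
    )

print(draft)
```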
