How does DeepSeek-R1 perform on USMLE?
Abstract
DeepSeek, a Chinese artificial intelligence company, released its first free chatbot app based on its DeepSeek-R1 model. DeepSeek provides its models, algorithms, and training details to ensure transparency and reproducibility. Its new model is trained with reinforcement learning, allowing it to learn through interactions and feedback rather than relying solely on supervised learning. Reports show that DeepSeek's model achieves competitive performance against established large language models (LLMs) such as Anthropic's Claude and OpenAI's GPT-4o on established benchmarks in language understanding, mathematics (AIME 2024) and programming (Codeforces), while being trained at a fraction of the cost. Inference is also significantly cheaper to run, which contributed to DeepSeek surpassing ChatGPT as the most downloaded free app on the American iOS App Store. This development contributed to a nearly 17% drop in Nvidia's share price, erasing nearly $600 billion in market value, the largest single-day loss for any company in U.S. stock market history. Open-source models also signal a significant shift for the healthcare system, allowing cost-efficient medical LLMs to be deployed within hospital networks. To understand its performance in the healthcare sector, we analyse the new DeepSeek-R1 model on the United States Medical Licensing Examination (USMLE) and compare it to ChatGPT.