How does DeepSeek-R1 perform on USMLE?
Abstract
DeepSeek, a Chinese artificial intelligence company, released its first free chatbot app based on its DeepSeek-R1 model. DeepSeek provides its models, algorithms, and training details to ensure transparency and reproducibility. Its new model is trained with reinforcement learning, allowing it to learn through interactions and feedback rather than relying solely on supervised learning. Reports show that DeepSeek's model achieves competitive performance against established large language models (LLMs) such as Anthropic's Claude and OpenAI's GPT-4o on established benchmarks in language understanding, mathematics (AIME 2024) and programming (Codeforces), while being trained at a fraction of the cost. Inference is also significantly cheaper to run, which contributed to DeepSeek surpassing ChatGPT as the most downloaded free app on the American iOS App Store. This development contributed to a nearly 17% drop in Nvidia's share price, erasing nearly $600 billion in market value, the largest single-day loss for any company in U.S. stock market history. Open-source models also signal a significant shift for the healthcare system, allowing cost-efficient medical LLMs to be deployed within hospital networks. To understand its performance in the healthcare sector, we analyse the new DeepSeek-R1 model on the United States Medical Licensing Examination (USMLE) and compare it to ChatGPT.