ChatGPT vs DeepSeek: A Comparative Evaluation on the International Computer Science Benchmark – ACM ICPC

Abstract

This study evaluates the effectiveness of two leading generative AI (GenAI) models, ChatGPT and DeepSeek, on complex programming problems drawn from the ACM International Collegiate Programming Contest (ICPC), a widely accepted standard in competitive programming. Both models are assessed on readability, error handling, computation speed, code accuracy, and educational value. In a two-trial experimental setup, each model is tested on 145 ICPC problems spanning data structures, algorithms, mathematics, geometry, and advanced optimization. Prompts were standardized across all problems, and the evaluation was repeated over two trials to mimic iterative learning. The results indicate that both DeepSeek and ChatGPT improved their performance over time. DeepSeek consistently outperformed ChatGPT in code accuracy (88.28% vs. 84.14%), generated more efficient linear-time algorithms (41 vs. 19), and exhibited a lower logical error rate (7.58% vs. 15.86%). The two models performed almost identically on code quality scores (37.79 vs. 37.85). In addition, 46.90% of the solutions generated by DeepSeek were rated fully insightful, surpassing ChatGPT's 42.07%. However, ChatGPT improved markedly across trials, most notably reducing its syntax error rate from 4.83% to 0.69%. This comparative analysis suggests that DeepSeek may be the more suitable option for high-stakes programming tasks, and the findings offer practical guidance for integrating GenAI tools into advanced programming education.
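The reported percentages are simple ratios over the 145-problem set (e.g., 128 accepted solutions out of 145 gives the stated 88.28% accuracy). As a rough illustration only, the Python sketch below shows how per-trial judgments could be aggregated into these metrics; the paper does not publish its evaluation harness, so the `Result` fields and the `summarize` helper here are hypothetical.

```python
# Illustrative sketch of the two-trial metric aggregation described in the
# abstract. The study's actual harness, judge, and rubric are not published;
# every name below is a hypothetical placeholder.

from dataclasses import dataclass

N_PROBLEMS = 145  # number of ICPC problems reported in the study


@dataclass
class Result:
    accepted: bool      # solution passes the judge's test cases
    logic_error: bool   # wrong answer caused by flawed logic
    syntax_error: bool  # code fails to compile or parse
    linear_time: bool   # solution runs in O(n) time


def percentage(count: int, total: int = N_PROBLEMS) -> float:
    """Express a per-problem count as a percentage of the problem set."""
    return 100.0 * count / total


def summarize(results: list[Result]) -> dict[str, float]:
    """Aggregate one trial's per-problem judgments into summary metrics."""
    return {
        "accuracy_pct": percentage(sum(r.accepted for r in results)),
        "logic_error_pct": percentage(sum(r.logic_error for r in results)),
        "syntax_error_pct": percentage(sum(r.syntax_error for r in results)),
        "linear_time_count": sum(r.linear_time for r in results),
    }


# Sanity check: 128 accepted solutions out of 145 reproduces the reported
# 88.28% accuracy figure for DeepSeek.
print(f"{percentage(128):.2f}%")  # -> 88.28%
```

Running `summarize` once per trial and per model would yield the kind of cross-trial comparison the abstract reports, such as ChatGPT's syntax error rate dropping from 4.83% (7 of 145) to 0.69% (1 of 145) between trials.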
