Systematic Evaluation of AI-Generated Python Code: A Comparative Study across Progressive Programming Tasks

Read the full article See related articles

Listed in

This article is not in any list yet, why not save it to one of your lists.
Log in to save this article

Abstract

Background: AI-based code assistants are on the rise in software development as powerful technologies offering streamlining of code generation and better-quality code. However, their effectiveness is very variable, and understanding their pros and cons becomes very important in regard to using them optimally. Introduction: This study evaluates the capabilities of four prominent AI-based code assistants: GitHub Copilot, Microsoft Copilot, Tabnine, and ChatGPT. This study addresses whether the quality of produced code is functional, efficient, and maintainable and whether it has areas for improvement. Methodology: The AI-generated code was compared in terms of correctness, McCabe complexity (cyclomatic complexity), efficiency, and code size. The correctness percentage was the portion of code without errors, McCabe complexity was used for measuring structural complexity, execution performance represented efficiency, and the size of the code was just by lines of code. All AI tools were benchmarked against a standard set of 100 prompts to ensure like-for-like assessment. Results: GitHub Copilot had the highest correctness at 42%, and ChatGPT generated the most complex code—which was measured with a McCabe complexity score of 2.92. Efficiencywise, ChatGPT also topped with the highest number of codes that meet the "good" criteria. On average, Tabnine produced the shortest code, whereas GitHub Copilot and ChatGPT were among the most verbose. The analysis revealed that although AI-based assistants can generate high-quality code, they usually produce code far different from solutions written by developers themselves, and it is difficult for them to cope with dependencies between classes. Conclusion: AI-based code assistants have substantial potential for improvingcode generation and software development efficiency. However, challenges remain, particularly in handling complex dependencies and producing ready-to-use code. The study suggests that leveraging the strengths of different assistants and focusing on enhancing their ability to manage complex coding scenarios could lead to significant advancements. Ongoing research and development are essential to address these limitations and fully harness the potential of AI-based code assistants in software development.

Article activity feed