From Automation to Certification: Benchmarking AI Chatbots in Software Testing

Abstract

Artificial intelligence (AI) chatbots, powered by large language models (LLMs), are reshaping software testing by automating critical tasks and enhancing productivity. This study evaluates the performance of two state-of-the-art LLMs, GPT-4o and Gemini 2.0 Flash Experimental, on practice exams for four industry-recognized software testing certifications: A4Q-SDET, ISTQB Certified Tester Foundation Level (CTFL), Advanced Level Test Manager (CTAL-TM), and Expert Level Test Manager (CTEL-TM). Both models demonstrated substantial competency, achieving passing scores on all four exams. GPT-4o excelled in foundational and advanced managerial tasks, while Gemini performed better in technical and practical test scenarios. An analysis of their performance across cognitive levels (K1–K4) reveals complementary strengths, with GPT-4o showing superior analytical capabilities (K4) and Gemini maintaining consistent performance across all levels. These findings highlight the potential of LLMs as tools for bridging knowledge gaps and enhancing software testing processes. Future research should explore real-world testing applications and the integration of LLMs into software testing workflows.