From Automation to Certification: Benchmarking AI Chatbots in Software Testing

Abstract

Artificial intelligence (AI) chatbots, powered by large language models (LLMs), are reshaping software testing by automating critical tasks and enhancing productivity. This study evaluates the performance of two state-of-the-art LLMs, GPT-4o and Gemini 2.0 Flash Experimental, on practice exams for four industry-recognized software testing certifications: A4Q-SDET, ISTQB Certified Tester Foundation Level (CTFL), Advanced Level Test Manager (CTAL-TM), and Expert Level Test Manager (CTEL-TM). Both models demonstrated substantial competency, achieving passing scores on all four exams. GPT-4o excelled in foundational and advanced managerial tasks, while Gemini performed better in technical and practical test scenarios. An analysis of their performance across cognitive levels (K1–K4) reveals complementary strengths, with GPT-4o showing superior analytical capabilities (K4) and Gemini maintaining consistent performance across all levels. These findings highlight the potential of LLMs as tools for bridging knowledge gaps and enhancing software testing processes. Future research should explore real-world testing applications and the integration of LLMs into software testing workflows.