AI-Assisted Test Scope Recommendation for Manual QA: A Framework and Evaluation
Abstract
Determining the scope of testing required for a software change is among the most consequential yet least structured decisions in manual quality assurance. In many enterprise teams, this decision rests on practitioner experience and informal negotiation, producing outcomes that vary with team composition rather than with change complexity. This paper presents the AI-Assisted Test Scope Recommender (ATSR), a framework that applies large language models to generate structured test scope recommendations from feature specifications. ATSR produces multi-dimensional recommendations spanning five test type categories (functional, smoke, regression, integration, and exploratory), alongside a calibrated coverage depth (LOW, MEDIUM, or HIGH) derived from change type classification, an indicative effort estimate, and actionable clarification flags identifying information gaps that would materially affect scope decisions. The framework was evaluated against 18 synthetic specifications derived from enterprise restaurant technology quality assurance practice, using a five-dimension rubric measuring Completeness, Specificity, Risk Calibration, Integration Awareness, and Clarification Flag Quality. ATSR achieved an overall score of 84.4% (380 out of 450 points), with near-perfect coverage depth accuracy (97.8%) and strong completeness (91.1%). To isolate the contribution of the framework design from the underlying model capability, a naive baseline using the same model and output schema with an unstructured prompt was also evaluated; it achieved 67.8% overall and correct depth calibration in only 33.3% of cases. An independent quality assurance practitioner scored ATSR outputs for six representative specifications at 91.3%, providing evidence that the primary evaluator's scores are conservative. These results demonstrate that structured prompting frameworks contribute materially and measurably over unstructured large language model querying for test scope planning.