AI-Assisted Test Scope Recommendation for Manual QA: A Framework and Evaluation
Abstract
Determining the scope of testing required for a software change is among the most consequential yet least structured decisions in manual quality assurance. In many enterprise teams, this decision rests on practitioner experience and informal negotiation, producing outcomes that vary with team composition rather than with change complexity. This paper presents the AI-Assisted Test Scope Recommender (ATSR), a framework that applies large language models to generate structured test scope recommendations from feature specifications. ATSR produces multi-dimensional recommendations spanning five test type categories (functional, smoke, regression, integration, and exploratory), alongside a calibrated coverage depth (LOW, MEDIUM, or HIGH) derived from change type classification, an indicative effort estimate, and actionable clarification flags identifying information gaps that would materially affect scope decisions. The framework was evaluated against 18 synthetic specifications derived from enterprise restaurant technology quality assurance practice, using a five-dimension rubric measuring Completeness, Specificity, Risk Calibration, Integration Awareness, and Clarification Flag Quality. ATSR achieved an overall score of 84.4% (380 out of 450 points), with near-perfect coverage depth accuracy (97.8%) and strong completeness (91.1%). To isolate the contribution of the framework design from the underlying model capability, a naive baseline using the same model and output schema with an unstructured prompt was also evaluated; it achieved 67.8% overall and correct depth calibration in only 33.3% of cases. An independent quality assurance practitioner scored ATSR outputs for six representative specifications at 91.3%, providing evidence that the primary evaluator's scores are conservative. These results demonstrate that structured prompting frameworks contribute materially and measurably over unstructured large language model querying for test scope planning.