Exploring the Use of LLMs for Requirements Extraction from User Stories
Abstract
Purpose: This study examines the application of Large Language Models (LLMs) for generating software requirements from user stories, with a focus on how structured prompt patterns affect output quality. The aim is to evaluate the effectiveness of prompt engineering (PE) in enhancing the clarity, completeness, and semantic accuracy of generated requirements. Methods: Five leading LLMs (ChatGPT, Gemini, DeepSeek, Claude, and Qwen) were tested using a common dataset of user stories. Each model was evaluated with and without the application of a structured prompt pattern, using key metrics including ambiguity, completeness, semantic similarity, readability, and textual complexity. Results: The use of structured prompts substantially improved output quality scores relative to the unstructured baseline, most notably for ChatGPT (+59.17%) and Gemini (+26.07%). These prompts yielded requirements that were clearer, more complete, and less ambiguous. In contrast, Qwen's performance was robust and largely invariant to prompt structure. DeepSeek demonstrated higher readability without structure, yet its passive-voice metric improved by 17.33% with the introduction of structured prompts. Claude consistently scored the lowest across both conditions, producing outputs with greater ambiguity and syntactic complexity. Conclusion: The findings highlight the critical role of prompt engineering in improving LLM performance in requirements generation. They also reveal model-specific sensitivity to prompt structure, with some models benefiting greatly while others show limited responsiveness. For practitioners, this means software teams can adopt structured prompting to create clearer and more consistent requirements, reduce rework caused by ambiguity, and improve communication between roles. In practice, this can streamline workflows, improve collaboration, and support higher-quality project outcomes.
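
The percentage figures in the Results are stated relative to the unstructured baseline. The abstract does not give the exact formula, so the following is only a minimal sketch that assumes an ordinary relative change; the function name `relative_change` and the scores are hypothetical and are not taken from the study.

```python
def relative_change(baseline: float, structured: float) -> float:
    """Percentage change of the structured-prompt score relative to the baseline.

    Assumption: improvement is reported as (structured - baseline) / baseline.
    """
    if baseline == 0:
        raise ValueError("baseline score must be non-zero")
    return (structured - baseline) / baseline * 100.0


# Hypothetical quality scores, for illustration only (not from the study).
baseline_score = 40.0
structured_score = 50.0
print(f"{relative_change(baseline_score, structured_score):+.2f}%")  # prints "+25.00%"
```

Under this assumption, a positive value means the structured prompt scored higher than the baseline on a metric where higher is better; for metrics where lower is better, such as ambiguity, the sign would need to be read the other way.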