Exploring the Use of LLMs for Requirements Extraction from User Stories

Melis Tuğba Karalar
Ali Yazıcı
Selma Nazlıoğlu

Abstract

Purpose: This study examines the application of Large Language Models (LLMs) for generating software requirements from user stories, with a focus on how structured prompt patterns affect output quality. The aim is to evaluate the effectiveness of prompt engineering (PE) in enhancing the clarity, completeness, and semantic accuracy of generated requirements.

Methods: Five leading LLMs (ChatGPT, Gemini, DeepSeek, Claude, and Qwen) were tested on a common dataset of user stories. Each model was evaluated with and without a structured prompt pattern, using metrics including ambiguity, completeness, semantic similarity, readability, and textual complexity.

Results: Structured prompts substantially improved output quality scores relative to the unstructured baseline, most notably for ChatGPT (+59.17%) and Gemini (+26.07%). These prompts yielded requirements that were clearer, more complete, and less ambiguous. In contrast, Qwen's performance was robust and largely invariant to prompt structure. DeepSeek achieved higher readability without structure, yet its passive-voice metric improved by 17.33% when structured prompts were introduced. Claude consistently scored the lowest across both conditions, producing outputs with greater ambiguity and syntactic complexity.

Conclusion: The findings highlight the critical role of prompt engineering in improving LLM performance in requirements generation. They also reveal model-specific sensitivity to prompt structure: some models benefit greatly, while others show limited responsiveness. For practitioners, this means software teams can adopt structured prompting to create clearer and more consistent requirements, reduce rework caused by ambiguity, and improve communication between roles. In practice, this can streamline workflows, improve collaboration, and support higher-quality project outcomes.
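The with/without comparison described in the abstract can be illustrated with a small sketch. It is not the authors' pipeline: the abstract does not give the prompt pattern or say how the quality scores are computed, so the template wording, the function names, and the example scores below are all hypothetical. The only part taken from the text is the idea of reporting a structured-prompt score as a percentage change relative to the unstructured baseline.

```python
# Hypothetical sketch: build an unstructured and a structured prompt for the
# same user story, then express a score difference as a percentage change.
# The template text and the numbers are illustrative assumptions, not values
# or wording from the study.

STRUCTURED_TEMPLATE = (
    "Role: You are a requirements engineer.\n"
    "Task: Derive software requirements from the user story below.\n"
    "Output format: a numbered list, one requirement per line, active voice.\n"
    "User story: {story}"
)


def build_prompts(user_story: str) -> dict[str, str]:
    """Return the two prompt variants for one user story."""
    return {
        "unstructured": f"Write requirements for this user story: {user_story}",
        "structured": STRUCTURED_TEMPLATE.format(story=user_story),
    }


def relative_change(baseline: float, structured: float) -> float:
    """Percentage change of the structured score relative to the baseline."""
    if baseline == 0:
        raise ValueError("baseline score must be non-zero")
    return (structured - baseline) / baseline * 100.0


if __name__ == "__main__":
    story = "As a customer, I want to reset my password so that I can regain access."
    prompts = build_prompts(story)
    print(prompts["structured"])
    # Made-up scores: a baseline of 40 and a structured score of 50.
    print(f"{relative_change(40.0, 50.0):+.2f}%")
```

Under this reading, a figure such as +59.17% says how far a model's structured-prompt score sits above its own baseline score. It does not by itself say which model scored highest overall.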

Version published to 10.21203/rs.3.rs-7932682/v1 on Research Square
Dec 1, 2025