RE.PUBLIC@POLIMI Research Publications of Politecnico di Milano


Can LLMs Generate User Stories and Assess Their Quality?

Quattrocchi, Giovanni;Pasquale, Liliana;Spoletini, Paola;Baresi, Luciano

2026-01-01

Abstract

Requirements elicitation is still one of the most challenging activities of the requirements engineering process due to the difficulty requirements analysts face in understanding and translating complex needs into concrete requirements that directly impact the quality of the software to be developed. Although automated tools allow for assessing the syntactic quality of requirements, evaluating semantic metrics (e.g., language clarity, internal consistency) remains a manual and time-consuming activity. This paper explores how LLMs can help automate requirements elicitation within agile frameworks, where requirements are defined as user stories. We used 10 state-of-the-art LLMs to investigate their ability to generate user stories automatically by emulating customer interviews. We evaluated the quality of user stories generated by LLMs, comparing it with the quality of user stories generated by humans (domain experts and students). We also explored whether and how LLMs can be used to automatically evaluate the semantic quality of user stories. Our results indicate that LLMs can generate user stories similar to humans in terms of coverage and stylistic quality, but exhibit lower diversity and creativity. Although LLM-generated user stories are generally comparable in quality to those created by humans, they tend to meet the acceptance quality criteria less frequently, regardless of the scale of the LLM model. Finally, LLMs can reliably assess the semantic quality of user stories when provided with clear evaluation criteria and have the potential to reduce human effort in large-scale assessments.


Publication year: 2026

Journal title: IEEE TRANSACTIONS ON SOFTWARE ENGINEERING

Keywords: agile methods; Large language models; requirement elicitation; requirement engineering; requirement quality; user stories

Appears in types: 01.1 Journal Article

Files in this item:

There are no files associated with this item.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1316166
