SimAC: simulating agile collaboration to generate acceptance criteria in user story elaboration

IF 2 | Zone 2, Computer Science | Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Automated Software Engineering | Pub Date: 2024-06-21 | DOI: 10.1007/s10515-024-00448-7
Yishu Li, Jacky Keung, Zhen Yang, Xiaoxue Ma, Jingyu Zhang, Shuo Liu
Automated Software Engineering, volume 31, issue 2. Full text: https://link.springer.com/article/10.1007/s10515-024-00448-7

Abstract

In agile requirements engineering, Generating Acceptance Criteria (GAC) to elaborate user stories plays a pivotal role in the sprint planning phase, which provides a reference for delivering functional solutions. GAC requires extensive collaboration and human involvement. However, the lack of labeled datasets tailored for User Story attached with Acceptance Criteria (US-AC) poses significant challenges for supervised learning techniques attempting to automate this process. Recent advancements in Large Language Models (LLMs) have showcased their remarkable text-generation capabilities, bypassing the need for supervised fine-tuning. Consequently, LLMs offer the potential to overcome the above challenge. Motivated by this, we propose SimAC, a framework leveraging LLMs to simulate agile collaboration, with three distinct role groups: requirement analyst, quality analyst, and others. Initiated by role-based prompts, LLMs act in these roles sequentially, following a create-update-update paradigm in GAC. Owing to the unavailability of ground truths, we invited practitioners to build a gold standard serving as a benchmark to evaluate the completeness and validity of auto-generated US-AC against human-crafted ones. Additionally, we invited eight experienced agile practitioners to evaluate the quality of US-AC using the INVEST framework. The results demonstrate consistent improvements across all tested LLMs, including the LLaMA and GPT-3.5 series. Notably, SimAC significantly enhances the ability of gpt-3.5-turbo in GAC, achieving improvements of 29.48% in completeness and 15.56% in validity, along with the highest INVEST satisfaction score of 3.21/4. Furthermore, this study also provides case studies to illustrate SimAC’s effectiveness and limitations, shedding light on the potential of LLMs in automated agile requirements engineering.
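
The sequential create-update-update flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the prompt wording, the function names (`build_prompt`, `parse_criteria`, `simulate_collaboration`, `make_stub_llm`), and the stub model are all assumptions made for illustration, and the abstract does not give the actual prompts. Any callable that maps a prompt string to a reply string could stand in for a real LLM.

```python
from typing import Callable, List, Tuple

# Role groups named in the abstract, paired with the step each performs.
# The pairing follows the create-update-update paradigm; prompt text is hypothetical.
ROLES: List[Tuple[str, str]] = [
    ("requirement analyst", "create"),
    ("quality analyst", "update"),
    ("others", "update"),
]

LLM = Callable[[str], str]


def build_prompt(role: str, action: str, user_story: str, criteria: List[str]) -> str:
    """Assemble a role-based prompt (illustrative wording only)."""
    lines = [f"You are acting as: {role}.", f"User story: {user_story}"]
    if action == "create":
        lines.append("Create acceptance criteria for this user story, one per line.")
    else:
        lines.append("Update the acceptance criteria below; return the full revised list, one per line.")
        lines.extend(f"- {c}" for c in criteria)
    return "\n".join(lines)


def parse_criteria(text: str) -> List[str]:
    """Split a reply into one criterion per non-empty line, dropping bullet markers."""
    items = []
    for line in text.splitlines():
        item = line.strip().lstrip("-*").strip()
        if item:
            items.append(item)
    return items


def simulate_collaboration(user_story: str, llm: LLM) -> List[str]:
    """Run the roles in sequence; each role sees the previous role's criteria."""
    criteria: List[str] = []
    for role, action in ROLES:
        reply = llm(build_prompt(role, action, user_story, criteria))
        criteria = parse_criteria(reply)
    return criteria


def make_stub_llm():
    """Stand-in for a real model: echoes existing criteria and appends one more."""
    calls: List[str] = []

    def stub(prompt: str) -> str:
        calls.append(prompt)
        existing = [l for l in prompt.splitlines() if l.startswith("- ")]
        existing.append(f"- criterion added in step {len(calls)}")
        return "\n".join(existing)

    return stub, calls


if __name__ == "__main__":
    stub, _ = make_stub_llm()
    story = "As a user, I want to reset my password so that I can regain access."
    for criterion in simulate_collaboration(story, stub):
        print(criterion)
```

Passing the model as a plain callable keeps the role sequence independent of any particular LLM client, so the same loop could be exercised with a stub, as here, or with a real model wrapper.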

Source journal
Automated Software Engineering (Engineering and Technology; Computer Science: Software Engineering)
CiteScore: 4.80
Self-citation rate: 11.80%
Articles published: 51
Review time: >12 weeks
About the journal: This journal publishes research, tutorial papers, surveys, and accounts of significant industrial experience in the foundations, techniques, tools, and applications of automated software engineering technology. This includes the study of techniques for constructing, understanding, adapting, and modeling software artifacts and processes. Coverage in Automated Software Engineering examines both automatic systems and collaborative systems as well as computational models of human software engineering activities. In addition, it presents knowledge representations and artificial intelligence techniques applicable to automated software engineering, and formal techniques that support or provide theoretical foundations. The journal also includes reviews of books, software, conferences, and workshops.