DRIP: Segmenting individual requirements from software requirement documents

Ziyan Zhao, Li Zhang, Xiaoli Lian, Heyang Lv
Software: Practice and Experience, journal article, published 2023-12-19
DOI: 10.1002/spe.3303 (https://doi.org/10.1002/spe.3303)

Abstract

Numerous academic research projects and industrial tasks in software engineering require individual requirements as input. Unfortunately, we have observed that specification documents often pack several requirements into one paragraph without explicit boundaries. To gauge how prevalent this problem is, we performed a preliminary study of the open requirement documents widely used in the academic community over the last 10 years and found that 26% of them exhibit this phenomenon. Several text segmentation approaches have been reported; however, they tend to identify topically coherent units, which may contain more than one requirement. Moreover, they do not take the composition of the semantic units of requirements into consideration. Here we report a two-phase learning-based approach named DRIP to segment individual requirements from paragraphs. Specifically, we first propose a Requirement Segmentation Siamese framework, which models the similarity of sentences and their conjunction relations and detects the initial boundaries between individual requirements. We then optimize the boundaries heuristically based on semantic completeness validation of the segments. Experiments with 1132 paragraphs and 6826 sentences show that DRIP outperforms popular unsupervised and supervised text segmentation algorithms both across different documents (accuracy gains of 57.65%–187.53%) and across paragraphs of different complexity (average accuracy gains of 54.46%–158.68%). We also show the importance of each component of DRIP to the segmentation.
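
The two-phase idea in the abstract (detect initial boundaries, then refine them with a completeness check) can be illustrated with a toy sketch. This is not DRIP: the learned Siamese model is replaced here by a bag-of-words cosine similarity, the conjunction relations by a hypothetical list of continuation cues, and the semantic completeness validation by a hypothetical test for a modal verb. All function names, cue lists, and the threshold are assumptions made for illustration only.

```python
import math
import re
from collections import Counter

# Hypothetical cue list; the abstract does not enumerate any cues.
CONTINUATION_CUES = ("however", "otherwise", "in addition", "furthermore", "it ", "this ")
# Hypothetical completeness signal: a requirement is assumed to contain a modal verb.
MODAL_PATTERN = re.compile(r"\b(shall|must|should|will)\b", re.IGNORECASE)


def bag_of_words(sentence):
    """Lowercased word counts; a crude stand-in for a learned sentence encoder."""
    return Counter(re.findall(r"[a-z]+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def initial_boundaries(sentences, threshold=0.3):
    """Phase 1 stand-in: place a boundary after sentence i when the next
    sentence is dissimilar and does not open with a continuation cue."""
    boundaries = []
    for i in range(len(sentences) - 1):
        nxt = sentences[i + 1]
        linked = nxt.lower().startswith(CONTINUATION_CUES)
        similar = cosine(bag_of_words(sentences[i]), bag_of_words(nxt)) >= threshold
        if not linked and not similar:
            boundaries.append(i)
    return boundaries


def is_complete(segment):
    """Phase 2 stand-in: a segment counts as complete if any sentence has a modal verb."""
    return any(MODAL_PATTERN.search(s) for s in segment)


def segment_requirements(sentences, threshold=0.3):
    """Split sentences at the initial boundaries, then merge incomplete segments."""
    boundaries = set(initial_boundaries(sentences, threshold))
    segments, current = [], []
    for i, sentence in enumerate(sentences):
        current.append(sentence)
        if i in boundaries:
            segments.append(current)
            current = []
    if current:
        segments.append(current)

    refined = []
    for seg in segments:
        # Merge when either this segment or the previous one fails the check.
        if refined and (not is_complete(seg) or not is_complete(refined[-1])):
            refined[-1].extend(seg)
        else:
            refined.append(seg)
    return refined


paragraph = [
    "The system shall log every login attempt.",
    "Each log entry includes a timestamp.",
    "The user interface shall support English and French.",
    "However, the administrator console shall be English only.",
]
for number, requirement in enumerate(segment_requirements(paragraph), start=1):
    print(number, " ".join(requirement))
```

In this made-up paragraph, the first phase splits off the second sentence because it shares few words with the first, and the second phase merges it back because it contains no modal verb on its own. The fourth sentence stays attached to the third because it opens with a continuation cue.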