DRIP: Segmenting individual requirements from software requirement documents

Software: Practice and Experience, published 2023-12-19, DOI: 10.1002/spe.3303

Ziyan Zhao, Li Zhang, Xiaoli Lian, Heyang Lv

Abstract

Numerous academic research projects and industrial tasks in software engineering require individual requirements as input. Unfortunately, we observe that specification documents may pack several requirements into one paragraph without explicit boundaries. To gauge how prevalent this problem is, we performed a preliminary study of the open requirement documents widely used in the academic community over the last 10 years and found that 26% of them exhibit this phenomenon. Several text segmentation approaches have been reported; however, they tend to identify topically coherent units, which may contain more than one requirement. Moreover, they do not consider how requirements are composed of semantic units. Here we report DRIP, a two-phase learning-based approach that segments individual requirements from paragraphs. First, we propose a Requirement Segmentation Siamese framework, which models sentence similarity and the conjunction relations between sentences, and uses them to detect initial boundaries between individual requirements. Second, we refine these boundaries heuristically by validating the semantic completeness of the resulting segments. Experiments with 1132 paragraphs and 6826 sentences show that DRIP outperforms popular unsupervised and supervised text segmentation algorithms both across different documents (accuracy gains of 57.65%–187.53%) and across paragraphs of different complexity (average accuracy gains of 54.46%–158.68%). We also show the contribution of each component of DRIP to the segmentation.
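The abstract describes DRIP's two phases only at a high level, so the following is a toy illustration of that pipeline shape and not the authors' method. It assumes several stand-ins that do not come from the paper: a bag-of-words cosine similarity in place of the learned Siamese encoder, a hypothetical list of continuation cue words (`CONTINUATION_CUES`) in place of learned conjunction relations, and a hypothetical "contains a modal verb" test (`MODALS`, `is_complete`) in place of semantic completeness validation. The threshold of 0.3 is likewise an arbitrary assumption.

```python
import math
import re
from collections import Counter

# Hypothetical cue words suggesting a sentence continues the previous requirement
# (a crude stand-in for DRIP's learned modeling of conjunction relations).
CONTINUATION_CUES = {"also", "additionally", "furthermore", "moreover", "this", "it",
                     "these", "such", "however", "otherwise", "then"}

# Hypothetical completeness heuristic: a requirement normally carries a modal verb.
MODALS = {"shall", "must", "should", "will", "may"}


def tokens(sentence: str) -> list[str]:
    """Lowercased alphabetic word tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def cosine(a: str, b: str) -> float:
    """Bag-of-words cosine similarity (stand-in for a Siamese sentence encoder)."""
    ca, cb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def detect_boundaries(sentences: list[str], threshold: float = 0.3) -> list[int]:
    """Phase 1: return indices i where a new requirement starts at sentences[i]."""
    boundaries = []
    for i in range(1, len(sentences)):
        words = tokens(sentences[i])
        continues = bool(words) and words[0] in CONTINUATION_CUES
        # Cut only if the sentence neither signals continuation nor resembles its predecessor.
        if not continues and cosine(sentences[i - 1], sentences[i]) < threshold:
            boundaries.append(i)
    return boundaries


def is_complete(segment: list[str]) -> bool:
    """Hypothetical check: a segment is 'complete' if any sentence contains a modal verb."""
    return any(w in MODALS for s in segment for w in tokens(s))


def segment(sentences: list[str], threshold: float = 0.3) -> list[list[str]]:
    """Phase 2: split at the initial boundaries, then merge incomplete segments backward."""
    cuts = [0] + detect_boundaries(sentences, threshold) + [len(sentences)]
    segments = [sentences[a:b] for a, b in zip(cuts, cuts[1:])]
    merged: list[list[str]] = []
    for seg in segments:
        if merged and not is_complete(seg):
            merged[-1].extend(seg)  # attach a fragment to the preceding requirement
        else:
            merged.append(list(seg))
    return merged


paragraph = [
    "The system shall allow users to log in with a password.",
    "It shall lock the account after three failed attempts.",
    "The system shall export monthly reports as PDF files.",
    "Reports are generated at midnight.",
]
result = segment(paragraph)
```

On this example, phase 1 places boundaries before the third and fourth sentences, and phase 2 merges the modal-free fourth sentence back into the report requirement, yielding two requirements. An incomplete first segment is left on its own in this sketch; a real system would need a richer completeness model than a keyword test.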