Extracting features from requirements: Achieving accuracy and automation with neural networks

2018 IEEE 25th International Conference on Software Analysis, Evolution and Reengineering (SANER) Pub Date : 2018-03-20 DOI:10.1109/SANER.2018.8330243

Y. Li, Sandro Schulze, G. Saake

{"title":"Extracting features from requirements: Achieving accuracy and automation with neural networks","authors":"Y. Li, Sandro Schulze, G. Saake","doi":"10.1109/SANER.2018.8330243","DOIUrl":null,"url":null,"abstract":"Analyzing and extracting features and variability from different artifacts is an indispensable activity to support systematic integration of single software systems and Software Product Line (SPL). Beyond manually extracting variability, a variety of approaches, such as feature location in source code and feature extraction in requirements, has been proposed for automating the identification of features and their variation points. While requirements contain more complete variability information and provide traceability links to other artifacts, current techniques exhibit a lack of accuracy as well as a limited degree of automation. In this paper, we propose an unsupervised learning structure to overcome the abovementioned limitations. In particular, our technique consists of two steps: First, we apply Laplacian Eigenmaps, an unsupervised dimensionality reduction technique, to embed text requirements into compact binary codes. Second, requirements are transformed into a matrix representation by looking up a pre-trained word embedding. Then, the matrix is fed into CNN to learn linguistic characteristics of the requirements. Furthermore, we train CNN by matching the output of CNN with the pre-trained binary codes. Initial results show that accuracy is still limited, but that our approach allows to automate the entire process.","PeriodicalId":6602,"journal":{"name":"2018 IEEE 25th International Conference on Software Analysis, Evolution and Reengineering (SANER)","volume":"16 1","pages":"477-481"},"PeriodicalIF":0.0000,"publicationDate":"2018-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 IEEE 25th International Conference on Software Analysis, Evolution and Reengineering (SANER)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SANER.2018.8330243","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 7

Abstract

Analyzing and extracting features and variability from different artifacts is an indispensable activity to support systematic integration of single software systems and Software Product Line (SPL). Beyond manually extracting variability, a variety of approaches, such as feature location in source code and feature extraction in requirements, has been proposed for automating the identification of features and their variation points. While requirements contain more complete variability information and provide traceability links to other artifacts, current techniques exhibit a lack of accuracy as well as a limited degree of automation. In this paper, we propose an unsupervised learning structure to overcome the abovementioned limitations. In particular, our technique consists of two steps: First, we apply Laplacian Eigenmaps, an unsupervised dimensionality reduction technique, to embed text requirements into compact binary codes. Second, requirements are transformed into a matrix representation by looking up a pre-trained word embedding. Then, the matrix is fed into CNN to learn linguistic characteristics of the requirements. Furthermore, we train CNN by matching the output of CNN with the pre-trained binary codes. Initial results show that accuracy is still limited, but that our approach allows to automate the entire process.

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

从需求中提取特征:用神经网络实现准确性和自动化

分析并从不同的工件中提取特征和可变性是支持单个软件系统和软件产品线(SPL)的系统集成必不可少的活动。除了手动提取可变性之外，已经提出了各种方法，例如源代码中的特征定位和需求中的特征提取，用于自动识别特征及其变异点。虽然需求包含更完整的可变性信息，并提供到其他工件的可追溯性链接，但当前的技术显示出缺乏准确性以及有限程度的自动化。在本文中，我们提出了一种无监督学习结构来克服上述限制。特别是，我们的技术包括两个步骤:首先，我们应用拉普拉斯特征映射，一种无监督降维技术，将文本需求嵌入到紧凑的二进制代码中。其次，通过查找预训练的词嵌入，将需求转换为矩阵表示。然后，将矩阵输入CNN学习语言特征的需求。此外，我们通过将CNN的输出与预训练的二进制代码进行匹配来训练CNN。初步结果表明，准确性仍然有限，但我们的方法允许整个过程自动化。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文去求助

来源期刊

2018 IEEE 25th International Conference on Software Analysis, Evolution and Reengineering (SANER)

自引率

0.00%

发文量

期刊最新文献

Exploring the integration of user feedback in automated testing of Android applications The Statechart Workbench: Enabling scalable software event log analysis using process mining Detecting code smells using machine learning techniques: Are we there yet? Classifying stack overflow posts on API issues Re-evaluating method-level bug prediction