Semantically Enhanced Software Traceability Using Deep Learning Techniques

2017 IEEE/ACM 39th International Conference on Software Engineering (ICSE), pp. 3-14. Published: 2017-05-20. DOI: 10.1109/ICSE.2017.9

Jin Guo, Jinghui Cheng, J. Cleland-Huang


Citations: 204

Abstract

In most safety-critical domains, the need for traceability is prescribed by certifying bodies. Trace links are generally created among requirements, design, source code, test cases, and other artifacts; however, creating such links manually is time-consuming and error-prone. Automated solutions use information retrieval and machine learning techniques to generate trace links; however, current techniques fail to understand the semantics of software artifacts or to integrate domain knowledge into the tracing process, and therefore tend to deliver imprecise and inaccurate results. In this paper, we present a solution that uses deep learning to incorporate requirements artifact semantics and domain knowledge into the tracing process. We propose a tracing network architecture that utilizes Word Embedding and Recurrent Neural Network (RNN) models to generate trace links. Word embedding learns word vectors that represent knowledge of the domain corpus, and the RNN uses these word vectors to learn the sentence semantics of requirements artifacts. We trained 360 different configurations of the tracing network using existing trace links in the Positive Train Control (PTC) domain and identified the Bidirectional Gated Recurrent Unit (BI-GRU) as the best model for the tracing task. BI-GRU significantly outperformed state-of-the-art tracing methods, including the Vector Space Model (VSM) and Latent Semantic Indexing (LSI).
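
The pipeline described in the abstract (word vectors fed to a bidirectional GRU that produces one sentence-level vector per artifact, followed by a classifier that scores a pair of artifacts as linked or not) can be sketched in plain Python. This is a minimal, untrained illustration under stated assumptions, not the authors' implementation: random weights stand in for learned parameters and for word embeddings pre-trained on a domain corpus, the helper names (`GRUCell`, `BiGRUEncoder`, `embed`, `link_probability`) are hypothetical, and the relation step that combines the two artifact vectors through an element-wise product and an absolute difference followed by a single logistic output is an assumed simplification.

```python
import math
import random

def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def matvec(m, v):
    # Dense matrix (list of rows) times vector.
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def rand_matrix(rng, rows, cols, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

class GRUCell:
    """Single GRU cell with random (untrained, hypothetical) weights."""
    def __init__(self, input_dim, hidden_dim, rng):
        self.hidden_dim = hidden_dim
        # One input matrix W, recurrent matrix U and bias b per gate:
        # z (update gate), r (reset gate), n (candidate state).
        self.W = {g: rand_matrix(rng, hidden_dim, input_dim) for g in "zrn"}
        self.U = {g: rand_matrix(rng, hidden_dim, hidden_dim) for g in "zrn"}
        self.b = {g: [0.0] * hidden_dim for g in "zrn"}

    def _pre(self, g, x, hv):
        return [a + c + d for a, c, d in
                zip(matvec(self.W[g], x), matvec(self.U[g], hv), self.b[g])]

    def step(self, x, h):
        z = [sigmoid(v) for v in self._pre("z", x, h)]
        r = [sigmoid(v) for v in self._pre("r", x, h)]
        # Candidate state sees the previous state gated by r.
        n = [math.tanh(v) for v in self._pre("n", x, [ri * hi for ri, hi in zip(r, h)])]
        # Interpolate between the previous state and the candidate state.
        return [(1 - zi) * hi + zi * ni for zi, hi, ni in zip(z, h, n)]

    def run(self, xs):
        h = [0.0] * self.hidden_dim
        for x in xs:
            h = self.step(x, h)
        return h

class BiGRUEncoder:
    """Reads the word sequence in both directions."""
    def __init__(self, input_dim, hidden_dim, rng):
        self.fwd = GRUCell(input_dim, hidden_dim, rng)
        self.bwd = GRUCell(input_dim, hidden_dim, rng)

    def encode(self, xs):
        # Concatenate the final forward state and the final backward state.
        return self.fwd.run(xs) + self.bwd.run(list(reversed(xs)))

def embed(tokens, table, dim, rng):
    # Stand-in for domain-trained word embeddings: one vector per token,
    # created lazily at random the first time a token is seen.
    for t in tokens:
        if t not in table:
            table[t] = [rng.uniform(-0.5, 0.5) for _ in range(dim)]
    return [table[t] for t in tokens]

def link_probability(src_vec, tgt_vec, w_mul, w_diff, bias):
    # Assumed relation step: element-wise product and absolute difference
    # of the two artifact vectors, then a single logistic output.
    mul = [a * b for a, b in zip(src_vec, tgt_vec)]
    diff = [abs(a - b) for a, b in zip(src_vec, tgt_vec)]
    score = (sum(w * m for w, m in zip(w_mul, mul))
             + sum(w * d for w, d in zip(w_diff, diff)) + bias)
    return sigmoid(score)

rng = random.Random(0)
EMB, HID = 8, 6
table = {}
encoder = BiGRUEncoder(EMB, HID, rng)
w_mul = [rng.uniform(-1, 1) for _ in range(2 * HID)]
w_diff = [rng.uniform(-1, 1) for _ in range(2 * HID)]

# Hypothetical requirement and design artifact texts.
req = "the onboard unit shall display the movement authority".split()
design = "display movement authority on the onboard screen".split()
src = encoder.encode(embed(req, table, EMB, rng))
tgt = encoder.encode(embed(design, table, EMB, rng))
p = link_probability(src, tgt, w_mul, w_diff, 0.0)
print(len(src), round(p, 3))
```

In a real tracing network the weights and embeddings would be learned from existing trace links (artifact pairs labelled as linked or not linked), so the probability printed by this untrained sketch only demonstrates the data flow. Because the assumed relation step uses a product and an absolute difference, the score is symmetric in the two artifacts.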


