A Word Embedding Model for Fault Localization using Bug and Software Change Repositories

Journal: Iranian Journal of Botany (Q4, Environmental Science) | Pub Date: 2020-07-22 | DOI: 10.33897/fujeas.v1i1.201

Aqib Rehman


Citations: 0

Abstract




Software that is developed and deployed in a real-world environment inevitably exhibits some undesirable behavior. Developers therefore need to provide maintenance facilities so that the bugs causing this behavior can be fixed. Before a bug can be fixed, however, the suspicious part of the code must be identified, which is usually done through fault localization, either manually or automatically. Several fault localization techniques exist in the literature. Most of them are static techniques, because these do not depend on a specific programming language, can work on software that is still under development, and offer some other benefits. These techniques are largely based on lexical matching of terms, which leads to term mismatch and to large precision values because of the limited vocabulary of a programming language; some techniques do consider semantics, but localizing faults this way is computationally expensive. In this paper we propose a fault localization technique based on the machine learning concept of word embedding. Our approach looks at the relatedness between bug terms and source code artifacts. We mined bug repositories and software change repositories and trained the word embedding model on the mined data. When a new bug arrives, the model is searched for the cluster of related bugs, and the files that were used to fix those bugs are retrieved from the software change repositories. We compared our approach with the latest context-aware techniques, proposed in 2018, Pointwise Mutual Information (PMI) and Normalized Google Distance (NGD), as well as with the existing lexical technique Vector Space Model (VSM) and the semantics-based method Latent Semantic Indexing (LSI). We used the benchmark dataset “MoreBugs”, which has been widely used in this domain. The results show that our approach outperforms the other techniques.
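
The retrieval step described in the abstract (find past bugs related to a new bug, then return the files changed to fix them) can be illustrated with a minimal sketch. The abstract does not give the model architecture, the clustering procedure, or the ranking rule, so everything here is an assumption: the tiny hand-written vectors in `EMBEDDINGS` stand in for a trained word embedding model, the bug texts and file paths in `FIXED_BUGS` are hypothetical, and averaging word vectors with cosine similarity is just one common way to compare short texts.

```python
import math

# Toy 3-dimensional word vectors (assumption): placeholders for vectors that a
# word embedding model would learn from mined bug reports and change logs.
EMBEDDINGS = {
    "crash": [0.9, 0.1, 0.0],
    "null": [0.8, 0.2, 0.1],
    "pointer": [0.7, 0.3, 0.0],
    "login": [0.1, 0.9, 0.1],
    "password": [0.0, 0.8, 0.2],
    "render": [0.1, 0.1, 0.9],
    "font": [0.0, 0.2, 0.8],
}

# Hypothetical history: (bug report text, files changed by the fixing commit).
FIXED_BUGS = [
    ("crash on null pointer", ["core/parser.c"]),
    ("login rejects valid password", ["auth/session.c", "auth/hash.c"]),
    ("font fails to render", ["ui/text.c"]),
]


def embed(text):
    """Average the vectors of the known words; return None if none are known."""
    vectors = [EMBEDDINGS[w] for w in text.lower().split() if w in EMBEDDINGS]
    if not vectors:
        return None
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_files(new_bug, top_k=2):
    """Rank files by the similarity of the most related past bugs they fixed."""
    query = embed(new_bug)
    if query is None:
        return []
    scored = []
    for text, files in FIXED_BUGS:
        vector = embed(text)
        if vector is not None:
            scored.append((cosine(query, vector), files))
    # Keep only the top_k most similar past bugs.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    best = {}
    for score, files in scored[:top_k]:
        for path in files:
            best[path] = max(best.get(path, 0.0), score)
    return sorted(best.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for path, score in rank_files("null pointer crash in parser"):
        print(f"{score:.3f}  {path}")
```

With a real model, `embed` would look words up in the trained vocabulary instead of a hard-coded table, and `FIXED_BUGS` would be built by linking bug reports to the commits that closed them.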
