一个增量更新框架，用于从软件库中有效检索bug定位

2013 20th Working Conference on Reverse Engineering (WCRE) Pub Date : 1900-01-01 DOI:10.1109/WCRE.2013.6671281

Shivani Rao, Henry Medeiros, A. Kak

{"title":"一个增量更新框架，用于从软件库中有效检索bug定位","authors":"Shivani Rao, Henry Medeiros, A. Kak","doi":"10.1109/WCRE.2013.6671281","DOIUrl":null,"url":null,"abstract":"Information Retrieval (IR) based bug localization techniques use a bug reports to query a software repository to retrieve relevant source files. These techniques index the source files in the software repository and train a model which is then queried for retrieval purposes. Much of the current research is focused on improving the retrieval effectiveness of these methods. However, little consideration has been given to the efficiency of such approaches for software repositories that are constantly evolving. As the software repository evolves, the index creation and model learning have to be repeated to ensure accuracy of retrieval for each new bug. In doing so, the query latency may be unreasonably high, and also, re-computing the index and the model for files that did not change is computationally redundant. We propose an incremental update framework to continuously update the index and the model using the changes made at each commit. We demonstrate that the same retrieval accuracy can be achieved but with a fraction of the time needed by current approaches. Our results are based on two basic IR modeling techniques - Vector Space Model (VSM) and Smoothed Unigram Model (SUM). The dataset we used in our validation experiments was created by tracking commit history of AspectJ and JodaTime software libraries over a span of 10 years.","PeriodicalId":275092,"journal":{"name":"2013 20th Working Conference on Reverse Engineering (WCRE)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"10","resultStr":"{\"title\":\"An incremental update framework for efficient retrieval from software libraries for bug localization\",\"authors\":\"Shivani Rao, Henry Medeiros, A. Kak\",\"doi\":\"10.1109/WCRE.2013.6671281\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Information Retrieval (IR) based bug localization techniques use a bug reports to query a software repository to retrieve relevant source files. These techniques index the source files in the software repository and train a model which is then queried for retrieval purposes. Much of the current research is focused on improving the retrieval effectiveness of these methods. However, little consideration has been given to the efficiency of such approaches for software repositories that are constantly evolving. As the software repository evolves, the index creation and model learning have to be repeated to ensure accuracy of retrieval for each new bug. In doing so, the query latency may be unreasonably high, and also, re-computing the index and the model for files that did not change is computationally redundant. We propose an incremental update framework to continuously update the index and the model using the changes made at each commit. We demonstrate that the same retrieval accuracy can be achieved but with a fraction of the time needed by current approaches. Our results are based on two basic IR modeling techniques - Vector Space Model (VSM) and Smoothed Unigram Model (SUM). The dataset we used in our validation experiments was created by tracking commit history of AspectJ and JodaTime software libraries over a span of 10 years.\",\"PeriodicalId\":275092,\"journal\":{\"name\":\"2013 20th Working Conference on Reverse Engineering (WCRE)\",\"volume\":\"4 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"1900-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"10\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2013 20th Working Conference on Reverse Engineering (WCRE)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/WCRE.2013.6671281\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2013 20th Working Conference on Reverse Engineering (WCRE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/WCRE.2013.6671281","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 10

摘要

基于信息检索(IR)的缺陷定位技术使用缺陷报告来查询软件存储库以检索相关的源文件。这些技术索引软件存储库中的源文件，并训练一个模型，然后为检索目的查询该模型。目前的研究主要集中在提高这些方法的检索效率上。然而，对于不断发展的软件存储库，很少考虑这种方法的效率。随着软件存储库的发展，必须重复索引创建和模型学习，以确保检索每个新错误的准确性。这样做，查询延迟可能会高得不合理，而且，为未更改的文件重新计算索引和模型在计算上是冗余的。我们提出了一个增量更新框架，使用每次提交时所做的更改来持续更新索引和模型。我们证明了相同的检索精度可以实现，但与目前的方法所需的时间的一小部分。我们的研究结果是基于两种基本的红外建模技术——向量空间模型(VSM)和平滑单格图模型(SUM)。我们在验证实验中使用的数据集是通过跟踪AspectJ和JodaTime软件库在10年内的提交历史创建的。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

An incremental update framework for efficient retrieval from software libraries for bug localization

Information Retrieval (IR) based bug localization techniques use a bug reports to query a software repository to retrieve relevant source files. These techniques index the source files in the software repository and train a model which is then queried for retrieval purposes. Much of the current research is focused on improving the retrieval effectiveness of these methods. However, little consideration has been given to the efficiency of such approaches for software repositories that are constantly evolving. As the software repository evolves, the index creation and model learning have to be repeated to ensure accuracy of retrieval for each new bug. In doing so, the query latency may be unreasonably high, and also, re-computing the index and the model for files that did not change is computationally redundant. We propose an incremental update framework to continuously update the index and the model using the changes made at each commit. We demonstrate that the same retrieval accuracy can be achieved but with a fraction of the time needed by current approaches. Our results are based on two basic IR modeling techniques - Vector Space Model (VSM) and Smoothed Unigram Model (SUM). The dataset we used in our validation experiments was created by tracking commit history of AspectJ and JodaTime software libraries over a span of 10 years.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助