2016 IEEE/ACM 13th Working Conference on Mining Software Repositories (MSR): Latest Articles

Mining energy-greedy API usage patterns in Android apps: an empirical study

2016 IEEE/ACM 13th Working Conference on Mining Software Repositories (MSR)

Pub Date : 2014-05-31 DOI: 10.1145/2597073.2597085

M. Vásquez, G. Bavota, Carlos Bernal-Cárdenas, R. Oliveto, M. D. Penta, D. Poshyvanyk

Energy consumption of mobile applications is nowadays a hot topic, given the widespread use of mobile devices. The high demand for features and improved user experience, given the available powerful hardware, tends to increase the apps' energy consumption. However, excessive energy consumption in mobile apps could also be a consequence of energy-greedy hardware, bad programming practices, or particular API usage patterns. We present the largest to date quantitative and qualitative empirical investigation into the categories of API calls and usage patterns that, in the context of the Android development framework, exhibit particularly high energy consumption profiles. By using a hardware power monitor, we measure energy consumption of method calls when executing typical usage scenarios in 55 mobile apps from different domains. Based on the collected data, we mine and analyze energy-greedy APIs and usage patterns. We zoom in and discuss the cases where either the anomalous energy consumption is unavoidable or where it is due to suboptimal usage or choice of APIs. Finally, we synthesize our findings into actionable knowledge and recipes for developers on how to reduce energy consumption while using certain categories of Android APIs and patterns.

Citations: 227

Improving the accuracy of duplicate bug report detection using textual similarity measures

2016 IEEE/ACM 13th Working Conference on Mining Software Repositories (MSR)

Pub Date : 2014-05-31 DOI: 10.1145/2597073.2597088

A. Lazar, Sarah Ritchey, Bonita Sharif

The paper describes an improved method for automatic duplicate bug report detection based on new textual similarity features and binary classification. Using a set of new textual features, inspired by recent text similarity research, we train several binary classification models. A case study was conducted on three open source systems (Eclipse, Open Office, and Mozilla) to determine the effectiveness of the improved method. A comparison is also made with current state-of-the-art approaches, highlighting similarities and differences. Results indicate that the accuracy of the proposed method is better than that reported in previous research for all three systems.
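
As a rough illustration of the kind of pipeline this abstract describes, the sketch below computes two textual similarity features for a pair of bug reports. The feature choices (Jaccard overlap and cosine similarity over word counts), the tokenizer, and all function names are assumptions made for illustration; the abstract does not say which features or classifiers the authors use.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def jaccard(tokens_a, tokens_b):
    """Share of distinct tokens that both reports have in common."""
    set_a, set_b = set(tokens_a), set(tokens_b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def cosine(tokens_a, tokens_b):
    """Cosine similarity between the raw term-frequency vectors."""
    freq_a, freq_b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(freq_a[t] * freq_b[t] for t in freq_a.keys() & freq_b.keys())
    norm_a = math.sqrt(sum(c * c for c in freq_a.values()))
    norm_b = math.sqrt(sum(c * c for c in freq_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def pair_features(report_a, report_b):
    """Hypothetical feature vector for one pair of bug reports."""
    tokens_a, tokens_b = tokenize(report_a), tokenize(report_b)
    return [jaccard(tokens_a, tokens_b), cosine(tokens_a, tokens_b)]
```

Feature vectors of this shape, one per report pair with a duplicate or non-duplicate label, could then be passed to any binary classifier.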

Citations: 71

The bug catalog of the maven ecosystem

2016 IEEE/ACM 13th Working Conference on Mining Software Repositories (MSR)

Pub Date : 2014-05-31 DOI: 10.1145/2597073.2597123

Dimitris Mitropoulos, Vassilios Karakoidas, P. Louridas, Georgios Gousios, D. Spinellis

Examining software ecosystems can provide the research community with data regarding artifacts, processes, and communities. We present a dataset obtained from the Maven central repository ecosystem (approximately 265GB of data) by statically analyzing the repository to detect potential software bugs. For our analysis we used FindBugs, a tool that examines Java bytecode to detect numerous types of bugs. The dataset contains the metrics results that FindBugs reports for every project version (a JAR) included in the ecosystem. For every version we also stored specific metadata such as the JAR's size, its dependencies and others. Our dataset can be used to produce interesting research results, as we show in specific examples.

Citations: 32

Modern code reviews in open-source projects: which problems do they fix?

2016 IEEE/ACM 13th Working Conference on Mining Software Repositories (MSR)

Pub Date : 2014-05-31 DOI: 10.1145/2597073.2597082

M. Beller, Alberto Bacchelli, A. Zaidman, E. Juergens

Code review is the manual assessment of source code by humans, mainly intended to identify defects and quality problems. Modern Code Review (MCR), a lightweight variant of the code inspections investigated since the 1970s, prevails today both in industry and open-source software (OSS) systems. The objective of this paper is to increase our understanding of the practical benefits that the MCR process produces on reviewed source code. To that end, we empirically explore the problems fixed through MCR in OSS systems. We manually classified over 1,400 changes taking place in reviewed code from two OSS projects into a validated categorization scheme. Surprisingly, results show that the types of changes due to the MCR process in OSS are strikingly similar to those in the industry and academic systems from the literature, featuring a similar 75:25 ratio of maintainability-related to functional problems. We also reveal that 7–35% of review comments are discarded and that 10–22% of the changes are not triggered by an explicit review comment. Patterns emerged in the review data; we investigated them, revealing the technical factors that influence the number of changes due to the MCR process. We found that bug-fixing tasks lead to fewer changes, and that tasks with more altered files and a higher code churn have more changes. Contrary to intuition, the identity of the reviewer had no impact on the number of changes.

Citations: 203

A dictionary to translate change tasks to source code

2016 IEEE/ACM 13th Working Conference on Mining Software Repositories (MSR)

Pub Date : 2014-05-31 DOI: 10.1145/2597073.2597095

Katja Kevic, Thomas Fritz

At the beginning of a change task, software developers spend a substantial amount of their time searching and navigating to locate relevant parts in the source code. Current approaches to support developers in this initial code search predominantly use information retrieval techniques that leverage the similarity between task descriptions and the identifiers of code elements to recommend relevant elements. However, the vocabulary or language used in source code often differs from the one used for describing change tasks, especially since the people developing the code are not the same as the ones reporting bugs or defining new features to be implemented. In our work, we investigate the creation of a dictionary that maps the different vocabularies using information from change sets and interaction histories stored with previously completed tasks. In an empirical analysis on four open source projects, our approach substantially improved upon the results of traditional information retrieval techniques for recommending relevant code elements.
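
One simple way to picture such a dictionary is as co-occurrence counts between the words of completed tasks and the code elements those tasks changed. The sketch below is only that: the data layout, the scoring by summed counts, and the names `build_dictionary` and `recommend` are assumptions for illustration, not the approach evaluated in the paper.

```python
from collections import Counter, defaultdict


def build_dictionary(completed_tasks):
    """Count how often each task word co-occurs with each changed code element.

    completed_tasks: iterable of (task_words, code_elements) pairs, where both
    items are collections of strings taken from one previously completed task.
    """
    mapping = defaultdict(Counter)
    for task_words, code_elements in completed_tasks:
        for word in set(task_words):
            mapping[word].update(set(code_elements))
    return mapping


def recommend(mapping, task_words, top_n=3):
    """Rank code elements by summed co-occurrence with the new task's words."""
    scores = Counter()
    for word in set(task_words):
        scores.update(mapping.get(word, {}))
    # Sort by descending score, then by name so ties are deterministic.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [element for element, _ in ranked[:top_n]]
```

Given a history in which the word "login" appeared in tasks that changed a hypothetical `AuthService.authenticate` method, a new task mentioning "login" would have that method ranked first, even though the identifier itself never contains the word.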

Citations: 17

Kataribe: a hosting service of historage repositories

2016 IEEE/ACM 13th Working Conference on Mining Software Repositories (MSR)

Pub Date : 2014-05-31 DOI: 10.1145/2597073.2597125

Kenji Fujiwara, Hideaki Hata, Erina Makihara, Yusuke Fujihara, Naoki Nakayama, Hajimu Iida, Ken-ichi Matsumoto

In Mining Software Repositories research, the code repository is one of the core sources, since it contains the product of software development. A code repository stores the versions of files and makes it possible to browse the histories of files, such as modification dates, authors, messages, etc. Although such rich information about file histories is easily available, extracting the histories of methods, which are elements of source code files, is not easy with general code repositories. To tackle this difficulty, we have developed Historage, a fine-grained version control system. A Historage repository is a Git repository built upon the original Git repository. Therefore, mining techniques similar to those for general Git repositories are applicable to Historage repositories. Kataribe is a hosting service for Historage repositories, which enables researchers and developers to browse method histories on the web and clone Historage repositories locally. The Kataribe project aims to maintain and expand the datasets and features.

Citations: 18

Magnet or sticky? an OSS project-by-project typology

2016 IEEE/ACM 13th Working Conference on Mining Software Repositories (MSR)

Pub Date : 2014-05-31 DOI: 10.1145/2597073.2597116

Kazuhiro Yamashita, Shane McIntosh, Yasutaka Kamei, Naoyasu Ubayashi

For Open Source Software (OSS) projects, retaining existing contributors and attracting new ones are major concerns. In this paper, we expand and adapt a pair of population migration metrics to analyze migration trends in a collection of open source projects. Namely, we study: (1) project stickiness, i.e., its tendency to retain existing contributors and (2) project magnetism, i.e., its tendency to attract new contributors. Using quadrant plots, we classify projects as attractive (highly magnetic and sticky), stagnant (highly sticky, weakly magnetic), fluctuating (highly magnetic, weakly sticky), or terminal (weakly magnetic and sticky). Through analysis of the MSR challenge dataset, we find that: (1) quadrant plots can effectively identify at-risk projects, (2) stickiness is often motivated by professional activity and (3) transitions among quadrants as a project ages often coincide with interesting events in the evolution history of a project.
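
The four quadrants named in the abstract can be written down as a small classification function. This is a minimal sketch: the abstract does not state where the boundary between "highly" and "weakly" lies, so the two threshold parameters (for example, the median values across all studied projects) are assumptions, as is the function name.

```python
def classify_project(magnetism, stickiness, magnet_threshold, sticky_threshold):
    """Place a project in one of the four quadrants named in the abstract.

    The thresholds are assumed cut-offs separating "highly" from "weakly".
    """
    highly_magnetic = magnetism >= magnet_threshold
    highly_sticky = stickiness >= sticky_threshold
    if highly_magnetic and highly_sticky:
        return "attractive"
    if highly_sticky:
        return "stagnant"
    if highly_magnetic:
        return "fluctuating"
    return "terminal"
```

Tracking the label returned for the same project over successive time periods would surface the quadrant transitions that the abstract links to events in a project's history.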

Citations: 30

Finding patterns in static analysis alerts: improving actionable alert ranking

2016 IEEE/ACM 13th Working Conference on Mining Software Repositories (MSR)

Pub Date : 2014-05-31 DOI: 10.1145/2597073.2597100

Quinn Hanam, Lin Tan, Reid Holmes, Patrick Lam

Static analysis (SA) tools that find bugs by inferring programmer beliefs (e.g., FindBugs) are commonplace in today's software industry. While they find a large number of actual defects, they are often plagued by high rates of alerts that a developer would not act on (unactionable alerts) because they are incorrect, do not significantly affect program execution, etc. High rates of unactionable alerts decrease the utility of static analysis tools in practice. We present a method for differentiating actionable and unactionable alerts by finding alerts with similar code patterns. To do so, we create a feature vector based on code characteristics at the site of each SA alert. With these feature vectors, we use machine learning techniques to build an actionable alert prediction model that is able to classify new SA alerts. We evaluate our technique on three subject programs using the FindBugs static analysis tool and the Faultbench benchmark methodology. For a developer inspecting the top 5% of all alerts for three sample projects, our approach is able to identify 57 of 211 actionable alerts, which is 38 more than the FindBugs priority measure. Combined with previous actionable alert identification techniques, our method finds 75 actionable alerts in the top 5%, which is four more actionable alerts (a 6% improvement) than previous actionable alert identification techniques.

Citations: 68

GreenMiner: a hardware based mining software repositories software energy consumption framework

2016 IEEE/ACM 13th Working Conference on Mining Software Repositories (MSR)

Pub Date : 2014-05-31 DOI: 10.1145/2597073.2597097

Abram Hindle, Alex Wilson, Kent Rasmussen, E. J. Barlow, Hazel Victoria Campbell, Stephen Romansky

Green Mining is a field of MSR that studies software energy consumption and relies on software performance data. Unfortunately, there is a severe lack of publicly available software power use performance data. This means that green mining researchers must generate this data themselves by writing tests, building multiple revisions of a product, and then running these tests multiple times (10+) for each software revision while measuring power use. Then, they must aggregate these measurements to estimate the energy consumed by the tests for each software revision. This is time-consuming and is made more difficult by the constraints of mobile devices and their OSes. In this paper we propose, implement, and demonstrate Green Miner: the first dedicated hardware mining software repositories testbed. The Green Miner physically measures the energy consumption of mobile devices (Android phones) and automates the testing of applications and the reporting of measurements back to developers and researchers. The Green Miner has already produced valuable results for commercial Android application developers, and has been shown to replicate other power studies' results.
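
The aggregation step mentioned in the abstract, turning repeated power measurements into one energy estimate per revision, might look roughly like the sketch below. The sample format (timestamp in seconds, power in watts), the use of trapezoidal integration, and the averaging over runs are assumptions chosen for illustration; the abstract does not describe how Green Miner itself aggregates its data.

```python
from statistics import mean


def energy_joules(samples):
    """Integrate (time_seconds, power_watts) samples with the trapezoidal rule."""
    total = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        total += (t1 - t0) * (p0 + p1) / 2.0
    return total


def mean_energy_per_revision(runs_by_revision):
    """Average the energy of repeated test runs for each software revision.

    runs_by_revision: dict mapping a revision id to a non-empty list of runs,
    where each run is a list of (time_seconds, power_watts) samples.
    """
    return {
        revision: mean(energy_joules(run) for run in runs)
        for revision, runs in runs_by_revision.items()
    }
```

Averaging over the 10 or more runs per revision that the abstract mentions reduces the effect of run-to-run noise when comparing revisions.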

Citations: 149

New features for duplicate bug detection

2016 IEEE/ACM 13th Working Conference on Mining Software Repositories (MSR)

Pub Date : 2014-05-31 DOI: 10.1145/2597073.2597090

Nathan Klein, Christopher S. Corley, Nicholas A. Kraft

Issue tracking software for large software projects receives a large volume of issue reports each day. Each of these issues is typically triaged by hand, a time-consuming and error-prone task. Additionally, issue reporters lack the necessary understanding to know whether their issue has previously been reported. This leads to issue trackers containing many duplicate reports, adding complexity to the triaging task. Duplicate bug report detection is designed to aid developers by automatically grouping bug reports concerning identical issues. Previous work by Alipour et al. has shown that the textual, categorical, and contextual information of an issue report are effective measures in duplicate bug report detection. In our work, we extend previous work by introducing a range of metrics based on the topic distribution of the issue reports, relying only on data taken directly from bug reports. In particular, we introduce a novel metric that measures the first shared topic between two topic-document distributions. This paper details the evaluation of this group of pair-based metrics with a range of machine learning classifiers, using the same issues used by Alipour et al. We demonstrate that the proposed metrics show a significant improvement over previous work, and conclude that the simple metrics we propose should be considered in future studies on bug report deduplication, as well as for more general natural language processing applications.

Citations: 28
