
2012 9th IEEE Working Conference on Mining Software Repositories (MSR): latest papers

Can we predict types of code changes? An empirical analysis
Pub Date : 2012-06-02 DOI: 10.1109/MSR.2012.6224284
E. Giger, M. Pinzger, H. Gall
There exist many approaches that help in pointing developers to the change-prone parts of a software system. Although beneficial, they mostly fall short in providing details of these changes. Fine-grained source code changes (SCC) capture such detailed code changes and their semantics on the statement level. These SCC can be condition changes, interface modifications, inserts or deletions of methods and attributes, or other kinds of statement changes. In this paper, we explore prediction models for whether a source file will be affected by a certain type of SCC. These predictions are computed on the static source code dependency graph and use social network centrality measures and object-oriented metrics. For that, we use change data of the Eclipse platform and the Azureus 3 project. The results show that Neural Network models can predict categories of SCC types. Furthermore, our models can output a list of the potentially change-prone files ranked according to their change-proneness, overall and per change type category.
Citations: 88
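
As a rough illustration of the kind of input feature this abstract mentions, the sketch below computes normalized in- and out-degree centrality for files in a small static dependency graph. The edge list and file names are hypothetical, and degree centrality is only one of several social network centrality measures; the paper's actual feature set and Neural Network models are not reproduced here.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Return {node: (in_centrality, out_centrality)} for a directed graph.

    edges: iterable of (source, target) pairs, assumed to contain no duplicates.
    Degrees are normalized by (number of nodes - 1).
    """
    nodes = set()
    in_deg = defaultdict(int)
    out_deg = defaultdict(int)
    for src, dst in edges:
        nodes.update((src, dst))
        out_deg[src] += 1
        in_deg[dst] += 1
    scale = max(len(nodes) - 1, 1)  # avoid division by zero for tiny graphs
    return {n: (in_deg[n] / scale, out_deg[n] / scale) for n in nodes}


# Hypothetical dependency edges: (file, file it depends on)
deps = [("A.java", "B.java"), ("A.java", "C.java"), ("B.java", "C.java")]
centrality = degree_centrality(deps)
```

Per-file values like these could serve as columns in a feature table alongside object-oriented metrics; which classifier consumes them is a separate choice.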
An empirical investigation of changes in some software properties over time
Pub Date : 2012-06-02 DOI: 10.1109/MSR.2012.6224285
J. Gil, M. Goldstein, Dany Moshkovich
Software metrics are easy to define, but not so easy to justify. It is hard to prove that a metric is valid, i.e., that measured numerical values imply anything about vaguely defined, yet crucial, software properties such as complexity and maintainability. This paper employs statistical analysis and tests to check some plausible assumptions about the behavior of software and of the metrics measured for it, looking retrospectively at its version evolution history. Among those are the reliability assumption implicit in the application of any code metric, and the assumption that the magnitude of change in a software artifact, i.e., the increase or decrease of its size, is correlated with changes to its version number. Putting a suite of 36 metrics to the test, we confirm most of the assumptions on a large repository of software artifacts. Surprisingly, we show that a substantial portion of the reliability of some metrics can be observed even in random changes to architecture. Another surprising result is that Boolean-valued metrics tend to flip their values more often in minor software version increments than in major increments.
Citations: 4
Incorporating version histories in Information Retrieval based bug localization
Pub Date : 2012-06-02 DOI: 10.1109/MSR.2012.6224299
Bunyamin Sisman, A. Kak
Fast and accurate localization of software defects continues to be a difficult problem since defects can emanate from a large variety of sources and can often be intricate in nature. In this paper, we show how version histories of a software project can be used to estimate a prior probability distribution for defect proneness associated with the files in a given version of the project. Subsequently, these priors are used in an IR (Information Retrieval) framework to determine the posterior probability of a file being the cause of a bug. We first present two models to estimate the priors, one from the defect histories and the other from the modification histories, with both types of histories as stored in the versioning tools. Referring to these as the base models, we then extend them by incorporating a temporal decay into the estimation of the priors. We show that by just including the base models, the mean average precision (MAP) for bug localization improves by as much as 30%. And when we also factor in the time decay in the estimates of the priors, the improvements in MAP can be as large as 80%.
Citations: 101
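
The prior-times-likelihood idea in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' model: the exponential decay form, the `tau` parameter, the file names, and the likelihood values are all hypothetical. With `tau=None` the prior reduces to plain counts (a base model); with a finite `tau`, recent history entries weigh more.

```python
import math


def decayed_priors(history, now, tau=None):
    """Estimate a prior over files from (file, timestamp) history entries.

    history: past modifications or bug fixes taken from version control.
    tau: assumed decay constant; None disables decay (plain counts).
    """
    weights = {}
    for path, t in history:
        w = 1.0 if tau is None else math.exp(-(now - t) / tau)
        weights[path] = weights.get(path, 0.0) + w
    total = sum(weights.values())
    if total == 0:
        return {}
    return {p: w / total for p, w in weights.items()}


def rank_files(likelihood, prior):
    """Rank files by posterior, proportional to likelihood times prior."""
    scores = {f: likelihood[f] * prior.get(f, 0.0) for f in likelihood}
    total = sum(scores.values()) or 1.0
    posterior = {f: s / total for f, s in scores.items()}
    return sorted(posterior.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical data: a.c changed twice long ago, b.c changed once recently.
history = [("a.c", 1), ("a.c", 2), ("b.c", 9)]
likelihood = {"a.c": 0.5, "b.c": 0.5}  # stand-in for an IR retrieval score
base = rank_files(likelihood, decayed_priors(history, now=10))
decayed = rank_files(likelihood, decayed_priors(history, now=10, tau=2.0))
```

In this toy setting the base prior ranks `a.c` first, while the decayed prior ranks `b.c` first. Files absent from the history receive a zero prior here, so a real system would likely need some smoothing.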
An empirical study of supplementary bug fixes
Pub Date : 2012-06-02 DOI: 10.1109/MSR.2012.6224298
Jihun Park, Miryung Kim, Baishakhi Ray, Doo-Hwan Bae
A recent study finds that errors of omission are harder for programmers to detect than errors of commission. While several change recommendation systems already exist to prevent or reduce omission errors during software development, there have been very few studies on why errors of omission occur in practice and how such errors could be prevented. In order to understand the characteristics of omission errors, this paper investigates a group of bugs that were fixed more than once in open source projects - those bugs whose initial patches were later considered incomplete and to which programmers applied supplementary patches. Our study on Eclipse JDT core, Eclipse SWT, and Mozilla shows that a significant portion of resolved bugs (22% to 33%) involves more than one fix attempt. Our manual inspection shows that the causes of omission errors are diverse, including missed porting changes, incorrect handling of conditional statements, and incomplete refactorings. While many consider that missed updates to code clones often lead to omission errors, only a very small portion of supplementary patches (12% in JDT, 25% in SWT, and 9% in Mozilla) have content similar to their initial patches. This implies that supplementary change locations cannot be predicted by code clone analysis alone. Furthermore, 14% to 15% of files in supplementary patches are beyond the scope of immediate neighbors of their initial patch locations - they neither overlapped with the initial patch locations nor had direct structural dependencies on them (e.g. calls, accesses, subtyping relations). These results call for new types of omission error prevention approaches that complement existing change recommendation systems.
Citations: 71
Co-evolution of logical couplings and commits for defect estimation
Pub Date : 2012-06-02 DOI: 10.1109/MSR.2012.6224283
Maximilian Steff, B. Russo
Logical couplings between files in the commit history of a software repository are instances of files being changed together. The evolution of couplings over the commit history has been used for the localization and prediction of software defects in software reliability. Couplings have been represented in class graphs, and change histories at the class level have been used to identify defective modules. Our new approach inverts this perspective and constructs graphs of ordered commits coupled by common changed classes. These graphs, thus, represent the co-evolution of commits, structured by the change patterns among classes. We believe that co-evolutionary graphs are a promising new instrument for detecting defective software structures. As a first result, we have been able to correlate the history of logical couplings to the history of defects for every commit in the graph and to identify sub-structures of bug-fixing commits over sub-structures of normal commits.
Citations: 18
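
The two views described in this abstract can be sketched roughly as below: the conventional view counts how often pairs of files change together, and the inverted view links commits that share changed files. The data layout (one list of changed file names per commit, in commit order) and both function names are assumptions for illustration; the paper's actual graph construction may differ.

```python
from collections import Counter
from itertools import combinations


def file_couplings(commits):
    """Count how often each unordered pair of files changes in the same commit."""
    counts = Counter()
    for files in commits:
        for pair in combinations(sorted(set(files)), 2):
            counts[pair] += 1
    return counts


def commit_graph(commits):
    """Inverted view: link commits i < j that share at least one changed file.

    Returns a list of (i, j, shared_files) edges over commit indices.
    """
    edges = []
    for i, j in combinations(range(len(commits)), 2):
        shared = set(commits[i]) & set(commits[j])
        if shared:
            edges.append((i, j, sorted(shared)))
    return edges


# Hypothetical history: each entry lists the files changed by one commit.
commits = [["A", "B"], ["B", "C"], ["A", "B", "C"], ["D"]]
couplings = file_couplings(commits)
graph = commit_graph(commits)
```

The pairwise comparison in `commit_graph` is quadratic in the number of commits, which is fine for a toy example but would need an index from file to commits on a real repository.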
Analysis of customer satisfaction survey data
Pub Date : 2012-06-02 DOI: 10.1109/MSR.2012.6224304
Pete Rotella, S. Chulani
Cisco Systems, Inc., conducts a customer satisfaction survey (CSAT) each year to gauge customer sentiment regarding Cisco products, technical support, partner- and Cisco-provided technical services, order fulfillment, and a number of other aspects of the company's business. The results of the analysis of this data are used for several purposes, including ascertaining the viability of new products, determining if customer support objectives are being met, setting engineering in-process and customer experience yearly metrics goals, and assessing, indirectly, the success of engineering initiatives. Analyzing this data, which includes 110,000 yearly sets of survey responses that address over 100 product and services categories, is in many respects complicated. For example, skip logic is an integral part of the survey mechanics, and forming aggregate views of customer sentiment is statistically challenging in this data environment. In this paper, we describe several of the analysis approaches currently used, pointing out some situations where a high level of precision is not easily achieved, and some situations in which it is easy to end up with erroneous results. The analysis and statistical territory covered in this paper is in parts well-known and straightforward, but other parts, which we address, are susceptible to large inaccuracies and errors. We address several of these difficulties and develop reasonable solutions for two known issues: high missing value levels and high collinearity of independent variables.
Citations: 12
Who? Where? What? Examining distributed development in two large open source projects
Pub Date : 2012-06-02 DOI: 10.1109/MSR.2012.6224286
C. Bird, Nachiappan Nagappan
To date, a large body of knowledge has been built up around understanding open source software development. However, there is limited research on examining levels of geographic and organizational distribution within open source software projects, despite many studies examining these same aspects in commercial contexts. We set out to fill this gap in OSS knowledge by manually collecting data for two large, mature, successful projects in an effort to assess how distributed they are, both geographically and organizationally. Both Firefox and Eclipse have been the subject of many studies and are ubiquitous in the areas of internet usage and software development, respectively. We identified the top contributors that made 95% of the changes over multiple major releases of Firefox and Eclipse and determined their geographic locations and organizational affiliations. We examine the distribution in each project's constituent subsystems and report the relationship of pre- and post-release defects with distribution levels.
Citations: 38
Characterizing verification of bug fixes in two open source IDEs
Pub Date : 2012-06-02 DOI: 10.1109/MSR.2012.6224301
Rodrigo R. G. Souza, C. Chavez
Data from bug repositories have been used to enable inquiries about software product and process quality. Unfortunately, such repositories often contain inaccurate, inconsistent, or missing data, which can lead to misleading results. In this paper, we investigate how well data from bug repositories support the discovery of details about the software verification process in two open source projects, Eclipse and NetBeans. We have been able to identify quality assurance teams in NetBeans and to detect a well-defined verification phase in Eclipse. A major challenge, however, was to identify the verification techniques used in the projects. Moreover, we found cases in which a large batch of bug fixes is simultaneously reported to be verified, although no software verification was actually done. Such mass verifications, if not acknowledged, threaten analyses that rely on information about software verification reported on bug repositories. Therefore, we recommend that the exploratory analyses presented in this paper precede inferences based on reported verifications.
Citations: 7
Error mining: Bug detection through comparison with large code databases
Pub Date : 2012-06-02 DOI: 10.1109/MSR.2012.6224278
Alexander Breckel
Bugs are hard to find. Static analysis tools are capable of systematically detecting predefined sets of errors, but extending them to find new error types requires a deep understanding of the underlying programming language. Manual reviews on the other hand, while being able to reveal more individual errors, require much more time. We present a new approach to automatically detect bugs through comparison with a large code database. The source file is analyzed for similar but slightly different code fragments in the database. Frequent occurrences of common differences indicate a potential bug that can be fixed by applying the modification back to the original source file. In this paper, we give an overview of the resulting algorithm and some important implementation details. We further evaluate the circumstances under which good detection rates can be achieved. The results demonstrate that consistently high detection rates of up to 50% are possible for certain error types across different programming languages.
Citations: 7
Green mining: A methodology of relating software change to power consumption
Pub Date : 2012-06-02 DOI: 10.1109/MSR.2012.6224303
Abram Hindle
Power consumption is becoming more and more important with the increased popularity of smartphones, tablets and laptops. The threat of reducing a customer's battery life now hangs over the software developer who asks, “will this next change be the one that causes my software to drain a customer's battery?” One solution is to detect power consumption regressions by measuring the power usage of tests, but this is time-consuming and often noisy. An alternative is to rely on software metrics that allow us to estimate the impact that a change might have on power consumption, thus relieving the developer from expensive testing. This paper presents a general methodology for investigating the impact of software change on power consumption: we relate power consumption to software changes, and then investigate the impact of static OO software metrics on power consumption. Using the Firefox web browser and the Azureus/Vuze BitTorrent client, we demonstrate that software change can affect power consumption. We found evidence of a potential relationship between some software metrics and power consumption. In conclusion, we explored the effect of software change on power consumption in two projects, and we provide an initial investigation of the impact of software metrics on power consumption.
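The idea of detecting power consumption regressions from noisy test measurements can be illustrated with a minimal sketch. This is not the paper's methodology: the function name `find_power_regressions`, the `min_sigma` threshold, the revision identifiers, and all wattage values are hypothetical and invented purely for illustration. The sketch assumes each revision was measured over at least two repeated test runs.

```python
from statistics import mean, stdev


def find_power_regressions(runs_by_revision, min_sigma=2.0):
    """Return (revision, delta_watts) pairs where mean power rose noticeably.

    runs_by_revision: ordered list of (revision_id, [watts per test run]).
    A revision is flagged when its mean exceeds the previous revision's mean
    by more than min_sigma times the larger of the two sample standard
    deviations, a crude guard against measurement noise.
    Each revision needs at least two runs, since stdev() requires two points.
    """
    flagged = []
    # Walk over consecutive (previous, current) revision pairs.
    for (_, prev), (rev, cur) in zip(runs_by_revision, runs_by_revision[1:]):
        delta = mean(cur) - mean(prev)
        noise = max(stdev(prev), stdev(cur))
        if delta > min_sigma * noise:
            flagged.append((rev, round(delta, 2)))
    return flagged


# Hypothetical measurements, invented for illustration only.
measurements = [
    ("r100", [20.1, 19.9, 20.0, 20.2]),
    ("r101", [20.0, 20.3, 19.8, 20.1]),
    ("r102", [22.4, 22.6, 22.5, 22.3]),
]
print(find_power_regressions(measurements))
```

Repeating each test several times and comparing the change against the run-to-run spread is one simple way to cope with the noise the abstract mentions; a real study would likely use a proper statistical test instead of a fixed threshold.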
Citations: 93