
2016 IEEE/ACM 13th Working Conference on Mining Software Repositories (MSR): Latest Publications

Understanding "watchers" on GitHub
Pub Date : 2014-05-31 DOI: 10.1145/2597073.2597114
Jyoti Sheoran, Kelly Blincoe, Eirini Kalliamvakou, D. Damian, J. Ell
Users on GitHub can watch repositories to receive notifications about project activity. This introduces a new type of passive project membership. In this paper, we investigate the behavior of watchers and their contribution to the projects they watch. We find that a subset of project watchers begin contributing to the project and those contributors account for a significant percentage of contributors on the project. As contributors, watchers are more confident and contribute over a longer period of time in a more varied way than other contributors. This is likely attributable to the knowledge gained through project notifications.
Citations: 58
An industrial case study of automatically identifying performance regression-causes
Pub Date : 2014-05-31 DOI: 10.1145/2597073.2597092
Thanh H. D. Nguyen, M. Nagappan, A. Hassan, Mohamed N. Nasser, P. Flora
Even the addition of a single extra field or control statement in the source code of a large-scale software system can lead to performance regressions. Such regressions can considerably degrade the user experience. Working closely with the members of a performance engineering team, we observe that they face a major challenge in identifying the cause of a performance regression given the large number of performance counters (e.g., memory and CPU usage) that must be analyzed. We propose the mining of a regression-causes repository (where the results of performance tests and causes of past regressions are stored) to assist the performance team in identifying the regression-cause of a newly-identified regression. We evaluate our approach on an open-source system, and a commercial system for which the team is responsible. The results show that our approach can accurately (up to 80% accuracy) identify performance regression-causes using a reasonably small number of historical test runs (sometimes as few as four test runs per regression-cause).
Citations: 56
Lean GHTorrent: GitHub data on demand
Pub Date : 2014-05-31 DOI: 10.1145/2597073.2597126
Georgios Gousios, Bogdan Vasilescu, Alexander Serebrenik, A. Zaidman
In recent years, GitHub has become the largest code host in the world, with more than 5M developers collaborating across 10M repositories. Numerous popular open source projects (such as Ruby on Rails, Homebrew, Bootstrap, Django or jQuery) have chosen GitHub as their host and have migrated their code base to it. GitHub offers a tremendous research potential. For instance, it is a flagship for current open source development, a place for developers to showcase their expertise to peers or potential recruiters, and the platform where social coding features or pull requests emerged. However, GitHub data is, to date, largely underexplored. To facilitate studies of GitHub, we have created GHTorrent, a scalable, queriable, offline mirror of the data offered through the GitHub REST API. In this paper we present a novel feature of GHTorrent designed to offer customisable data dumps on demand. The new GHTorrent data-on-demand service offers users the possibility to request via a web form up-to-date GHTorrent data dumps for any collection of GitHub repositories. We hope that by offering customisable GHTorrent data dumps we will not only lower the "barrier for entry" even further for researchers interested in mining GitHub data (thus encourage researchers to intensify their mining efforts), but also enhance the replicability of GitHub studies (since a snapshot of the data on which the results were obtained can now easily accompany each study).
Citations: 143
Mining questions asked by web developers
Pub Date : 2014-05-31 DOI: 10.1145/2597073.2597083
Kartik Bajaj, K. Pattabiraman, A. Mesbah
Modern web applications consist of a significant amount of client-side code, written in JavaScript, HTML, and CSS. In this paper, we present a study of common challenges and misconceptions among web developers, by mining related questions asked on Stack Overflow. We use unsupervised learning to categorize the mined questions and define a ranking algorithm to rank all the Stack Overflow questions based on their importance. We analyze the top 50 questions qualitatively. The results indicate that (1) the overall share of web development related discussions is increasing among developers, (2) browser related discussions are prevalent; however, this share is decreasing with time, (3) form validation and other DOM related discussions have been discussed consistently over time, (4) web related discussions are becoming more prevalent in mobile development, and (5) developers face implementation issues with new HTML5 features such as Canvas. We examine the implications of the results on the development, research, and standardization communities.
Citations: 141
Mining questions about software energy consumption
Pub Date : 2014-05-31 DOI: 10.1145/2597073.2597110
G. Pinto, F. C. Filho, Yu David Liu
A growing number of software solutions have been proposed to address application-level energy consumption problems in the last few years. However, little is known about how much software developers are concerned about energy consumption, what aspects of energy consumption they consider important, and what solutions they have in mind for improving energy efficiency. In this paper we present the first empirical study on understanding the views of application programmers on software energy consumption problems. Using StackOverflow as our primary data source, we analyze a carefully curated sample of more than 300 questions and 550 answers from more than 800 users. With this data, we observed a number of interesting findings. Our study shows that practitioners are aware of the energy consumption problems: the questions they ask are not only diverse -- we found 5 main themes of questions -- but also often more interesting and challenging when compared to the control question set. Even though energy consumption-related questions are popular when considering a number of different popularity measures, the same cannot be said about the quality of their answers. In addition, we observed that some of these answers are often flawed or vague. We contrast the advice provided by these answers with the state-of-the-art research on energy consumption. Our summary of software energy consumption problems may help researchers focus on what matters the most to software developers and end users.
Citations: 176
Models of OSS project meta-information: a dataset of three forges
Pub Date : 2014-05-31 DOI: 10.1145/2597073.2597132
James R. Williams, D. D. Ruscio, N. Matragkas, Juri Di Rocco, D. Kolovos
The process of selecting open-source software (OSS) for adoption is not straightforward as it involves exploring various sources of information to determine the quality, maturity, activity, and user support of each project. In the context of the OSSMETER project, we have developed a forge-agnostic metamodel that captures the meta-information common to all OSS projects. We specialise this metamodel for popular OSS forges in order to capture forge-specific meta-information. In this paper we present a dataset conforming to these metamodels for over 500,000 OSS projects hosted on three popular OSS forges: Eclipse, SourceForge, and GitHub. The dataset enables different kinds of automatic analysis and supports objective comparisons of cross-forge OSS alternatives with respect to a user's needs and quality requirements.
Citations: 9
A green miner's dataset: mining the impact of software change on energy consumption
Pub Date : 2014-05-31 DOI: 10.1145/2597073.2597130
Chenlei Zhang, Abram Hindle
With the advent of mobile computing, the responsibility of software developers to update and ship energy efficient applications has never been more pronounced. Green mining attempts to address this responsibility by examining the impact of software change on energy consumption. One problem with green mining is that power performance data is not readily available, unlike many other forms of MSR research. Green miners have to create tests and run them across numerous versions of a software project because power performance data was either missing or never existed for that particular project. In this paper we describe multiple open green mining datasets used in prior green mining work. The dataset includes numerous power traces and parallel system call and CPU/IO/Memory traces of multiple versions of multiple products. These datasets enable those more interested in data-mining and modeling to work on green mining problems as well.
Citations: 13
Impact analysis of change requests on source code based on interaction and commit histories
Pub Date : 2014-05-31 DOI: 10.1145/2597073.2597096
Motahareh Bahrami Zanjani, George Swartzendruber, Huzefa H. Kagdi
The paper presents an approach to perform impact analysis (IA) of an incoming change request on source code. The approach is based on a combination of interaction (e.g., Mylyn) and commit (e.g., CVS) histories. The source code entities (i.e., files and methods) that were interacted or changed in the resolution of past change requests (e.g., bug fixes) were used. Information retrieval, machine learning, and lightweight source code analysis techniques were employed to form a corpus from these source code entities. Additionally, the corpus was augmented with the textual descriptions of the previously resolved change requests and their associated commit messages. Given a textual description of a change request, this corpus is queried to obtain a ranked list of relevant source code entities that are most likely change prone. Such an approach that combines information from interactions and commits for IA at the change request level was not previously investigated. Furthermore, the approach requires only the entities that were interacted and/or committed in the past, which differs from the previous solutions that require indexing of a complete snapshot (e.g., a release). An empirical study on 3272 interactions and 5093 commits from Mylyn, an open source task management tool, was conducted. The results show that the combined approach outperforms an individual approach based on commits. Moreover, it also outperformed an approach based on indexing a single, complete snapshot of a software system.
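The querying step described in this abstract (a textual change request in, a ranked list of source code entities out) can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses plain TF-IDF with cosine similarity as a stand-in retrieval technique, which the abstract does not confirm, and the entity names and corpus texts are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it on non-alphanumeric characters."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def rank_entities(corpus, query):
    """Rank entities by TF-IDF cosine similarity to a change request.

    corpus: dict mapping an entity name to the text associated with it
            (e.g. past change request descriptions and commit messages).
    Returns a list of (entity, score) pairs, best match first.
    """
    docs = {name: Counter(tokenize(text)) for name, text in corpus.items()}
    n_docs = len(docs)

    # Document frequency of each term across the corpus.
    df = Counter()
    for counts in docs.values():
        df.update(counts.keys())

    # Smoothed inverse document frequency (an assumption, one common variant).
    idf = {t: math.log((1 + n_docs) / (1 + d)) + 1.0 for t, d in df.items()}

    def vectorize(counts):
        # Terms unseen in the corpus carry no weight and are dropped.
        return {t: c * idf[t] for t, c in counts.items() if t in idf}

    def cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        norm_a = math.sqrt(sum(w * w for w in a.values()))
        norm_b = math.sqrt(sum(w * w for w in b.values()))
        if norm_a == 0 or norm_b == 0:
            return 0.0
        return dot / (norm_a * norm_b)

    q = vectorize(Counter(tokenize(query)))
    scored = [(name, cosine(q, vectorize(c))) for name, c in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical corpus: entity -> text gathered from past change requests.
example_corpus = {
    "TaskListView.java": "task list view refresh crash when opening task editor",
    "BugzillaConnector.java": "bugzilla repository connector synchronize query timeout",
    "TaskEditor.java": "task editor attachment upload fails",
}
for entity, score in rank_entities(example_corpus, "crash when refreshing task list"):
    print(f"{entity}: {score:.3f}")
```

Because the corpus is built only from entities touched in past change requests, an entity that was never interacted with or committed cannot appear in the ranking; that is the trade-off against indexing a full snapshot that the abstract mentions.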
Citations: 42
A dataset for Maven artifacts and bug patterns found in them
Pub Date : 2014-05-31 DOI: 10.1145/2597073.2597134
V. Saini, Hitesh Sajnani, Joel Ossher, C. Lopes
In this paper, we present data downloaded from Maven, one of the most popular component repositories. The data includes the binaries of 186,392 components, along with source code for 161,025. We identify and organize these components into groups where each group contains all the versions of a library. In order to assess the quality of these components, we make available the reports generated by the FindBugs tool on 64,574 components. The information is also made available in the form of a database which stores the total number, type, and priority of bug patterns found in each component, along with its defect density. We also describe how this dataset can be useful in software engineering research.
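The per-component summary described here (total number, type, and priority of bug patterns, plus defect density) could be computed along these lines. This is a minimal sketch, not the dataset's actual schema: the `BugInstance` record and `summarize` function are hypothetical, and defect density is assumed to mean bug instances per 1,000 lines of code, a definition the abstract does not state.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class BugInstance:
    """One reported bug pattern instance (hypothetical record layout)."""
    bug_type: str   # bug pattern identifier as reported by the tool
    priority: int   # assumed convention: 1 = high, 2 = normal, 3 = low


def summarize(bugs, lines_of_code):
    """Aggregate bug instances for a single component.

    Defect density is ASSUMED to be bugs per 1,000 lines of code.
    """
    total = len(bugs)
    kloc = lines_of_code / 1000
    return {
        "total": total,
        "by_type": dict(Counter(b.bug_type for b in bugs)),
        "by_priority": dict(Counter(b.priority for b in bugs)),
        "defect_density": total / kloc if kloc > 0 else 0.0,
    }


# Illustrative input only; these records do not come from the dataset.
example_bugs = [
    BugInstance("NP_NULL_ON_SOME_PATH", 1),
    BugInstance("DM_DEFAULT_ENCODING", 2),
    BugInstance("DM_DEFAULT_ENCODING", 2),
]
print(summarize(example_bugs, lines_of_code=1500))
```

Normalizing by size is what makes components of very different sizes comparable; raw bug counts alone would mostly track how large a component is.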
Citations: 11
The impact of code review coverage and code review participation on software quality: a case study of the Qt, VTK, and ITK projects
Pub Date : 2014-05-31 DOI: 10.1145/2597073.2597076
Shane McIntosh, Yasutaka Kamei, Bram Adams, A. Hassan
Software code review, i.e., the practice of having third-party team members critique changes to a software system, is a well-established best practice in both open source and proprietary software domains. Prior work has shown that the formal code inspections of the past tend to improve the quality of software delivered by students and small teams. However, the formal code inspection process mandates strict review criteria (e.g., in-person meetings and reviewer checklists) to ensure a base level of review quality, while the modern, lightweight code reviewing process does not. Although recent work explores the modern code review process qualitatively, little research quantitatively explores the relationship between properties of the modern code review process and software quality. Hence, in this paper, we study the relationship between software quality and: (1) code review coverage, i.e., the proportion of changes that have been code reviewed, and (2) code review participation, i.e., the degree of reviewer involvement in the code review process. Through a case study of the Qt, VTK, and ITK projects, we find that both code review coverage and participation share a significant link with software quality. Low code review coverage and participation are estimated to produce components with up to two and five additional post-release defects respectively. Our results empirically confirm the intuition that poorly reviewed code has a negative impact on software quality in large systems using modern reviewing tools.
Citations: 267