2009 6th IEEE International Working Conference on Mining Software Repositories: Latest Papers

MapReduce as a general framework to support research in Mining Software Repositories (MSR)
Pub Date : 2009-05-16 DOI: 10.1109/MSR.2009.5069477
Weiyi Shang, Z. Jiang, Bram Adams, A. Hassan
Researchers continue to demonstrate the benefits of Mining Software Repositories (MSR) for supporting software development and research activities. However, as the mining process is time- and resource-intensive, they often create their own distributed platforms and use various optimizations to speed up and scale up their analysis. These platforms are project-specific, hard to reuse, and offer minimal debugging and deployment support. In this paper, we propose the use of MapReduce, a distributed computing platform, to support research in MSR. As a proof-of-concept, we migrate J-REX, an optimized evolutionary code extractor, to run on Hadoop, an open source implementation of MapReduce. Through a case study on the source control repositories of the Eclipse, BIRT and Datatools projects, we demonstrate that the migration effort to MapReduce is minimal and that the benefits are significant, as the running time of the migrated J-REX is only 30% to 50% of that of the original J-REX. This paper documents our experience with the migration, and highlights the benefits and challenges of the MapReduce framework in the MSR community.
Citations: 53
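For readers unfamiliar with the programming model named in the abstract above, the following is a minimal single-process sketch of the map, shuffle and reduce phases. It is not J-REX and does not use Hadoop; the change records, file paths and function names are hypothetical and chosen only to illustrate counting changes per file.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical change records (revision, file path); not real Eclipse data.
CHANGES = [
    (1, "core/Parser.java"),
    (1, "core/Lexer.java"),
    (2, "core/Parser.java"),
    (3, "ui/View.java"),
    (3, "core/Parser.java"),
]


def map_change(record):
    """Map step: emit one (file, 1) pair per change record."""
    _revision, path = record
    yield path, 1


def shuffle(pairs):
    """Group emitted values by key, as a framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce step: total number of changes recorded for one file."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle and reduce in sequence inside one process."""
    pairs = chain.from_iterable(map_change(r) for r in records)
    return dict(reduce_counts(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(run_job(CHANGES))
```

In a real deployment the map and reduce functions would run on different machines and the framework would perform the grouping; only the two user-supplied functions would need to be written.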
Assigning bug reports using a vocabulary-based expertise model of developers
Pub Date : 2009-05-16 DOI: 10.1109/MSR.2009.5069491
Do Matter, Adrian Kuhn, Oscar Nierstrasz
For popular software systems, the number of daily submitted bug reports is high. Triaging these incoming reports is a time-consuming task. Part of the bug triage is the assignment of a report to a developer with the appropriate expertise. In this paper, we present an approach to automatically suggest developers who have the appropriate expertise for handling a bug report. We model developer expertise using the vocabulary found in their source code contributions and compare this vocabulary to the vocabulary of bug reports. We evaluate our approach by comparing the suggested experts to the persons who eventually worked on the bug. Using eight years of Eclipse development as a case study, we achieve 33.6% top-1 precision and 71.0% top-10 recall.
Citations: 229
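The idea of comparing a developer's vocabulary with the vocabulary of a bug report can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: cosine similarity over raw word counts is assumed as the comparison measure, and the developer names, contribution texts and helper functions are hypothetical.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Lowercased word counts; a stand-in for real vocabulary extraction."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def suggest_developers(contributions, bug_report, k=1):
    """Rank developers by vocabulary similarity to the report; return top k."""
    report = term_vector(bug_report)
    scores = {dev: cosine(term_vector(text), report)
              for dev, text in contributions.items()}
    # Sort by descending score, breaking ties by name for determinism.
    return sorted(scores, key=lambda dev: (-scores[dev], dev))[:k]


# Hypothetical vocabulary taken from each developer's source contributions.
CONTRIBUTIONS = {
    "alice": "parser token lexer syntax grammar parser",
    "bob": "widget layout button render color",
}

if __name__ == "__main__":
    print(suggest_developers(CONTRIBUTIONS, "Parser crashes on invalid syntax token"))
```

Evaluating such a ranking against the developers who eventually handled each bug is what yields figures like top-1 precision and top-10 recall.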
Mining search topics from a code search engine usage log
Pub Date : 2009-05-16 DOI: 10.1109/MSR.2009.5069489
S. Bajracharya, C. Lopes
We present a topic modeling analysis of a year-long usage log of Koders, one of the major commercial code search engines. This analysis contributes to the understanding of what users of code search engines are looking for. Observations on the prevalence of these topics among the users, and on how search and download activities vary across topics, lead to the conclusion that users who find code search engines usable are those who already know to a high level of specificity what to look for. This paper presents a general categorization of these topics that provides insights on the different ways code search engine users express their queries. The findings support the conclusion that existing code search engines address only a subset of the various information needs of the users when compared to the categories of queries they look at.
Citations: 54
On what basis to recommend: Changesets or interactions?
Pub Date : 2009-05-16 DOI: 10.1109/MSR.2009.5069494
Sarah Rastkar, G. Murphy
Different flavours of recommendation systems have been proposed to help software developers perform software evolution tasks. A number of these recommendation systems are based on changesets. When changeset information is used, recommendations are based on only the end result of the activity undertaken to complete a task. In this paper, we report on an investigation of how recommendations based on changesets compare to recommendations based on interactions collected as a programmer performed the task that resulted in a changeset. To provide a common basis for the comparison, our investigation considered how bug reports considered similar based on changeset information compare to bug reports considered similar based on interaction information. We found that there is no direct relationship between the bug reports found similar with the different methods, suggesting that each comparison method captures a different aspect of the problem.
Citations: 8
Learning from defect removals
Pub Date : 2009-05-16 DOI: 10.1109/MSR.2009.5069500
N. Ayewah, W. Pugh
Recent research has tried to identify changes in source code repositories that fix bugs by linking these changes to reports in issue tracking systems. These changes have been traced back to the point in time when they were previously modified as a way of identifying bug-introducing changes. But we observe that not all changes linked to bug tracking systems are fixing bugs; some are enhancing the code. Furthermore, not all fixes are applied at the point in the code where the bug was originally introduced. We flesh out these observations with a manual review of several software projects, and use this opportunity to see how many defects are in the scope of static analysis tools.
Citations: 7
Mining the Jazz repository: Challenges and opportunities
Pub Date : 2009-05-16 DOI: 10.1109/MSR.2009.5069495
Kim Herzig, A. Zeller
By integrating various development and collaboration tools into one single platform, the Jazz environment offers several opportunities for software repository miners. In particular, Jazz offers full traceability from the initial requirements via work packages and work assignments to the final changes and tests; all these features can be easily accessed and leveraged for better prediction and recommendation systems. In this paper, we share our initial experiences from mining the Jazz repository. We also give a short overview of the retrieved data sets and discuss possible problems of the Jazz repository and the platform itself.
Citations: 18
Does calling structure information improve the accuracy of fault prediction?
Pub Date : 2009-05-16 DOI: 10.1109/MSR.2009.5069481
Yonghee Shin, Robert M. Bell, T. Ostrand, E. Weyuker
Previous studies have shown that software code attributes, such as lines of source code, and history information, such as the number of code changes and the number of faults in prior releases of software, are useful for predicting where faults will occur. In this study of an industrial software system, we investigate the effectiveness of adding information about calling structure to fault prediction models. The addition of calling structure information to a model based solely on non-calling structure code attributes provided noticeable improvement in prediction accuracy, but only marginally improved the best model based on history and non-calling structure code attributes. The best model based on history and non-calling structure code attributes outperformed the best model based on calling and non-calling structure code attributes.
Citations: 44
Using Latent Dirichlet Allocation for automatic categorization of software
Pub Date : 2009-05-16 DOI: 10.1109/MSR.2009.5069496
Kai Tian, Meghan Revelle, D. Poshyvanyk
In this paper, we propose a technique called LACT for automatically categorizing software systems in open-source repositories. LACT is based on Latent Dirichlet Allocation, an information retrieval method which is used to index and analyze source code documents as mixtures of probabilistic topics. For an initial evaluation, we performed two studies. In the first study, LACT was compared against an existing tool, MUDABlue, for classifying 41 software systems written in C into problem domain categories. The results indicate that LACT can automatically produce meaningful category names and yield classification results comparable to MUDABlue. In the second study, we applied LACT to 43 software systems written in different programming languages such as C/C++, Java, C#, PHP, and Perl. The results indicate that LACT can be used effectively for the automatic categorization of software systems regardless of the underlying programming language or paradigm. Moreover, both studies indicate that LACT can identify several new categories that are based on libraries, architectures, or programming languages, which is a promising improvement as compared to manual categorization and existing techniques.
Citations: 170
Using association rules to study the co-evolution of production & test code
Pub Date : 2009-05-16 DOI: 10.1109/MSR.2009.5069493
Z. Lubsen, A. Zaidman, M. Pinzger
Unit tests are generally acknowledged as an important aid to produce high quality code, as they provide quick feedback to developers on the correctness of their code. In order to achieve high quality, well-maintained tests are needed. Ideally, tests co-evolve with the production code to test changes as soon as possible. In this paper, we explore an approach based on association rule mining to determine whether production and test code co-evolve synchronously. Through two case studies, one with an open source and another one with an industrial software system, we show that our association rule mining approach allows one to assess the co-evolution of product and test code in a software project and, moreover, to uncover the distribution of programmer effort over pure coding, pure testing, or a more test-driven-like practice.
Citations: 37
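The two standard measures behind an association rule, support and confidence, can be computed as in the sketch below. It assumes each commit has been reduced to the set of file kinds it touched; the commit data and the function name are hypothetical, and the paper's actual mining setup may differ.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent -> consequent."""
    total = len(transactions)
    with_antecedent = sum(1 for t in transactions if antecedent in t)
    with_both = sum(
        1 for t in transactions if antecedent in t and consequent in t
    )
    # Support: share of all transactions containing both items.
    support = with_both / total if total else 0.0
    # Confidence: share of antecedent transactions that also hold the consequent.
    confidence = with_both / with_antecedent if with_antecedent else 0.0
    return support, confidence


# Hypothetical commits, each reduced to the kinds of files it touched.
COMMITS = [
    {"production", "test"},
    {"production"},
    {"production", "test"},
    {"test"},
]

if __name__ == "__main__":
    support, confidence = rule_metrics(COMMITS, "production", "test")
    print(f"support={support:.2f} confidence={confidence:.2f}")
```

A high confidence for the rule production -> test would suggest that test code tends to change in the same commits as production code, while a low value would point to testing being done separately.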
On mining data across software repositories
Pub Date : 2009-05-16 DOI: 10.1109/MSR.2009.5069498
P. Anbalagan, M. Vouk
Software repositories provide an abundance of valuable information about open source projects. With the increase in the size of the data maintained by the repositories, automated extraction of such data from individual repositories, as well as of linked information across repositories, has become a necessity. In this paper we describe a framework that uses web scraping to automatically mine repositories and link information across repositories. We discuss two implementations of the framework. In the first implementation, we automatically identify and collect security problem reports from project repositories that deploy the Bugzilla bug tracker using related vulnerability information from the National Vulnerability Database. In the second, we collect security problem reports for projects that deploy the Launchpad bug tracker along with related vulnerability information from the National Vulnerability Database. We have evaluated our tool on various releases of Fedora, Ubuntu, Suse, RedHat, and Firefox projects. The percentage of security bugs identified using our tool is consistent with that reported by other researchers.
Citations: 18
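One plausible way to link bug reports to vulnerability records is to match CVE identifiers mentioned in the report text. The sketch below assumes that linking strategy, which the abstract does not confirm; it performs no web scraping, and the bug ids, CVE ids and descriptions are invented sample values.

```python
import re

# CVE identifiers have the form CVE-<year>-<sequence of four or more digits>.
CVE_PATTERN = re.compile(r"CVE-\d{4}-\d{4,}")


def link_reports(bug_reports, vulnerabilities):
    """Map each bug id to the known CVE ids mentioned in its text."""
    links = {}
    for bug_id, text in bug_reports.items():
        mentioned = sorted(set(CVE_PATTERN.findall(text)))
        known = [cve for cve in mentioned if cve in vulnerabilities]
        if known:
            links[bug_id] = known
    return links


# Hypothetical sample data standing in for scraped tracker and database pages.
BUG_REPORTS = {
    101: "Buffer overflow, see CVE-2008-1234 and CVE-2008-9999",
    102: "Typo in dialog title",
}
VULNERABILITIES = {"CVE-2008-1234": "hypothetical vulnerability description"}

if __name__ == "__main__":
    print(link_reports(BUG_REPORTS, VULNERABILITIES))
```

Reports that mention no known identifier are left out of the result, so the share of linked reports can be read off directly as a rough estimate of the security-related fraction.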