Fourth International Workshop on Mining Software Repositories (MSR'07:ICSE Workshops 2007): Latest Articles


Mining Workspace Updates in CVS

Pub Date : 2007-05-20 DOI: 10.1109/MSR.2007.22

Thomas Zimmermann

The version control archive CVS records not only all changes in a project but also activity data such as when developers create or update their workspaces. Furthermore, CVS records when it has to integrate changes because of parallel development. In this paper, we analyze the CVS activity data of four large open-source projects (GCC, JBOSS, JEDIT, and PYTHON) to investigate parallel development: What is the degree of parallel development? How frequently do conflicts occur during updates, and how are they resolved? How do we identify changes that contain integrations?


Citations: 56

Visual Data Mining in Software Archives to Detect How Developers Work Together

Pub Date : 2007-05-20 DOI: 10.1109/MSR.2007.34

P. Weißgerber, M. Pohl, Michael Burch

Analyzing the check-in information of open source software projects that use a version control system such as CVS or SUBVERSION can yield interesting and important insights into the programming behavior of developers. Because tasks in every major project are assigned to many developers, the development must be coordinated among these programmers. This paper describes three visualization techniques that help to examine how programmers work together, e.g., whether they work as a team or develop their parts of the software separately from each other. Furthermore, phases of stagnation in the lifetime of a project can be uncovered, and thus possible problems are revealed. To demonstrate the usefulness of these visualization techniques, we performed case studies on two open source projects. In these studies, interesting patterns of developers' behavior, e.g., specialization in a certain module, can be observed. Moreover, modules that have been changed by many developers can be identified, as well as those that have been altered by only one programmer.


Citations: 36

Mining CVS Repositories to Understand Open-Source Project Developer Roles

Pub Date : 2007-05-20 DOI: 10.1109/MSR.2007.19

Liguo Yu, S. Ramaswamy

This paper presents a model to represent the interactions of distributed open-source software developers and utilizes data mining techniques to derive developer roles. The model is then applied to case studies of two open-source projects, ORAC-DR and Mediawiki, with encouraging results.


Citations: 58

Defect Data Analysis Based on Extended Association Rule Mining

Pub Date : 2007-05-20 DOI: 10.1109/MSR.2007.5

Shuji Morisaki, Akito Monden, Tomoko Matsumura, Haruaki Tamada, Ken-ichi Matsumoto

This paper describes an empirical study to reveal rules associated with defect correction effort. We defined defect correction effort as a quantitative (ratio scale) variable, and extended conventional (nominal scale based) association rule mining to directly handle such quantitative variables. An extended rule describes the statistical characteristic of a ratio or interval scale variable in the consequent part of the rule by its mean value and standard deviation, so that conditions producing distinctive statistics can be discovered. As an analysis target, we collected various attributes of about 1,200 defects found in a typical medium-scale, multi-vendor (distance development) information system development project in Japan. Our findings based on extracted rules include: (1) Defects detected in coding/unit testing were easily corrected (less than 7% of mean effort) when they were related to data output or validation of input data. (2) Nevertheless, they sometimes required much more effort (lift of standard deviation was 5.845) in case of low reproducibility. (3) Defects introduced in coding/unit testing often required large correction effort (mean was 12.596 staff-hours and standard deviation was 25.716) when they were related to data handling. From these findings, we confirmed that we need to pay attention to types of defects having large mean effort as well as those having large standard deviation of effort, since such defects sometimes cause excess effort.
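
The idea of a rule whose consequent is summarized by a mean and a standard deviation can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `extended_rule_stats`, the record fields, the sample data, and the definition of lift as the ratio of the rule's statistic to the overall statistic are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def extended_rule_stats(defects, antecedent, target="effort"):
    """Summarize a quantitative target over the defects matching an antecedent.

    `antecedent` is a dict of nominal attribute values (hypothetical field names).
    Lift is assumed here to be rule statistic / overall statistic.
    """
    all_values = [d[target] for d in defects]
    matched = [
        d[target]
        for d in defects
        if all(d.get(key) == value for key, value in antecedent.items())
    ]
    if not matched:
        return None  # the antecedent matches no defect, so no rule exists
    overall_mean, overall_std = mean(all_values), pstdev(all_values)
    rule_mean, rule_std = mean(matched), pstdev(matched)
    return {
        "support": len(matched) / len(defects),
        "mean": rule_mean,
        "std": rule_std,
        "mean_lift": rule_mean / overall_mean if overall_mean else None,
        "std_lift": rule_std / overall_std if overall_std else None,
    }


# Made-up records; effort is in staff-hours.
defects = [
    {"phase": "coding", "type": "output", "effort": 1.0},
    {"phase": "coding", "type": "output", "effort": 3.0},
    {"phase": "design", "type": "handling", "effort": 10.0},
    {"phase": "design", "type": "handling", "effort": 26.0},
]
stats = extended_rule_stats(defects, {"phase": "coding", "type": "output"})
print(stats)
```

A rule with a mean lift well below 1 would correspond to defects that are cheap to correct on average, while a large standard deviation lift would flag conditions under which effort is unpredictable.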


Citations: 30

What Can OSS Mailing Lists Tell Us? A Preliminary Psychometric Text Analysis of the Apache Developer Mailing List

Pub Date : 2007-05-20 DOI: 10.1109/MSR.2007.35

Peter C. Rigby, A. Hassan

Developer mailing lists are a rich source of information about Open Source Software (OSS) development. The unstructured nature of email makes extracting information difficult. We use a psychometrically-based linguistic analysis tool, the LIWC, to examine the Apache httpd server developer mailing list. We conduct three preliminary experiments to assess the appropriateness of this tool for information extraction from mailing lists. First, using LIWC dimensions that are correlated with the big five personality traits, we assess the personality of four top developers against a baseline for the entire mailing list. The two developers that were responsible for the major Apache releases had similar personalities. Their personalities were different from the baseline and the other developers. Second, the first and last 50 emails for two top developers who have left the project are examined. The analysis shows promise in understanding why developers join and leave a project. Third, we examine word usage on the mailing list for two major Apache releases. The differences may reflect the relative success of each release.


Citations: 122

Local and Global Recency Weighting Approach to Bug Prediction

Pub Date : 2007-05-20 DOI: 10.1109/MSR.2007.17

Hemant Joshi, Chuanlei Zhang, S. Ramaswamy, Coskun Bayrak

Finding and fixing software bugs is a challenging maintenance task, and a significant amount of effort is invested by software development companies on this issue. In this paper, we use the Eclipse project's recorded software bug history to predict occurrence of future bugs. The history contains information on when bugs have been reported and subsequently fixed.


Citations: 15

Prioritizing Warning Categories by Analyzing Software History

Pub Date : 2007-05-20 DOI: 10.1109/MSR.2007.26

Sunghun Kim, Michael D. Ernst

Automatic bug finding tools tend to have high false positive rates: most warnings do not indicate real bugs. Usually bug finding tools prioritize each warning category. For example, the priority of "overflow" is 1 and the priority of "jumbled incremental" is 3, but the tools' prioritization is not very effective. In this paper, we prioritize warning categories by analyzing the software change history. The underlying intuition is that if warnings from a category are resolved quickly by developers, the warnings in the category are important. Experiments with three bug finding tools (FindBugs, JLint, and PMD) and two open source projects (Columba and jEdit) indicate that different warning categories have very different lifetimes. Based on that observation, we propose a preliminary algorithm for warning category prioritizing.
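
The intuition that quickly resolved warnings mark important categories can be sketched as a ranking by mean warning lifetime. The abstract does not describe the paper's preliminary algorithm, so the following only illustrates the intuition; the function name, the tuple layout, the category names, and the revision numbers are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def rank_categories_by_lifetime(warnings):
    """Return warning categories ordered by mean lifetime, shortest first.

    Each warning is a (category, introduced_revision, removed_revision) tuple.
    A shorter lifetime is read as "developers fixed it quickly".
    """
    lifetimes = defaultdict(list)
    for category, introduced, removed in warnings:
        lifetimes[category].append(removed - introduced)
    return sorted(lifetimes, key=lambda category: mean(lifetimes[category]))


# Made-up history: revision at which each warning appeared and disappeared.
warnings = [
    ("null dereference", 10, 12),
    ("null dereference", 20, 24),
    ("unused variable", 5, 105),
    ("empty catch block", 1, 41),
]
ranking = rank_categories_by_lifetime(warnings)
print(ranking)
```

Under these assumptions, the category at the front of the list would receive the highest priority.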


Citations: 87

Analysis of the Linux Kernel Evolution Using Code Clone Coverage

Pub Date : 2007-05-20 DOI: 10.1109/MSR.2007.1

Simone Livieri, Yoshiki Higo, M. Matsushita, Katsuro Inoue

Most studies of the evolution of software systems are based on the comparison of simple software metrics. In this paper, we present our preliminary investigation of the evolution of the Linux kernel using code-clone analysis and the code-clone coverage metrics. We examined 136 versions of the stable Linux kernel using a distributed extension of the code clone detection tool CCFinder. The result is shown as a heat map.
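
A pairwise coverage matrix of the kind a heat map would display can be sketched as follows. This is a rough illustration under strong assumptions: exact line matching stands in for a real clone detector such as CCFinder, and coverage between two versions is assumed to be the share of lines in both versions that also occur in the other one. The abstract does not give the paper's exact metric definition, and all names and data here are hypothetical.

```python
def clone_coverage(a_lines, b_lines):
    """Assumed metric: fraction of lines in A and B that also occur in the other."""
    a_set, b_set = set(a_lines), set(b_lines)
    total = len(a_lines) + len(b_lines)
    if total == 0:
        return 0.0
    shared = sum(1 for line in a_lines if line in b_set)
    shared += sum(1 for line in b_lines if line in a_set)
    return shared / total


def coverage_matrix(versions):
    """Build the version-by-version matrix that a heat map would render."""
    names = list(versions)
    return [
        [clone_coverage(versions[row], versions[col]) for col in names]
        for row in names
    ]


# Toy "versions", each represented as a list of source lines.
versions = {
    "v1": ["a", "b", "c", "d"],
    "v2": ["a", "b", "c", "x"],
    "v3": ["p", "q", "r", "s"],
}
matrix = coverage_matrix(versions)
for name, row in zip(versions, matrix):
    print(name, row)
```

In such a matrix, values near 1 between neighboring versions would indicate small changes, while a sudden drop would point to a larger rewrite.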


Citations: 42

How Long Will It Take to Fix This Bug?

Pub Date : 2007-05-20 DOI: 10.1109/MSR.2007.13

Cathrin Weiss, Rahul Premraj, Thomas Zimmermann, A. Zeller

Predicting the time and effort for a software problem has long been a difficult task. We present an approach that automatically predicts the fixing effort, i.e., the person-hours spent on fixing an issue. Our technique leverages existing issue tracking systems: given a new issue report, we use the Lucene framework to search for similar, earlier reports and use their average time as a prediction. Our approach thus allows for early effort estimation, helping in assigning issues and scheduling stable releases. We evaluated our approach using effort data from the JBoss project. Given a sufficient number of issue reports, our automatic predictions are close to the actual effort; for issues that are bugs, we are off by only one hour, beating naive predictions by a factor of four.
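
The "find similar earlier reports and average their effort" step can be sketched as a nearest-neighbor prediction. The paper uses Lucene for the similarity search; the sketch below swaps in a plain Jaccard similarity over word sets so that it stays self-contained. The function names, the parameters `k` and `min_similarity`, and the sample reports are assumptions for illustration only.

```python
import re


def tokens(text):
    """Lowercased set of alphanumeric words in a report."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two token sets (stand-in for Lucene scoring)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def predict_effort(new_report, history, k=3, min_similarity=0.0):
    """Average the effort of the k most similar earlier reports.

    `history` is a list of (report_text, effort_hours) pairs.
    Returns None when no earlier report is more similar than min_similarity.
    """
    new_tokens = tokens(new_report)
    scored = [(jaccard(new_tokens, tokens(text)), effort) for text, effort in history]
    candidates = [pair for pair in scored if pair[0] > min_similarity]
    neighbors = sorted(candidates, key=lambda pair: pair[0], reverse=True)[:k]
    if not neighbors:
        return None
    return sum(effort for _, effort in neighbors) / len(neighbors)


# Made-up issue reports with recorded effort in person-hours.
history = [
    ("NullPointerException when deploying EAR archive", 2.0),
    ("NullPointerException on EAR deploy with clustering", 4.0),
    ("Update documentation for JMX console", 1.0),
    ("Transaction timeout ignored in session bean", 8.0),
]
prediction = predict_effort("NullPointerException while deploying EAR", history, k=2)
print(prediction)
```

Returning `None` when nothing is similar enough mirrors the abstract's caveat that predictions depend on having a sufficient number of earlier reports.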


Citations: 392

Predicting Eclipse Bug Lifetimes

Pub Date : 2007-05-20 DOI: 10.1109/MSR.2007.25

Lucas D. Panjer

In non-trivial software development projects, planning and allocation of resources is an important and difficult task. Estimation of the work time needed to fix a bug is commonly used to support this process. This research explores the viability of using data mining tools to predict the time to fix a bug given only the basic information known at the beginning of a bug's lifetime. To address this question, a historical portion of the Eclipse Bugzilla database is used for modeling and predicting bug lifetimes. A bug history transformation process is described, and several data mining models are built and tested. Interesting behaviours derived from the models are documented. The models can correctly classify up to 34.9% of the bugs into discretized, log-scaled lifetime classes.
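
Discretizing lifetimes on a log scale, and scoring a model by the share of bugs placed in the correct class, can be sketched as below. The abstract does not state the bin boundaries used in the paper, so the base-2 bins, the `+ 1` offset that keeps zero-day lifetimes defined, and the function names are assumptions.

```python
import math


def lifetime_class(days):
    """Map a bug lifetime in days to an integer log-scaled class.

    Assumed binning: class n covers lifetimes with 2**n <= days + 1 < 2**(n + 1).
    """
    if days < 0:
        raise ValueError("lifetime must be non-negative")
    return math.floor(math.log2(days + 1))


def class_accuracy(actual_days, predicted_classes):
    """Fraction of bugs whose predicted class equals the class of the true lifetime."""
    if not actual_days:
        return 0.0
    hits = sum(
        1
        for days, predicted in zip(actual_days, predicted_classes)
        if lifetime_class(days) == predicted
    )
    return hits / len(actual_days)


classes = [lifetime_class(d) for d in (0, 1, 7, 100)]
accuracy = class_accuracy([0, 1, 7, 100], [0, 1, 2, 6])
print(classes, accuracy)
```

Log-scaled classes keep the many short-lived bugs from being lumped together while still grouping the long tail of slow fixes into a few bins.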


Citations: 172
