2009 6th IEEE International Working Conference on Mining Software Repositories: Latest Publications

On the use of Internet Relay Chat (IRC) meetings by developers of the GNOME GTK+ project

2009 6th IEEE International Working Conference on Mining Software Repositories

Pub Date : 2009-05-16 DOI: 10.1109/MSR.2009.5069488

Emad Shihab, Z. Jiang, A. Hassan

Developers of open source projects are distributed across the world. They rely on email, mailing lists, instant messaging, IRC channels and, more recently, IRC meetings to communicate. Most studies thus far focus on the use of mailing lists by OSS developers; however, an increasing number of open source projects are using IRC meetings to hold developer meetings. In this paper, we mine the #gtk-devel IRC meeting channel and study the usage of the IRC meetings held by the GNOME GTK+ core developers and maintainers. We look at three different dimensions: the discussion volume of the meetings, the number of participants attending the meetings, and the activity of these participants. Our findings show that IRC meetings are gaining popularity among open source developers and maintainers: the IRC meeting discussions are increasing in volume, have increasing attendance levels, and the participants actively contribute to the meetings. To the best of our knowledge, this is the first study on the use of developer IRC meetings by OSS developers.

Citations: 49
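
As a rough illustration of the three dimensions named in the abstract above (discussion volume, number of participants, participant activity), the sketch below computes them from a plain-text IRC log. The log line format (`[HH:MM] <nick> message`), the function name `meeting_stats`, and the sample data are assumptions made for this example; they are not taken from the paper.

```python
import re
from collections import Counter

# Assumed log format: an optional "[HH:MM]" or "HH:MM:SS" timestamp,
# then "<nick> message". Joins, parts and "* nick action" lines are skipped.
LINE_RE = re.compile(r"^(?:\[?\d{1,2}:\d{2}(?::\d{2})?\]?\s+)?<([^>]+)>\s*(.*)$")


def meeting_stats(log_lines):
    """Return message volume, participant count and per-participant activity."""
    per_nick = Counter()
    for line in log_lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # not a chat message
        # Drop channel-mode prefixes such as "@" (operator) or "+" (voice).
        nick = match.group(1).strip().lstrip("@+")
        per_nick[nick] += 1
    return {
        "messages": sum(per_nick.values()),   # discussion volume
        "participants": len(per_nick),        # attendance
        "per_participant": dict(per_nick),    # activity of each participant
    }


if __name__ == "__main__":
    sample = [
        "[20:00] <alice> agenda item 1",
        "[20:01] <bob> ok",
        "[20:02] * carol waves",
        "[20:03] <alice> next",
    ]
    print(meeting_stats(sample))
```

Counting messages per nick is only one possible proxy for "activity"; message length or time spent in the meeting would be equally plausible choices.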

Evaluating the relation between coding standard violations and faults within and across software versions

2009 6th IEEE International Working Conference on Mining Software Repositories

Pub Date : 2009-05-16 DOI: 10.1109/MSR.2009.5069479

C. Boogerd, L. Moonen

In spite of the widespread use of coding standards and tools enforcing their rules, there is little empirical evidence supporting the intuition that they prevent the introduction of faults in software. In previous work, we performed a pilot study to assess the relation between rule violations and actual faults, using the MISRA C 2004 standard on an industrial case. In this paper, we investigate three different aspects of the relation between violations and faults on a larger case study, and compare the results across the two projects. We find that 10 rules in the standard are significant predictors of fault location.

Citations: 50

Amassing and indexing a large sample of version control systems: Towards the census of public source code history

2009 6th IEEE International Working Conference on Mining Software Repositories

Pub Date : 2009-05-16 DOI: 10.1109/MSR.2009.5069476

A. Mockus

The source code and its history represent the output and process of software development activities and are an invaluable resource for the study and improvement of software development practice. While individual projects and groups of projects have been extensively analyzed, some fundamental questions, such as the spread of innovation or the genealogy of the source code, can be answered only by considering the entire universe of publicly available source code and its history. We describe methods we developed over the last six years to gather, index, and update an approximation of such a universal repository for publicly accessible version control systems and for the source code inside a large corporation. While challenging, the task is achievable with limited resources. The bottlenecks in network bandwidth, processing, and disk access can be dealt with using the inherent parallelism of the tasks and suitable tradeoffs between the amount of storage and computation, but a completely automated discovery of public version control systems may require enticing the participation of the sampled projects. Such a universal repository would allow studies of global properties and origins of the source code that are not possible through other means.

Citations: 111

On the transfer of evolutionary couplings to industry

2009 6th IEEE International Working Conference on Mining Software Repositories

Pub Date : 2009-05-16 DOI: 10.1109/MSR.2009.5069502

P. V. D. Laar

In this paper, we describe a case study at Philips Healthcare MRI focusing on evolutionary couplings, i.e., a technique to infer relationships among modules by analyzing their history of changes in the source code archive. In this case study, we failed to transfer CouplingViewer, a tool implementing the current state of the art in evolutionary couplings, to industry. According to the industrial experts, an important industrial requirement was not met: the signal-to-noise ratio was too low.

Citations: 4
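
The idea of evolutionary coupling described in the abstract above (inferring relationships among modules from their change history) can be sketched as a co-change count. This is a generic illustration, not CouplingViewer's algorithm: the input format (one set of changed module names per commit), the function name `co_change_couplings`, the `min_support` threshold, and the choice of normalization for `confidence` are all assumptions made for this example.

```python
from collections import Counter
from itertools import combinations


def co_change_couplings(commits, min_support=2):
    """Return {(a, b): (support, confidence)} for modules changed together.

    commits     -- iterable of collections of module names, one per commit
                   (hypothetical input format)
    min_support -- minimum number of shared commits for a pair to be reported
    """
    changes = Counter()      # commits touching each module
    pair_counts = Counter()  # commits touching both modules of a pair
    for modules in commits:
        modules = set(modules)
        changes.update(modules)
        pair_counts.update(combinations(sorted(modules), 2))

    couplings = {}
    for (a, b), support in pair_counts.items():
        if support < min_support:
            continue  # too few co-changes to count as evidence
        # One possible normalization: the share of the less frequently
        # changed module's commits that also touched the other module.
        confidence = support / min(changes[a], changes[b])
        couplings[(a, b)] = (support, confidence)
    return couplings


if __name__ == "__main__":
    history = [{"ui", "core"}, {"ui", "core"}, {"core", "db"}, {"core"}]
    print(co_change_couplings(history))
```

Raising `min_support` trades recall for fewer spurious pairs, which is one simple way to think about the signal-to-noise concern the abstract reports.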

A platform for software engineering research

2009 6th IEEE International Working Conference on Mining Software Repositories

Pub Date : 2009-05-16 DOI: 10.1109/MSR.2009.5069478

Georgios Gousios, D. Spinellis

Research in the fields of software quality, maintainability and evolution requires the analysis of large quantities of data, which often originate from open source software projects. Collecting and preprocessing data, calculating metrics, and synthesizing composite results from a large corpus of project artifacts is a tedious and error-prone task lacking direct scientific value. The Alitheia Core tool is an extensible platform for software quality analysis that is designed specifically to facilitate software engineering research on large and diverse data sources, by integrating data collection and preprocessing phases with an array of analysis services, and presenting the researcher with an easy-to-use extension mechanism. Alitheia Core aims to be the basis of an ecosystem of shared tools and research data that will enable researchers to focus on their research questions at hand, rather than spend time on re-implementing analysis tools. In this paper, we present the Alitheia Core platform in detail and demonstrate its usefulness in mining software repositories by guiding the reader through the steps required to execute a simple experiment.

Citations: 30

Author entropy vs. file size in the gnome suite of applications

2009 6th IEEE International Working Conference on Mining Software Repositories

Pub Date : 2009-05-16 DOI: 10.1109/MSR.2009.5069484

Jason R. Casebolt, Jonathan L. Krein, Alexander C. MacLean, C. Knutson, Daniel P. Delorey

We present the results of a study in which author entropy was used to characterize author contributions per file. Our analysis reveals three patterns: banding in the data, uneven distribution of data across bands, and file size dependent distributions within bands. Our results suggest that when two authors contribute to a file, large files are more likely to have a dominant author than smaller files.

Citations: 19
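
The abstract above does not define author entropy. A minimal sketch, assuming it is the Shannon entropy of each author's share of a file's lines, is shown below; the function name `author_entropy` and the per-line author list used as input are hypothetical.

```python
import math
from collections import Counter


def author_entropy(line_authors):
    """Shannon entropy (in bits) of the per-author share of lines in a file.

    line_authors -- one author name per line of the file (assumed input,
                    e.g. derived from blame/annotate output)
    """
    counts = Counter(line_authors)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # an empty file carries no authorship information
    # H = sum over authors of p * log2(1 / p), with p = author's share of lines
    return sum((n / total) * math.log2(total / n) for n in counts.values())


if __name__ == "__main__":
    print(author_entropy(["alice"] * 4))              # single author
    print(author_entropy(["alice", "bob"]))           # two equal contributors
    print(author_entropy(["alice"] * 3 + ["bob"]))    # one dominant author
```

Under this definition a file with a single author has entropy 0, and a file with two authors has at most 1 bit, reached when both contribute equally; a dominant author pulls the value toward 0.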

Visualizing Gnome with the Small Project Observatory

2009 6th IEEE International Working Conference on Mining Software Repositories

Pub Date : 2009-05-01 DOI: 10.1109/MSR.2009.5069487

M. Lungu, Jacopo Malnati, Michele Lanza

We analyzed the Gnome family of systems with the Small Project Observatory, our online ecosystem visualization platform. We begin by briefly introducing the model of SPO. We then observe and discuss several phases in the activity of the Gnome ecosystem. We follow and look at how the contributors are distributed between writing source code and doing other activities such as internationalization. We end with a visual overview of the activity of more than 900 contributors in the 10 years of existence of Gnome.

Citations: 30

The promises and perils of mining git

2009 6th IEEE International Working Conference on Mining Software Repositories

Pub Date : 1900-01-01 DOI: 10.1109/msr.2009.5069475

C. Bird, Peter C. Rigby, Earl T. Barr, David J. Hamilton, D. Germán, Premkumar T. Devanbu

We are now witnessing the rapid growth of decentralized source code management (DSCM) systems, in which every developer has her own repository. DSCMs facilitate a style of collaboration in which work output can flow sideways (and privately) between collaborators, rather than always up and down (and publicly) via a central repository. Decentralization comes with both the promise of new data and the peril of its misinterpretation. We focus on git, a very popular DSCM used in high-profile projects. Decentralization, and other features of git, such as automatically recorded contributor attribution, lead to richer content histories, giving rise to new questions such as “How do contributions flow between developers to the official project repository?” However, there are pitfalls. Commits may be reordered, deleted, or edited as they move between repositories. The semantics of terms common to SCMs and DSCMs sometimes differ markedly, potentially creating confusion. For example, a commit is immediately visible to all developers in centralized SCMs, but not in DSCMs. Our goal is to help researchers interested in DSCMs avoid these and other perils when mining and analyzing git data.

Citations: 301
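
Git records an author and a committer separately for every commit, which is one concrete form of the contributor attribution mentioned in the abstract above. The sketch below lists commits whose author and committer names differ, given text produced by `git log --pretty=format:'%H%x09%an%x09%cn'` (tab-separated hash, author name, committer name). The function name `attribution_mismatches` and the sample data are hypothetical, and this is only an illustration, not the analysis performed in the paper.

```python
def attribution_mismatches(log_text):
    """Return (sha, author, committer) for commits where the two names differ.

    log_text -- output of:
        git log --pretty=format:'%H%x09%an%x09%cn'
    i.e. one commit per line: hash, author name, committer name, tab-separated.
    """
    mismatches = []
    for line in log_text.splitlines():
        parts = line.split("\t")
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        sha, author, committer = parts
        if author != committer:
            # Someone other than the author created this commit object,
            # e.g. by applying a patch or cherry-picking.
            mismatches.append((sha, author, committer))
    return mismatches


if __name__ == "__main__":
    sample = "a1\tAlice\tAlice\nb2\tBob\tAlice\n"
    print(attribution_mismatches(sample))
```

Comparing names alone is fragile, since the same person may appear under several spellings; comparing e-mail addresses (`%ae`, `%ce`) is a common alternative.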
