2016 IEEE/ACM 13th Working Conference on Mining Software Repositories (MSR): Latest Publications

Oops! where did that code snippet come from?

2016 IEEE/ACM 13th Working Conference on Mining Software Repositories (MSR)

Pub Date : 2014-05-31 DOI: 10.1145/2597073.2597094

Lisong Guo, J. Lawall, Gilles Muller

A kernel oops is an error report that logs the status of the Linux kernel at the time of a crash. Such a report can provide valuable first-hand information for a Linux kernel maintainer to conduct postmortem debugging. Recently, a repository has been created that systematically collects kernel oopses from Linux users. However, debugging based on only the information in a kernel oops is difficult. We consider the initial problem of finding the offending line, i.e., the line of source code that incurs the crash. For this, we propose a novel algorithm based on approximate sequence matching, as used in bioinformatics, to automatically pinpoint the offending line based on information about nearby machine-code instructions, as found in a kernel oops. Our algorithm achieves 92% accuracy compared to 26% for the traditional approach of using only the oops instruction pointer.
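The abstract does not spell out the authors' algorithm, so the following is only a minimal sketch of the general idea of approximate sequence matching. It uses Sellers' edit-distance variant of substring search, which is an assumption chosen for illustration and not necessarily what the paper uses. The function name `best_approximate_match` and the byte strings in the usage note are hypothetical.

```python
def best_approximate_match(pattern, text):
    """Find the substring of `text` closest to `pattern` in edit distance.

    Returns (distance, end), where `end` is the exclusive end index in `text`
    of the best-matching substring. This is Sellers' algorithm, shown as a
    generic illustration rather than the method from the paper.
    """
    # Row for the empty pattern: a match may start anywhere in text at no cost.
    prev = [0] * (len(text) + 1)
    for i, p in enumerate(pattern, start=1):
        # Column 0: the first i pattern elements matched against nothing.
        curr = [i] + [0] * len(text)
        for j, t in enumerate(text, start=1):
            curr[j] = min(
                prev[j - 1] + (p != t),  # match or substitution
                prev[j] + 1,             # pattern element missing from text
                curr[j - 1] + 1,         # extra element in text
            )
        prev = curr
    # The smallest value in the last row marks the best match end position.
    end = min(range(len(prev)), key=prev.__getitem__)
    return prev[end], end
```

As a usage example under the same assumptions, `best_approximate_match(bytes.fromhex("488b4708"), bytes.fromhex("554889e5488b4708c3"))` locates the four query bytes inside a longer byte sequence; a query that differs in one byte still resolves to the same position with distance 1, which is the property that makes approximate matching tolerant of small differences between an oops dump and a locally compiled binary.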

Citations: 6

OpenHub: a scalable architecture for the analysis of software quality attributes

2016 IEEE/ACM 13th Working Conference on Mining Software Repositories (MSR)

Pub Date : 2014-05-31 DOI: 10.1145/2597073.2597135

G. Farah, Juan Sebastian Tejada, D. Correal

There is currently a vast array of open source projects available on the web, and although they are searchable by name or description in the search engines, there is no way to search for projects by how well they perform on a given set of quality attributes (e.g. usability or maintainability). With OpenHub, we present a scalable and extensible architecture for the static and runtime analysis of open source repositories written in Python, presenting the architecture and pinpointing future possibilities with it.

Citations: 17

Is mining software repositories data science? (keynote)

2016 IEEE/ACM 13th Working Conference on Mining Software Repositories (MSR)

Pub Date : 2014-05-31 DOI: 10.1145/2597073.2600728

A. Mockus

Trick question: what is Data Science? The collection and use of low-veracity data in software repositories and other operational support systems is exploding. It is, therefore, imperative to elucidate basic principles of how such data comes into being and what it means. Are there practices of constructing software data analysis tools that could raise the integrity of their results despite the problematic nature of the underlying data? The talk explores the basic nature of data in operational support systems and considers approaches to develop engineering practices for software mining tools.

Citations: 5

Do developers discuss design?

2016 IEEE/ACM 13th Working Conference on Mining Software Repositories (MSR)

Pub Date : 2014-05-31 DOI: 10.1145/2597073.2597115

João Brunet, G. Murphy, Ricardo Terra, J. Figueiredo, D. Guerrero

Design is often raised in the literature as important to attaining various properties and characteristics in a software system. At least for open-source projects, it can be hard to find evidence of ongoing design work in the technical artifacts produced as part of the development. Although developers usually do not produce specific design documents, they do communicate about design in different ways. In this paper, we provide quantitative evidence that developers address design through discussions in commits, issues, and pull requests. To achieve this, we built a discussion classifier and automatically labeled 102,122 discussions from 77 projects. Based on this data, we make four observations about the projects: i) on average, 25% of the discussions in a project are about design; ii) on average, 26% of developers contribute to at least one design discussion; iii) only 1% of the developers contribute to more than 15% of the discussions in a project; and iv) these few developers who contribute to a broad range of design discussions are also the top committers in a project.

Citations: 49

Works for me! characterizing non-reproducible bug reports

2016 IEEE/ACM 13th Working Conference on Mining Software Repositories (MSR)

Pub Date : 2014-05-31 DOI: 10.1145/2597073.2597098

Mona Erfani Joorabchi, Mehdi MirzaAghaei, A. Mesbah

Bug repository systems have become an integral component of software development activities. Ideally, each bug report should help developers to find and fix a software fault. However, there is a subset of reported bugs that is not (easily) reproducible, on which developers spend considerable amounts of time and effort. We present an empirical analysis of non-reproducible bug reports to characterize their rate, nature, and root causes. We mine one industrial and five open-source bug repositories, resulting in 32K non-reproducible bug reports. We (1) compare properties of non-reproducible reports with their counterparts such as active time and number of authors, (2) investigate their life-cycle patterns, and (3) examine 120 Fixed non-reproducible reports. In addition, we qualitatively classify a set of randomly selected non-reproducible bug reports (1,643) into six common categories. Our results show that, on average, non-reproducible bug reports pertain to 17% of all bug reports, remain active three months longer than their counterparts, can be mainly (45%) classified as "Interbug Dependencies", and 66% of Fixed non-reproducible reports were indeed reproduced and fixed.

Pages: 62-71

Citations: 79

MUX: algorithm selection for software model checkers

2016 IEEE/ACM 13th Working Conference on Mining Software Repositories (MSR)

Pub Date : 2014-05-31 DOI: 10.1145/2597073.2597080

Varun Tulsian, Aditya Kanade, Rahul Kumar, A. Lal, A. Nori

With the growing complexity of modern day software, software model checking has become a critical technology for ensuring correctness of software. As is true with any promising technology, there are a number of tools for software model checking. However, their respective performance trade-offs are difficult to characterize accurately – making it difficult for practitioners to select a suitable tool for the task at hand. This paper proposes a technique called MUX that addresses the problem of selecting the most suitable software model checker for a given input instance. MUX performs machine learning on a repository of software verification instances. The algorithm selector, synthesized through machine learning, uses structural features from an input instance, comprising a program-property pair, at runtime and determines which tool to use. We have implemented MUX for Windows device drivers and evaluated it on a number of drivers and model checkers. Our results are promising in that the algorithm selector not only avoids a significant number of timeouts but also improves the total runtime by a large margin, compared to any individual model checker. It also outperforms a portfolio-based algorithm selector being used in Microsoft at present. Besides, MUX identifies structural features of programs that are key factors in determining performance of model checkers.
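The abstract does not say which learning method or which structural features MUX uses, so the following is only a rough sketch of what a learned algorithm selector can look like. It assumes a 1-nearest-neighbour rule over numeric feature vectors, and the function names, tool names, and feature values are all hypothetical.

```python
import math


def train_selector(instances):
    """Build a selector model from a repository of past verification runs.

    `instances` is a list of (feature_vector, runtimes_by_tool) pairs. For each
    instance we keep its features and the name of the tool that was fastest.
    """
    return [
        (features, min(runtimes, key=runtimes.get))
        for features, runtimes in instances
    ]


def select_tool(model, features):
    """Pick the tool that was fastest on the most similar past instance."""
    nearest = min(model, key=lambda entry: math.dist(entry[0], features))
    return nearest[1]
```

A usage example under these assumptions: given `model = train_selector([((10.0, 2.0), {"checker_a": 5.0, "checker_b": 60.0}), ((200.0, 40.0), {"checker_a": 900.0, "checker_b": 30.0})])`, calling `select_tool(model, (12.0, 3.0))` chooses the tool that was fastest on the closest recorded instance. A real selector would likely normalize features and use a stronger learner; nearest-neighbour is used here only because it keeps the idea visible in a few lines.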

Pages: 132-141

Citations: 25

Collaboration in open-source projects: myth or reality?

2016 IEEE/ACM 13th Working Conference on Mining Software Repositories (MSR)

Pub Date : 2014-05-31 DOI: 10.1145/2597073.2597093

Y. Tymchuk, Andrea Mocci, Michele Lanza

One of the fundamental principles of open-source projects is that they foster collaboration among developers, regardless of their geographical location or personal background. When it comes to software repositories, collaboration is a rather ephemeral phenomenon that lacks a clear definition, and it must therefore be mined and modeled. This raises the question of whether what is mined actually maps to reality. In this paper we investigate collaboration by modeling it using a number of diverse approaches that we then compare to a ground truth obtained by surveying a substantial set of developers of the Pharo open-source community. Our findings indicate that the notion of collaboration must be revisited, as it is undermined by a number of factors that are often tackled in imprecise ways or not taken into account at all.

Citations: 18

A dataset for pull-based development research

2016 IEEE/ACM 13th Working Conference on Mining Software Repositories (MSR)

Pub Date : 2014-05-31 DOI: 10.1145/2597073.2597122

Georgios Gousios, A. Zaidman

Pull requests form a new method for collaborating in distributed software development. To study the pull request distributed development model, we constructed a dataset of almost 900 projects and 350,000 pull requests, including some of the largest users of pull requests on GitHub. In this paper, we describe how the projects were selected, analyze the selected features, and present a machine learning tool set for the R statistics environment.

Citations: 64

Analysing the 'biodiversity' of open source ecosystems: the GitHub case

2016 IEEE/ACM 13th Working Conference on Mining Software Repositories (MSR)

Pub Date : 2014-05-31 DOI: 10.1145/2597073.2597119

N. Matragkas, James R. Williams, D. Kolovos, R. Paige

In nature the diversity of species and genes in ecological communities affects the functioning of these communities. Biologists have found out that more diverse communities appear to be more productive than less diverse communities. Moreover such communities appear to be more stable in the face of perturbations. In this paper, we draw the analogy between ecological communities and Open Source Software (OSS) ecosystems, and we investigate the diversity and structure of OSS communities. To address this question we use the MSR 2014 challenge dataset, which includes data from the top-10 software projects for the top programming languages on GitHub. Our findings show that OSS communities on GitHub consist of 3 types of users (core developers, active users, passive users). Moreover, we show that the percentage of core developers and active users does not change as the project grows and that the majority of members of large projects are passive users.

Citations: 19

Estimating development effort in Free/Open source software projects by mining software repositories: a case study of OpenStack

2016 IEEE/ACM 13th Working Conference on Mining Software Repositories (MSR)

Pub Date : 2014-05-31 DOI: 10.1145/2597073.2597107

G. Robles, Jesus M. Gonzalez-Barahona, C. Cervigón, A. Capiluppi, Daniel Izquierdo-Cortazar

Because of the distributed and collaborative nature of free/open source software (FOSS) projects, the development effort invested in a project is usually unknown, even after the software has been released. However, this information is becoming of major interest, especially (but not only) because of the growth in the number of companies for which FOSS has become relevant to their business strategy. In this paper we present a novel approach to estimate effort by considering data from source code management repositories. We apply our model to the OpenStack project, a FOSS project with more than 1,000 authors, in which several tens of companies cooperate. Based on data from its repositories, together with input from a survey answered by more than 100 developers, we show that the model offers a simple but sound way of obtaining software development estimations with bounded margins of error.

Citations: 60
