2015 IEEE 22nd International Conference on Software Analysis, Evolution, and Reengineering (SANER): Latest Publications

SPCP-Miner: A tool for mining code clones that are important for refactoring or tracking

2015 IEEE 22nd International Conference on Software Analysis, Evolution, and Reengineering (SANER)

Pub Date : 2015-03-02 DOI: 10.1109/SANER.2015.7081861

Manishankar Mondal, C. Roy, Kevin A. Schneider

Code cloning has both positive and negative impacts on software maintenance and evolution. Focusing on the issues related to code cloning, researchers suggest managing code clones through refactoring and tracking. However, it is impractical to refactor or track all clones in a software system. Thus, it is essential to identify which clones are important for refactoring and which are important for tracking. In this paper, we present a tool called SPCP-Miner, which is the first to automatically identify and rank the important refactoring candidates as well as the important tracking candidates from the whole set of clones in a software system. SPCP-Miner implements the existing techniques that we used to conduct a large-scale empirical study on SPCP clones (i.e., clones that evolved following a Similarity Preserving Change Pattern, SPCP). We believe that SPCP-Miner can help manage code clones better by suggesting important clones for refactoring or tracking.

Citations: 30

SKilLed communication for toolchains

2015 IEEE 22nd International Conference on Software Analysis, Evolution, and Reengineering (SANER)

Pub Date : 2015-03-02 DOI: 10.1109/SANER.2015.7081886

Timm Felden

The creation of a program analysis toolchain involves design choices regarding intermediate representations (IRs). Good choices for an IR depend on the analyses performed by a toolchain. In academia, new analyses are developed frequently. Therefore, the best single IR of a research-oriented toolchain does not exist. Thus, we will describe our design of an IR that can be easily adapted to new requirements.

Citations: 0

Impact analysis based on a global hierarchical Object Graph

2015 IEEE 22nd International Conference on Software Analysis, Evolution, and Reengineering (SANER)

Pub Date : 2015-03-02 DOI: 10.1109/SANER.2015.7081832

Marwan Abi-Antoun, Yibin Wang, E. Khalaj, Andrew Giang, V. Rajlich

During impact analysis on object-oriented code, statically extracting dependencies is often complicated by subclassing, programming to interfaces, aliasing, and collections, among others. When a tool recommends a large number of types or does not rank its recommendations, it may lead developers to explore more irrelevant code. We propose to mine and rank dependencies based on a global, hierarchical points-to graph that is extracted using abstract interpretation. A previous whole-program static analysis interprets a program enriched with annotations that express hierarchy, and over-approximates all the objects that may be created at runtime and how they may communicate. In this paper, an analysis mines the hierarchy and the edges in the graph to extract and rank dependencies such as the most important classes related to a class, or the most important classes behind an interface. An evaluation using two case studies on two systems totaling 10,000 lines of code and five completed code modification tasks shows that following dependencies based on abstract interpretation achieves higher effectiveness compared to following dependencies extracted from the abstract syntax tree. As a result, developers explore less irrelevant code.

Citations: 8

ClonePacker: A tool for clone set visualization

2015 IEEE 22nd International Conference on Software Analysis, Evolution, and Reengineering (SANER)

Pub Date : 2015-03-02 DOI: 10.1109/SANER.2015.7081859

Hiroaki Murakami, Yoshiki Higo, S. Kusumoto

Programmers often copy and paste code fragments when they would like to reuse them. Although copy-and-paste operations enable rapid development of software systems, they create code clones. Some clones have negative impacts on software development. For example, if we modify a code fragment, we have to check whether its clones need the same modification. In this case, programmers often use tools that take a code fragment as input and return its clones as output. However, with such existing tools, programmers have to open a number of source files and scroll up and down to browse the detected clones. To reduce the cost of browsing the detected clones, we developed a tool named ClonePacker that visualizes clones using Circle Packing. In an experiment with participants, we confirmed that participants using ClonePacker reported the locations of clones faster than those using an existing tool.

Citations: 6

Assessing the bus factor of Git repositories

2015 IEEE 22nd International Conference on Software Analysis, Evolution, and Reengineering (SANER)

Pub Date : 2015-03-02 DOI: 10.1109/SANER.2015.7081864

Valerio Cosentino, Javier Luis Cánovas Izquierdo, Jordi Cabot

Software development projects face a lot of risks (requirements inflation, poor scheduling, technical problems, etc.). Underestimating those risks may endanger the project's success. One of the most critical risks is employee turnover, that is, the risk of key personnel leaving the project. A good indicator for evaluating this risk is the concentration of information in individual developers. This is also popularly known as the bus factor (“number of key developers who would need to be incapacitated, i.e. hit by a bus, to make a project unable to proceed”). Despite the simplicity of the concept, calculating the actual bus factor for specific projects can quickly turn into an error-prone and time-consuming activity as soon as the size of the project and development team increases. To help project managers assess the bus factor of their projects, in this paper we present a tool that, given a Git-based repository, automatically measures the bus factor for any file, directory and branch in the repository and for the project itself. The tool can also simulate what would happen to the project (e.g., which files would become orphans) if one or more developers disappeared.
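
The abstract does not say how the tool computes its metric, so the following is only a minimal sketch of the general idea, under stated assumptions: a developer "knows" a file if they authored at least `min_share` of its changes, a file is orphaned once all of its knowledgeable developers are gone, and the bus factor is estimated greedily as the number of developers removed before more than `orphan_limit` of the files are orphaned. The function names, thresholds, and sample data are all hypothetical.

```python
from collections import Counter, defaultdict

def knowers(changes, min_share=0.3):
    """Return {file: developers who authored at least min_share of its changes}."""
    per_file = defaultdict(Counter)
    for author, path in changes:
        per_file[path][author] += 1
    result = {}
    for path, counts in per_file.items():
        total = sum(counts.values())
        result[path] = {a for a, n in counts.items() if n / total >= min_share}
    return result

def orphans(file_knowers, removed):
    """Files whose every knowledgeable developer is in `removed`."""
    return {p for p, devs in file_knowers.items() if devs <= removed}

def bus_factor(file_knowers, orphan_limit=0.5):
    """Greedy estimate (an assumption, not the paper's algorithm): keep removing
    the developer who knows the most files until more than orphan_limit of all
    files are orphaned, and return how many developers were removed."""
    removed = set()
    developers = set().union(*file_knowers.values()) if file_knowers else set()
    while developers - removed:
        if len(orphans(file_knowers, removed)) > orphan_limit * len(file_knowers):
            break
        coverage = Counter(d for devs in file_knowers.values() for d in devs - removed)
        # sorted() makes tie-breaking deterministic (alphabetical order).
        removed.add(max(sorted(coverage), key=coverage.__getitem__))
    return len(removed)

# Hypothetical (author, file) pairs, e.g. as parsed from a Git log.
changes = [
    ("alice", "a.py"), ("alice", "a.py"), ("bob", "a.py"),
    ("alice", "b.py"), ("bob", "c.py"), ("carol", "c.py"), ("alice", "d.py"),
]
fk = knowers(changes)
print(sorted(orphans(fk, {"alice"})))  # ['b.py', 'd.py']
print(bus_factor(fk))                  # 2
```

The `orphans` helper corresponds to the "what if these developers disappeared" simulation mentioned in the abstract. A real measurement would need a more careful definition of knowledge than raw change counts.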

Citations: 55

Are PHP applications ready for Hack?

2015 IEEE 22nd International Conference on Software Analysis, Evolution, and Reengineering (SANER)

Pub Date : 2015-03-02 DOI: 10.1109/SANER.2015.7081816

L. Eshkevari, F. D. Santos, J. Cordy, G. Antoniol

PHP is by far the most popular WEB scripting language, accounting for more than 80% of existing websites. PHP is dynamically typed, which means that variables take on the type of the objects that they are assigned, and may change type as execution proceeds. While some type changes are likely not harmful, others involving function calls and global variables may be more difficult to understand and may be the source of many bugs. Hack, a new PHP variant endorsed by Facebook, attempts to address this problem by adding static typing to PHP variables, which limits them to a single consistent type throughout execution. This paper defines an empirical taxonomy of PHP type changes along three dimensions: the complexity or burden imposed to understand the type change; whether or not the change is potentially harmful; and the actual types changed. We apply static and dynamic analyses to three widely used WEB applications coded in PHP (WordPress, Drupal and phpBB) to investigate (1) to what extent developers really use dynamic typing; (2) what kinds of type changes are actually encountered; and (3) how difficult it might be to refactor the code to avoid type changes, and thus meet the constraints of Hack's static typing. We report evidence that dynamic typing is actually a relatively uncommon practice in production PHP programs, and that most dynamic type changes are simple representational changes, such as between strings and integers. We observe that most PHP type changes in these programs are relatively simple, and that the largest proportion of them are easy to refactor to consistent static typing using simple local renaming transformations. Overall, the paper casts doubt on the usefulness of dynamic typing in PHP, and indicates that for many production applications, conversion to Hack's static typing may not be very difficult.
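
To make the notion of a "type change" concrete, here is a small sketch in Python (also dynamically typed), not the authors' PHP analysis. It assumes a hypothetical execution trace of `(variable, value)` assignments and reports every assignment that gives a variable a different type from the one it previously held.

```python
def type_changes(trace):
    """Return (variable, old_type, new_type) for each assignment in the trace
    that changes the type a variable held after its previous assignment."""
    last_type = {}
    changes = []
    for name, value in trace:
        current = type(value).__name__
        if name in last_type and last_type[name] != current:
            changes.append((name, last_type[name], current))
        last_type[name] = current
    return changes

# Hypothetical trace: `count` starts as a string and later becomes an integer,
# the kind of simple representational change the abstract mentions.
trace = [("count", "5"), ("count", 5), ("name", "x"), ("count", 6)]
print(type_changes(trace))  # [('count', 'str', 'int')]
```

A change like this one could be removed by introducing a second, differently named variable for the integer value, which is the kind of local renaming the abstract refers to.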

Citations: 19

Explore the evolution of development topics via on-line LDA

2015 IEEE 22nd International Conference on Software Analysis, Evolution, and Reengineering (SANER)

Pub Date : 2015-03-02 DOI: 10.1109/SANER.2015.7081876

Jiajun Hu, Xiaobing Sun, Bin Li

Software repositories such as revision control systems and bug tracking systems are usually used to manage the changes of software projects. During software maintenance and evolution, software developers and stakeholders need to investigate these repositories to identify what tasks were worked on in a particular time interval and how much effort was devoted to them. A typical way of mining software repositories is to use topic analysis models, e.g., Latent Dirichlet Allocation (LDA), to identify and organize the underlying structure in software documents to understand the evolution of development topics. Previous LDA-based topic analysis models can capture either changes in the strength (popularity) of various development topics over time (i.e., strength evolution) or changes in the content (the words that form the topic) of existing topics over time (i.e., content evolution). Unfortunately, few techniques can capture both strength and content evolution simultaneously, although both pieces of information are necessary for developers to fully understand how software evolves. In this paper, we propose a novel approach that analyzes commit messages within a project's lifetime to capture both strength and content evolution simultaneously via Online Latent Dirichlet Allocation (On-Line LDA). Moreover, the proposed approach also provides an efficient way to detect emerging topics in a real development iteration when a new feature request arrives at a particular time, thus helping project stakeholders progress their projects smoothly.

Citations: 9

Where was this SQL query executed? A static concept location approach

2015 IEEE 22nd International Conference on Software Analysis, Evolution, and Reengineering (SANER)

Pub Date : 2015-03-02 DOI: 10.1109/SANER.2015.7081881

Csaba Nagy, L. Meurice, Anthony Cleve

Concept location in software engineering is the process of identifying where a specific concept is implemented in the source code of a software system. It is a very common task performed by developers during development or maintenance, and many techniques have been studied by researchers to make it more efficient. However, most of the current techniques ignore the role of a database in the architecture of a system, which is also an important source of concepts or dependencies among them. In this paper, we present a concept location technique for data-intensive systems, that is, systems with at least one database server in their architecture that is intensively used by its clients. Specifically, we present a static technique for identifying the exact source code location from which a given SQL query was sent to the database. We evaluate our technique by collecting and locating SQL queries from testing scenarios of two open source Java systems under active development. With our technique, we are able to successfully identify the source of most of these queries.

Citations: 23

Who should review my code? A file location-based code-reviewer recommendation approach for Modern Code Review

2015 IEEE 22nd International Conference on Software Analysis, Evolution, and Reengineering (SANER)

Pub Date : 2015-03-02 DOI: 10.1109/SANER.2015.7081824

Patanamon Thongtanunam, C. Tantithamthavorn, R. Kula, Norihiro Yoshida, Hajimu Iida, Ken-ichi Matsumoto

Software code review is an inspection of a code change by an independent third-party developer in order to identify and fix defects before integration. Effectively performing code review can improve the overall software quality. In recent years, Modern Code Review (MCR), a lightweight and tool-based code inspection, has been widely adopted in both proprietary and open-source software systems. Finding appropriate code-reviewers in MCR is a necessary step of reviewing a code change. However, little is known about the difficulty of finding code-reviewers in distributed software development and its impact on reviewing time. In this paper, we investigate the impact that reviews with a code-reviewer assignment problem have on reviewing time. We find that reviews with a code-reviewer assignment problem take 12 days longer to approve a code change. To help developers find appropriate code-reviewers, we propose RevFinder, a file location-based code-reviewer recommendation approach. We leverage the similarity of previously reviewed file paths to recommend an appropriate code-reviewer. The intuition is that files located in similar file paths would be managed and reviewed by similar experienced code-reviewers. Through an empirical evaluation on a case study of 42,045 reviews of the Android Open Source Project (AOSP), OpenStack, Qt and LibreOffice projects, we find that RevFinder accurately recommended 79% of reviews with a top-10 recommendation. RevFinder also correctly recommended the code-reviewers with a median rank of 4. The overall ranking of RevFinder is 3 times better than that of a baseline approach. We believe that RevFinder could be applied to MCR in order to help developers find appropriate code-reviewers and speed up the overall code review process.
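
The intuition of recommending reviewers from file-path similarity can be sketched as follows. This is a simplified illustration, not RevFinder itself: it assumes a single similarity measure (the number of shared leading path components, normalised by the longer path), whereas the abstract does not specify the measure used. The function names, the `past_reviews` structure, and the sample data are hypothetical.

```python
from collections import defaultdict

def path_similarity(a, b):
    """Shared leading path components, normalised by the longer path's length."""
    parts_a, parts_b = a.split("/"), b.split("/")
    common = 0
    for x, y in zip(parts_a, parts_b):
        if x != y:
            break
        common += 1
    return common / max(len(parts_a), len(parts_b))

def recommend(new_files, past_reviews, top_k=3):
    """Rank reviewers by the summed average path similarity between the files of
    the new change and the files of each past review they took part in."""
    scores = defaultdict(float)
    for reviewers, files in past_reviews:
        pairs = [(n, p) for n in new_files for p in files]
        if not pairs:
            continue
        score = sum(path_similarity(n, p) for n, p in pairs) / len(pairs)
        for reviewer in reviewers:
            scores[reviewer] += score
    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [reviewer for reviewer, score in ranked[:top_k] if score > 0]

# Hypothetical review history: (reviewers, files touched by the reviewed change).
past_reviews = [
    (["ann"], ["src/ui/button.c", "src/ui/menu.c"]),
    (["bo"], ["docs/readme.md"]),
    (["ann", "cy"], ["src/net/socket.c"]),
]
print(recommend(["src/ui/dialog.c"], past_reviews))  # ['ann', 'cy']
```

Here `ann` ranks first because she reviewed changes under `src/ui/`, the same directory as the new file, while `bo` is dropped because `docs/` shares no leading component with it.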

Citations: 202

Investigating modern release engineering practices

2015 IEEE 22nd International Conference on Software Analysis, Evolution, and Reengineering (SANER)

Pub Date : 2015-03-02 DOI: 10.1109/SANER.2015.7081893

Md Tajmilur Rahman

In my PhD research I will focus on modern release engineering practices. First, I have quantified the time and effort that is involved in stabilizing a release. I found that despite using rapid release, the Chrome and Linux projects still have a period where they rush changes into a release. Second, developers typically isolate unrelated changes on branches. However, developers at major companies, such as Google and Facebook, commit all changes to a single branch. They isolate unrelated changes using feature-flags, which allows them to disable works in progress. My goal is to empirically determine the best practices when using flags and identify dead code. Finally, I will develop tool support to manage feature flags.
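
As a minimal illustration of the feature-flag practice described above (a generic sketch, not code from Google, Facebook, or the author's tooling), unfinished work can live on the main branch behind a flag that defaults to off. The flag name `new_checkout` and both functions are hypothetical.

```python
# Hypothetical flag registry: the new code path is merged but disabled.
FLAGS = {"new_checkout": False}

def is_enabled(name, flags=FLAGS):
    """Unknown flags count as disabled, so a removed flag never enables code."""
    return flags.get(name, False)

def checkout(cart, flags=FLAGS):
    if is_enabled("new_checkout", flags):
        # Work in progress: only reachable when the flag is switched on.
        return f"new checkout flow for {len(cart)} items"
    return f"legacy checkout flow for {len(cart)} items"

print(checkout(["book", "pen"]))                  # legacy checkout flow for 2 items
print(checkout(["book"], {"new_checkout": True})) # new checkout flow for 1 items
```

Once a flag is permanently on or off, one of the two branches can never run. That is the dead code the abstract proposes to identify.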

Citations: 3
