
Latest publications: 2015 IEEE 15th International Working Conference on Source Code Analysis and Manipulation (SCAM)

Automatically identifying focal methods under test in unit test cases
Mohammad Ghafari, C. Ghezzi, K. Rubinov
Modern iterative and incremental software development relies on continuous testing. The knowledge of test-to-code traceability links facilitates test-driven development and improves software evolution. Previous research identified traceability links between test cases and classes under test. Though this information is helpful, a finer granularity technique can provide more useful information beyond the knowledge of the class under test. In this paper, we focus on Java classes that instantiate stateful objects and propose an automated technique for precise detection of the focal methods under test in unit test cases. Focal methods represent the core of a test scenario inside a unit test case. Their main purpose is to affect an object's state that is then checked by other inspector methods whose purpose is ancillary and needs to be identified as such. Distinguishing focal from other (non-focal) methods is hard to accomplish manually. We propose an approach to detect focal methods under test automatically. An experimental assessment with real-world software shows that our approach identifies focal methods under test in more than 85% of cases, providing a ground for precise automatic recovery of test-to-code traceability links.
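The mutator/inspector distinction the paper relies on can be illustrated with a toy heuristic (an illustration only, not the authors' algorithm; the method names and the hand-made mutator/inspector lists below are assumed):

```python
# Toy sketch (not the paper's technique): given the sequence of calls a unit
# test makes on the object under test, treat state-mutating calls as focal
# candidates and credit the last mutator preceding each inspector call as the
# focal method that inspector verifies.

MUTATORS = {"push", "pop", "clear"}       # methods that change object state
INSPECTORS = {"size", "peek", "isEmpty"}  # side-effect-free checks

def focal_methods(call_sequence):
    """Return the mutators whose effect an inspector subsequently verifies."""
    focal = []
    last_mutator = None
    for call in call_sequence:
        if call in MUTATORS:
            last_mutator = call
        elif call in INSPECTORS and last_mutator is not None:
            focal.append(last_mutator)
            last_mutator = None  # each mutator is credited at most once
    return focal

# A test like: s.push(1); s.push(2); assertEquals(2, s.size()); s.pop(); s.isEmpty()
print(focal_methods(["push", "push", "size", "pop", "isEmpty"]))  # ['push', 'pop']
```

A real detector has to infer the mutator/inspector split from the code itself (e.g. from write effects on the object's state) rather than from fixed name lists.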
DOI: 10.1109/SCAM.2015.7335402 (published 2015-11-23)
Citations: 32
FaultBuster: An automatic code smell refactoring toolset
Gábor Szoke, Csaba Nagy, Lajos Jeno Fülöp, R. Ferenc, T. Gyimóthy
One solution to prevent the quality erosion of a software product is to maintain its quality by continuous refactoring. However, refactoring is not always easy. Developers need to identify the piece of code that should be improved and decide how to rewrite it. Furthermore, refactoring can also be risky; that is, the modified code needs to be re-tested, so developers can see if they broke something. Many IDEs offer a range of refactorings to support so-called automatic refactoring, but tools that are really able to automatically refactor code smells are still under research. In this paper we introduce FaultBuster, a refactoring toolset which is able to support automatic refactoring: identifying the problematic code parts via static code analysis, running automatic algorithms to fix selected code smells, and executing integrated testing tools. At the heart of the toolset lies a refactoring framework to control the analysis and the execution of the automatic algorithms. FaultBuster provides IDE plugins to interact with developers via popular IDEs (Eclipse, NetBeans and IntelliJ IDEA). All the tools were developed and tested in a 2-year project with 6 software development companies where thousands of code smells were identified and fixed in 5 systems having altogether over 5 million lines of code.
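The smell-identification step such a toolset builds on is typically metric-based; the sketch below shows the idea with illustrative metric names and thresholds (not FaultBuster's own):

```python
# Minimal sketch of metric-based smell detection: flag methods whose size or
# cyclomatic complexity exceeds a threshold. The thresholds (50 LOC, CC 10)
# are common rules of thumb assumed here for illustration.

def detect_long_methods(methods, loc_threshold=50, complexity_threshold=10):
    """Return names of methods exceeding either metric threshold."""
    return [name for name, (loc, complexity) in methods.items()
            if loc > loc_threshold or complexity > complexity_threshold]

metrics = {
    "parseConfig":  (120, 14),  # (lines of code, cyclomatic complexity)
    "getName":      (3, 1),
    "applyChanges": (45, 12),   # short, but too complex
}
print(sorted(detect_long_methods(metrics)))  # ['applyChanges', 'parseConfig']
```

In a full pipeline, each flagged method would then be handed to a refactoring algorithm and the result re-tested, which is the loop the FaultBuster framework automates.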
DOI: 10.1109/SCAM.2015.7335422 (published 2015-11-23)
Citations: 37
Using changeset descriptions as a data source to assist feature location
Muslim Chochlov, M. English, J. Buckley
Feature location attempts to assist developers in discovering functionality in source code. Many textual feature location techniques utilize information retrieval and rely on the comments and identifiers of source code to describe software entities. An interesting alternative would be to employ the descriptions of the changesets that altered the code as a data source to describe such software entities. To investigate this, we implement a technique utilizing changeset descriptions and conduct an empirical study to observe this technique's overall performance. Moreover, we study how the granularity (i.e. file or method level of software entities) and changeset range inclusion (i.e. most recent or all historical changesets) affect such an approach. The results of a preliminary study with the Rhino and Mylyn.Tasks systems suggest that the approach could lead to a potentially efficient feature location technique. They also suggest that it is worth the effort to configure the technique at method-level granularity, and that older changesets from older systems may reduce its effectiveness.
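A minimal sketch of how changeset descriptions can stand in for comments and identifiers in IR-based feature location (plain bag-of-words cosine similarity; the study's actual retrieval model may differ):

```python
import math
from collections import Counter

# Illustrative IR-based feature location over changeset descriptions: rank
# files by the similarity between a feature query and the commit messages of
# changesets that touched them. Commit messages and file names are made up.

def cosine(a, b):
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def locate(query, commits):
    """Return files ranked by their best-matching changeset description."""
    qv = Counter(query.lower().split())
    scores = {}
    for message, files in commits:
        s = cosine(qv, Counter(message.lower().split()))
        for f in files:
            scores[f] = max(scores.get(f, 0.0), s)
    return sorted(scores, key=scores.get, reverse=True)

history = [
    ("fix parser crash on empty input", ["Parser.java"]),
    ("add caching to search index", ["Index.java", "Cache.java"]),
]
print(locate("search index caching", history))  # Index.java/Cache.java first
```

Restricting `history` to the most recent changesets, or aggregating at method instead of file level, is exactly the granularity/range-inclusion choice the study varies.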
DOI: 10.1109/SCAM.2015.7335401 (published 2015-11-23)
Citations: 1
Navigating source code with words
Dawn J Lawrie, D. Binkley
The hierarchical method of organizing information has proven beneficial in learning, in part because it maps well onto the human brain's memory. Exploiting this organizational strategy may help engineers cope with large software systems. In fact, such a strategy is already present in source code, manifested in the class hierarchies of object-oriented programs. However, an engineer faced with fixing a bug, or any similar need to locate the implementation of a particular feature in the code, is less interested in the syntactic organization of the code and more interested in its conceptual organization. Therefore, a conceptual hierarchy would bring clear benefit. Fortunately, such a view can be extracted automatically from the source code. The hierarchy-generating tool HierIT performs this task using an information-theoretic approach to identify “content-bearing” words and associate them hierarchically. The resulting hierarchy enables an engineer to better understand the concepts contained in a software system. To study their value, an experiment was conducted to quantitatively and qualitatively investigate the value that hierarchies bring. The quantitative evaluation first considers the Expected Mutual Information Measure (EMIM) between the set of topic words and natural language extracted from the source code. It then considers the Best Case Tree Walk (BCTW), which captures how “expensive” it is to find interesting documents. Finally, the hierarchies are considered qualitatively by investigating their perceived usefulness in a case study involving three engineers.
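HierIT's information-theoretic criterion is not reproduced here, but the flavor of deriving a word hierarchy from text can be shown with the simpler co-occurrence subsumption rule (a word x is placed above y when the documents containing y nearly always also contain x); everything below is an illustration, not the tool's algorithm:

```python
# Build (parent, child) word pairs from a small corpus of code comments using
# the subsumption heuristic: x subsumes y when x appears in >= `threshold` of
# the documents containing y, but y does not appear in all documents of x.
# The comment corpus and threshold are assumed for illustration.

def subsumptions(docs, threshold=0.8):
    doc_sets = [set(d.lower().split()) for d in docs]
    vocab = set().union(*doc_sets)
    df = {w: sum(w in d for d in doc_sets) for w in vocab}
    pairs = []
    for x in vocab:
        for y in vocab:
            if x == y:
                continue
            both = sum(x in d and y in d for d in doc_sets)
            if df[y] and both / df[y] >= threshold and both / df[x] < 1.0:
                pairs.append((x, y))  # x is a plausible parent of y
    return sorted(pairs)

comments = [
    "parse xml document",
    "parse json document",
    "parse configuration",
]
print(subsumptions(comments))  # 'parse' ends up above 'document', 'xml', ...
```

Chaining such pairs yields a browsable concept tree, which is the kind of navigation aid the paper evaluates with EMIM and BCTW.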
DOI: 10.1109/SCAM.2015.7335403 (published 2015-11-23)
Citations: 0
Detecting function purity in JavaScript
Jens Nicolay, Carlos Noguera, Coen De Roover, W. Meuter
We present an approach to detect function purity in JavaScript. A function is pure if none of its applications cause observable side-effects. The approach is based on a pushdown flow analysis that, besides traditional control and value flow, also keeps track of write effects. To increase the precision of our purity analysis, we combine it with an intraprocedural analysis to determine the freshness of variables and object references. We formalize the core aspects of our analysis, and discuss our implementation used to analyze several common JavaScript benchmarks. Experiments show that our technique is capable of detecting function purity, even in the presence of higher-order functions, dynamic property expressions, and prototypal inheritance.
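The paper's pushdown flow analysis is far richer than anything that fits here; as a loose illustration of "purity as absence of observable write effects" (in Python rather than JavaScript, purely for exposition), one can flag functions that syntactically write through a reference:

```python
import ast

# Crude syntactic stand-in for a write-effect analysis: a function is flagged
# impure if its body declares a global or assigns through an attribute or
# subscript (i.e., writes to state reachable from outside the call).

def is_syntactically_pure(source):
    func = ast.parse(source).body[0]  # expects a single function definition
    for node in ast.walk(func):
        if isinstance(node, ast.Global):
            return False
        if isinstance(node, (ast.Assign, ast.AugAssign)):
            targets = node.targets if isinstance(node, ast.Assign) else [node.target]
            for t in targets:
                if isinstance(t, (ast.Attribute, ast.Subscript)):
                    return False  # write through a reference: observable effect
    return True

print(is_syntactically_pure("def f(x):\n    return x * 2"))           # True
print(is_syntactically_pure("def g(o):\n    o.count = o.count + 1"))  # False
```

Note how crude this is: it also flags writes to objects the function itself just created, which is precisely the imprecision the paper's freshness analysis is designed to remove.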
DOI: 10.1109/SCAM.2015.7335406 (published 2015-11-23)
Citations: 14
ORBS and the limits of static slicing
D. Binkley, N. Gold, M. Harman, Syed S. Islam, J. Krinke, S. Yoo
Observation-based slicing is a recently introduced, language-independent slicing technique based on the dependencies observable from program behaviour. Due to the well-known limits of dynamic analysis, we may only compute an under-approximation of the true observation-based slice. However, because the observation-based slice captures all dependencies that can be observed, even such approximations can yield insight into the limitations of static slicing. For example, a static slice, S, that is strictly smaller than the corresponding observation-based slice is potentially unsafe. We present the results of three sets of experiments on 12 different programs, including benchmarks and larger programs, which investigate the relationship between static and observation-based slicing. We show that, in extreme cases, observation-based slices can find the true minimal static slice, where static techniques cannot. For more typical cases, our results illustrate the potential for observation-based slicing to highlight limitations in static slicers. Finally, we report on the sensitivity of observation-based slicing to test quality.
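The observation-based idea can be sketched as a delete-and-re-observe loop (simplified here to single-line deletions over Python statements; ORBS itself deletes windows of lines and is language-independent):

```python
# Toy observation-based slicer: repeatedly try deleting a line; keep the
# deletion if the program still runs and the observed variable keeps its
# baseline value. The example program and slicing criterion are made up.

def observe(lines, var):
    """Execute the candidate program and observe the value of `var`."""
    env = {}
    try:
        exec("\n".join(lines), {}, env)
    except Exception:
        return None  # a failing candidate rejects the deletion
    return env.get(var)

def orbs_slice(lines, var):
    baseline = observe(lines, var)
    changed = True
    while changed:
        changed = False
        for i in range(len(lines)):
            candidate = lines[:i] + lines[i + 1:]
            if observe(candidate, var) == baseline:
                lines = candidate  # deletion preserved the observation: keep it
                changed = True
                break
    return lines

program = ["a = 1", "b = 2", "c = a + 1", "print_me = c"]
print(orbs_slice(program, "print_me"))  # ['a = 1', 'c = a + 1', 'print_me = c']
```

Because the loop only ever observes concrete executions, the result is an under-approximation of the true observation-based slice, which is exactly the caveat the abstract raises.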
DOI: 10.1109/SCAM.2015.7335396 (published 2015-11-23)
Citations: 29
Can the use of types and query expansion help improve large-scale code search?
Otávio Augusto Lazzarini Lemos, A. C. D. Paula, Hitesh Sajnani, C. Lopes
With the open source code movement, code search with the intent of reuse has become increasingly popular. So much so that researchers have been calling it the new facet of software reuse. Although code search differs from general-purpose document search in essential ways, most tools still rely mainly on keywords matched against source code text. Recently, researchers have proposed more sophisticated ways to perform code search, such as including interface definitions in the queries (e.g., the return and parameter types of the desired function, along with keywords), here called Interface-Driven Code Search (IDCS). However, to the best of our knowledge, there are few empirical studies that compare traditional keyword-based code search (KBCS) with more advanced approaches such as IDCS. In this paper we describe an experiment that compares the effectiveness of KBCS with IDCS in the task of large-scale code search of auxiliary functions implemented in Java. We also measure the impact of query expansion based on types and WordNet on both approaches. Our experiment involved 36 subjects who produced real-world queries for 16 different auxiliary functions and a repository with more than 2,000,000 Java methods. Results show that the use of types can improve recall and the number of relevant functions returned (#RFR) when combined with query expansion (~30% improvement in recall, and ~43% improvement in #RFR). However, a more detailed analysis suggests that in some situations it is best to use keywords only, in particular when these are sufficient to semantically define the desired function.
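The difference between KBCS and IDCS can be sketched as follows (the index layout and scoring are assumptions for illustration, not the study's implementation):

```python
# Sketch of interface-driven code search: rank indexed methods by keyword
# overlap, and, when type information is used, additionally require the
# (parameter types, return type) signature to match the query's.

def search(query_keywords, query_sig, index, use_types=True):
    """Return method names ranked by keyword overlap; with use_types=True,
    candidates must also match the requested signature (IDCS); with
    use_types=False this degenerates to keyword-based search (KBCS)."""
    results = []
    for name, (keywords, sig) in index.items():
        if use_types and sig != query_sig:
            continue
        overlap = len(set(query_keywords) & set(keywords))
        if overlap:
            results.append((overlap, name))
    return [name for _, name in sorted(results, reverse=True)]

index = {
    "sumList":    ({"sum", "list", "add"},        (("List<Integer>",), "int")),
    "concat":     ({"join", "concat", "string"},  (("String", "String"), "String")),
    "countItems": ({"count", "list", "sum"},      (("List<Integer>",), "int")),
}
print(search({"sum", "list"}, (("List<Integer>",), "int"), index))
```

Query expansion would widen `query_keywords` with type names or WordNet synonyms before matching, which is the second dimension the experiment varies.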
DOI: 10.1109/SCAM.2015.7335400 (published 2015-11-23)
Citations: 21
When code smells twice as much: Metric-based detection of variability-aware code smells
W. Fenske, Sandro Schulze, Daniel Meyer, G. Saake
Code smells are established, widely used characterizations of shortcomings in the design and implementation of software systems. As such, they have been subject to intensive research regarding their detection and their impact on the understandability and changeability of source code. However, current methods do not support highly configurable software systems, that is, systems that can be customized to fit a wide range of requirements or platforms. Such systems commonly owe their configurability to conditional compilation based on C preprocessor annotations (a.k.a. #ifdefs). Since annotations directly interact with the host language (e.g., C), they may have adverse effects on the understandability and changeability of source code, referred to as variability-aware code smells. In this paper, we propose a metric-based method that integrates source code and C preprocessor annotations to detect such smells. We evaluate our method for one specific smell on five open-source systems of medium size, thus demonstrating its general applicability. Moreover, we manually reviewed 100 instances of the smell and provide a qualitative analysis of its potential impact as well as common causes for the occurrence.
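A minimal sketch of a preprocessor-annotation metric in this spirit (the specific metrics, regexes, and thresholds are illustrative, not the paper's):

```python
import re

# Count #if/#ifdef/#ifndef blocks and their maximum nesting depth in a chunk
# of C source; a metric-based detector would flag code whose counts exceed a
# threshold as a candidate variability-aware smell.

def ifdef_metrics(c_source):
    """Return (number of conditional blocks, maximum nesting depth)."""
    depth = max_depth = blocks = 0
    for line in c_source.splitlines():
        stripped = line.strip()
        if re.match(r"#\s*if(def|ndef)?\b", stripped):
            blocks += 1
            depth += 1
            max_depth = max(max_depth, depth)
        elif re.match(r"#\s*endif\b", stripped):
            depth -= 1
    return blocks, max_depth

code = """
#ifdef WIN32
  init_win();
  #ifdef DEBUG
    trace();
  #endif
#endif
#ifndef NDEBUG
  assert_state();
#endif
"""
print(ifdef_metrics(code))  # (3, 2)
```

Combining such annotation metrics with plain code metrics (size, complexity) per function is the integration step the paper's method performs.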
DOI: 10.1109/SCAM.2015.7335413 (published 2015-11-23)
Citations: 19
The use of C++ exception handling constructs: A comprehensive study
R. Bonifácio, Fausto Carvalho, G. N. Ramos, U. Kulesza, Roberta Coelho
Exception handling (EH) is a well-known mechanism that aims at improving software reliability in a modular way, allowing a better separation between the code that deals with exceptional conditions and the code that deals with the normal control flow of a program. Although the exception handling mechanism was conceived almost 40 years ago, formulating a reasonable design of exception handling code is still considered a challenge, which might hinder its widespread use. This paper reports the results of an empirical study that uses a mixed-method approach to investigate the adoption of the exception handling mechanism in C++. Firstly, we carried out a static analysis investigation to understand how developers employ the exception handling constructs of C++, considering 65 open-source systems (which comprise 34 million lines of C++ code overall). Then, to better understand the findings from the static analysis phase, we conducted a survey involving 145 C++ developers who have contributed to the subject systems. Some of the findings consistently detected during this mixed-method study reveal that, for several projects, the use of exception handling constructs is scarce and developers favor other strategies to deal with exceptional conditions. In addition, the survey respondents consider that incompatibility with existing C code and libraries, extra performance costs (in terms of response time and size of the compiled code), and lack of expertise to design an exception handling strategy are among the reasons for avoiding exception handling constructs.
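The first, static-analysis phase boils down to counting exception-handling constructs across a code base; a toy regex-based version is shown below (the study used proper parsing, and a regex would miscount e.g. `throw` appearing in comments or strings):

```python
import re

# Count C++ exception-handling keywords in a translation unit as a rough
# adoption signal: projects where these counts stay near zero relative to
# their size favor other error-handling strategies.

def eh_usage(cpp_source):
    return {
        "try":   len(re.findall(r"\btry\b", cpp_source)),
        "catch": len(re.findall(r"\bcatch\b", cpp_source)),
        "throw": len(re.findall(r"\bthrow\b", cpp_source)),
    }

snippet = """
void load(const char* path) {
    try {
        open(path);
    } catch (const std::exception& e) {
        log(e.what());
        throw;  // rethrow after logging
    }
}
"""
print(eh_usage(snippet))  # {'try': 1, 'catch': 1, 'throw': 1}
```

Normalizing such counts by lines of code across the 65 systems is what lets the study compare adoption between projects of very different sizes.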
Citations: 8
The impact of cross-distribution bug duplicates: Empirical study on Debian and Ubuntu
Vincent Boisselle, Bram Adams
Although open source distributions like Debian and Ubuntu are closely related, sometimes a bug reported in the Debian bug repository is independently reported in the Ubuntu repository as well, without Ubuntu users or developers being aware. Such undetected cross-distribution bug duplicates can cause developers and users to lose precious time working on a fix that already exists, or to work individually instead of collaborating to find a fix faster. We perform a case study on the Ubuntu and Debian bug repositories to measure the number of cross-distribution bug duplicates and estimate the amount of time lost. By adapting an existing within-project duplicate detection approach (achieving a similar recall of 60%), we find 821 cross-duplicates. The early detection of such duplicates could reduce the time users lose waiting for a fix by a median of 38 days. Furthermore, we estimate that developers from the different distributions lose a median of 47 days in which they could have collaborated, had they been aware of the duplicates. These results show the need to detect and monitor cross-distribution duplicates.
Citations: 15
Journal
2015 IEEE 15th International Working Conference on Source Code Analysis and Manipulation (SCAM)