How not to Structure Your Database-Backed Web Applications: A Study of Performance Bugs in the Wild
Junwen Yang, Pranav Subramaniam, Shan Lu, Cong Yan, Alvin Cheung
ICSE 2018, pp. 800–810. DOI: 10.1145/3180155.3180194

Many web applications use databases for persistent data storage, and Object-Relational Mapping (ORM) frameworks are a common way to develop such database-backed web applications. Unfortunately, developing efficient ORM applications is challenging, as the ORM framework hides the underlying database query generation and execution. This problem is becoming more severe as these applications need to process an increasingly large amount of persistent data. Recent research has targeted specific aspects of performance problems in ORM applications. However, there has not been any systematic study of the common performance anti-patterns in such real-world applications, how they affect application performance, and how to remedy them. In this paper, we answer these questions through a comprehensive study of 12 representative real-world ORM applications. We generalize 9 ORM performance anti-patterns from more than 200 performance issues that we obtain by studying their bug-tracking systems and profiling their latest versions. To demonstrate the impact of these anti-patterns, we manually fix 64 performance issues in the applications' latest versions and obtain a median speedup of 2× (up to 39×), with fewer than 5 lines of code changed in most cases. Many of the issues we found have been confirmed by developers, and we have implemented techniques to identify other code fragments with similar issues.
{"title":"How not to Structure Your Database-Backed Web Applications: A Study of Performance Bugs in the Wild","authors":"Junwen Yang, Pranav Subramaniam, Shan Lu, Cong Yan, Alvin Cheung","doi":"10.1145/3180155.3180194","DOIUrl":"https://doi.org/10.1145/3180155.3180194","url":null,"abstract":"Many web applications use databases for persistent data storage, and using Object Relational Mapping (ORM) frameworks is a common way to develop such database-backed web applications. Unfortunately, developing efficient ORM applications is challenging, as the ORM framework hides the underlying database query generation and execution. This problem is becoming more severe as these applications need to process an increasingly large amount of persistent data. Recent research has targeted specific aspects of performance problems in ORM applications. However, there has not been any systematic study to identify common performance anti-patterns in real-world such applications, how they affect resulting application performance, and remedies for them. In this paper, we try to answer these questions through a comprehensive study of 12 representative real-world ORM applications. We generalize 9 ORM performance anti-patterns from more than 200 performance issues that we obtain by studying their bug-tracking systems and profiling their latest versions. To prove our point, we manually fix 64 performance issues in their latest versions and obtain a median speedup of 2× (and up to 39× max) with fewer than 5 lines of code change in most cases. Many of the issues we found have been confirmed by developers, and we have implemented ways to identify other code fragments with similar issues as well.","PeriodicalId":6560,"journal":{"name":"2018 IEEE/ACM 40th International Conference on Software Engineering (ICSE)","volume":"71 1","pages":"800-810"},"PeriodicalIF":0.0,"publicationDate":"2018-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74745491","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
DeFlaker: Automatically Detecting Flaky Tests
Jonathan Bell, Owolabi Legunsen, Michael C. Hilton, Lamyaa Eloussi, Tifany Yung, D. Marinov
ICSE 2018, pp. 433–444. DOI: 10.1145/3180155.3180164
Developers often run tests to check that their latest changes to a code repository did not break any previously working functionality. Ideally, any new test failure would indicate a regression caused by the latest changes. However, some test failures may be due not to the latest changes but to non-determinism in the tests, popularly called flaky tests. The typical way to detect flaky tests is to rerun failing tests repeatedly. Unfortunately, rerunning failing tests can be costly and can slow down the development cycle. We present the first extensive evaluation of rerunning failing tests and propose a new technique, called DeFlaker, that detects whether a test failure is due to a flaky test, without reruns and with very low runtime overhead. DeFlaker monitors the coverage of the latest code changes and marks as flaky any newly failing test that did not execute any of those changes. We deployed DeFlaker live, in the build process of 96 Java projects on Travis CI, and found 87 previously unknown flaky tests in 10 of these projects. We also ran experiments on project histories, where DeFlaker detected 1,874 flaky tests from 4,846 failures, with a low false-alarm rate (1.5%). DeFlaker had a higher recall of confirmed flaky tests (95.5% vs. 23%) than Maven's default flaky-test detector.
{"title":"DeFlaker: Automatically Detecting Flaky Tests","authors":"Jonathan Bell, Owolabi Legunsen, Michael C Hilton, Lamyaa Eloussi, Tifany Yung, D. Marinov","doi":"10.1145/3180155.3180164","DOIUrl":"https://doi.org/10.1145/3180155.3180164","url":null,"abstract":"Developers often run tests to check that their latest changes to a code repository did not break any previously working functionality. Ideally, any new test failures would indicate regressions caused by the latest changes. However, some test failures may not be due to the latest changes but due to non-determinism in the tests, popularly called flaky tests. The typical way to detect flaky tests is to rerun failing tests repeatedly. Unfortunately, rerunning failing tests can be costly and can slow down the development cycle. We present the first extensive evaluation of rerunning failing tests and propose a new technique, called DeFlaker, that detects if a test failure is due to a flaky test without rerunning and with very low runtime overhead. DeFlaker monitors the coverage of latest code changes and marks as flaky any newly failing test that did not execute any of the changes. We deployed DeFlaker live, in the build process of 96 Java projects on TravisCI, and found 87 previously unknown flaky tests in 10 of these projects. We also ran experiments on project histories, where DeFlaker detected 1,874 flaky tests from 4,846 failures, with a low false alarm rate (1.5%). DeFlaker had a higher recall (95.5% vs. 23%) of confirmed flaky tests than Maven's default flaky test detector.","PeriodicalId":6560,"journal":{"name":"2018 IEEE/ACM 40th International Conference on Software Engineering (ICSE)","volume":"88 1","pages":"433-444"},"PeriodicalIF":0.0,"publicationDate":"2018-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78174893","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Accurate and Efficient Refactoring Detection in Commit History
Nikolaos Tsantalis, Matin Mansouri, L. Eshkevari, D. Mazinanian, Danny Dig
ICSE 2018, pp. 483–494. DOI: 10.1145/3180155.3180206
Refactoring detection algorithms have been crucial to a variety of applications: (i) empirical studies of the evolution of code, tests, and faults, (ii) tools for library API migration, (iii) improving the comprehension of changes and code reviews, etc. However, recent research has questioned the accuracy of the state-of-the-art refactoring detection tools, which threatens the reliability of these applications. Moreover, previous refactoring detection tools are very sensitive to user-provided similarity thresholds, which further reduces their practical accuracy. In addition, their requirement to build the project versions/revisions under analysis makes them inapplicable in many real-world scenarios. To reinvigorate a previously fruitful line of research that has stalled, we designed, implemented, and evaluated RMiner, a technique that overcomes the above limitations. At the heart of RMiner is an AST-based statement matching algorithm that determines refactoring candidates without requiring user-defined thresholds. To empirically evaluate RMiner, we created the most comprehensive oracle to date, using triangulation to build a dataset with considerably reduced bias that represents 3,188 refactorings from 185 open-source projects. Using this oracle, we found that RMiner has a precision of 98% and a recall of 87%, a significant improvement over the previous state of the art.
{"title":"Accurate and Efficient Refactoring Detection in Commit History","authors":"Nikolaos Tsantalis, Matin Mansouri, L. Eshkevari, D. Mazinanian, Danny Dig","doi":"10.1145/3180155.3180206","DOIUrl":"https://doi.org/10.1145/3180155.3180206","url":null,"abstract":"Refactoring detection algorithms have been crucial to a variety of applications: (i) empirical studies about the evolution of code, tests, and faults, (ii) tools for library API migration, (iii) improving the comprehension of changes and code reviews, etc. However, recent research has questioned the accuracy of the state-of-the-art refactoring detection tools, which poses threats to the reliability of their application. Moreover, previous refactoring detection tools are very sensitive to user-provided similarity thresholds, which further reduces their practical accuracy. In addition, their requirement to build the project versions/revisions under analysis makes them inapplicable in many real-world scenarios. To reinvigorate a previously fruitful line of research that has stifled, we designed, implemented, and evaluated RMiner, a technique that overcomes the above limitations. At the heart of RMiner is an AST-based statement matching algorithm that determines refactoring candidates without requiring user-defined thresholds. To empirically evaluate RMiner, we created the most comprehensive oracle to date that uses triangulation to create a dataset with considerably reduced bias, representing 3,188 refactorings from 185 open-source projects. Using this oracle, we found that RMiner has a precision of 98% and recall of 87%, which is a significant improvement over the previous state-of-the-art.","PeriodicalId":6560,"journal":{"name":"2018 IEEE/ACM 40th International Conference on Software Engineering (ICSE)","volume":"176 1","pages":"483-494"},"PeriodicalIF":0.0,"publicationDate":"2018-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73960138","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Repairing Crashes in Android Apps
Shin Hwei Tan, Zhen Dong, Xiang Gao, Abhik Roychoudhury
ICSE 2018, pp. 187–198. DOI: 10.1145/3180155.3180243

Android apps are omnipresent and frequently suffer from crashes, leading to poor user experience and economic loss. Past work has focused on automated test generation to detect crashes in Android apps; automated repair of those crashes, however, has not been studied. In this paper, we propose the first approach to automatically repairing Android apps, specifically a technique for fixing crashes. Unlike most test-based repair approaches, we do not need a test suite; instead, a single failing test is meticulously analyzed for crash locations and the reasons behind the crash. Our approach hinges on a careful empirical study that establishes common root causes of crashes in Android apps and distills their remedies into eight generic transformation operators. These operators are applied using a search-based repair framework embodied in our repair tool Droix. We also prepare a benchmark, DroixBench, capturing reproducible crashes in Android apps. Our evaluation of Droix on DroixBench reveals that the automatically produced patches are often syntactically identical to the human patch, and on some rare occasions even better than the human patch (in terms of avoiding regressions). These results confirm our intuition that our proposed transformations form a sufficient set of operators to patch crashes in Android.
{"title":"Repairing Crashes in Android Apps","authors":"Shin Hwei Tan, Zhen Dong, Xiang Gao, Abhik Roychoudhury","doi":"10.1145/3180155.3180243","DOIUrl":"https://doi.org/10.1145/3180155.3180243","url":null,"abstract":"Android apps are omnipresent, and frequently suffer from crashes — leading to poor user experience and economic loss. Past work focused on automated test generation to detect crashes in Android apps. However, automated repair of crashes has not been studied. In this paper, we propose the first approach to automatically repair Android apps, specifically we propose a technique for fixing crashes in Android apps. Unlike most test-based repair approaches, we do not need a test-suite; instead a single failing test is meticulously analyzed for crash locations and reasons behind these crashes. Our approach hinges on a careful empirical study which seeks to establish common root-causes for crashes in Android apps, and then distills the remedy of these root-causes in the form of eight generic transformation operators. These operators are applied using a search-based repair framework embodied in our repair tool Droix. We also prepare a benchmark DroixBench capturing reproducible crashes in Android apps. Our evaluation of Droix on DroixBench reveals that the automatically produced patches are often syntactically identical to the human patch, and on some rare occasion even better than the human patch (in terms of avoiding regressions). These results confirm our intuition that our proposed transformations form a sufficient set of operators to patch crashes in Android.","PeriodicalId":6560,"journal":{"name":"2018 IEEE/ACM 40th International Conference on Software Engineering (ICSE)","volume":"14 1","pages":"187-198"},"PeriodicalIF":0.0,"publicationDate":"2018-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80474148","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Program Splicing
Yanxin Lu, Swarat Chaudhuri, C. Jermaine, David Melski
ICSE 2018, pp. 338–349. DOI: 10.1145/3180155.3180190
We introduce program splicing, a programming methodology that aims to automate the workflow of copying, pasting, and modifying code available online. Here, the programmer starts by writing a "draft" that mixes unfinished code, natural language comments, and correctness requirements. A program synthesizer that interacts with a large, searchable database of program snippets is used to automatically complete the draft into a program that meets the requirements. The synthesis process happens in two stages. First, the synthesizer identifies a small number of programs in the database that are relevant to the synthesis task. Next, it uses an enumerative search to systematically fill the draft with expressions and statements from these relevant programs. The resulting program is returned to the programmer, who can modify it and possibly invoke additional rounds of synthesis. We present an implementation of program splicing, called Splicer, for the Java programming language. Splicer uses a corpus of over 3.5 million procedures from an open-source software repository. Our evaluation uses the system in a suite of everyday programming tasks, and includes a comparison with a state-of-the-art competing approach as well as a user study. The results point to the broad scope and scalability of program splicing and indicate that the approach can significantly boost programmer productivity.
{"title":"Program Splicing","authors":"Yanxin Lu, Swarat Chaudhuri, C. Jermaine, David Melski","doi":"10.1145/3180155.3180190","DOIUrl":"https://doi.org/10.1145/3180155.3180190","url":null,"abstract":"We introduce program splicing, a programming methodology that aims to automate the work ow of copying, pasting, and modifying code available online. Here, the programmer starts by writing a \"draft\" that mixes un nished code, natural language comments, and correctness requirements. A program synthesizer that interacts with a large, searchable database of program snippets is used to automatically complete the draft into a program that meets the re-quirements. The synthesis process happens in two stages. First, the synthesizer identi es a small number of programs in the database that are relevant to the synthesis task. Next it uses an enumerative search to systematically ll the draft with expressions and statements from these relevant programs. The resulting program is returned to the programmer, who can modify it and possibly invoke additional rounds of synthesis. We present an implementation of program splicing, called Splicer, for the Java programming language. Splicer uses a corpus of over 3.5 million procedures from an open-source software repository. Our evaluation uses the system in a suite of everyday programming tasks, and includes a comparison with a state-of-the-art competing approach as well as a user study. The results point to the broad scope and scalability of program splicing and indicate that the approach can signi cantly boost programmer productivity.","PeriodicalId":6560,"journal":{"name":"2018 IEEE/ACM 40th International Conference on Software Engineering (ICSE)","volume":"49 1","pages":"338-349"},"PeriodicalIF":0.0,"publicationDate":"2018-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91324104","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Identifying Design Problems in the Source Code: A Grounded Theory
L. Sousa, Anderson Oliveira, W. Oizumi, Simone Diniz Junqueira Barbosa, Alessandro F. Garcia, Jaejoon Lee, Marcos Kalinowski, R. Mello, B. Neto, R. Oliveira, C. Lucena, R. Paes
ICSE 2018, pp. 921–931. DOI: 10.1145/3180155.3180239
The prevalence of design problems may lead to the re-engineering or even the discontinuation of a software system. Due to missing, informal, or outdated design documentation, developers often have to rely on the source code to identify design problems. They must therefore analyze various symptoms that manifest across several code elements, which can quickly become a complex task. Although researchers have been investigating techniques to help developers identify design problems, there is little knowledge of how developers actually proceed when doing so. To tackle this problem, we conducted a multi-trial industrial experiment with professionals from 5 software companies to build a grounded theory. The resulting theory offers explanations of how developers identify design problems in practice. For instance, it reveals the characteristics of symptoms that developers consider helpful; moreover, developers often combine different types of symptoms to identify a single design problem. This knowledge serves as a basis for further understanding the phenomenon and advancing towards more effective identification techniques.
{"title":"Identifying Design Problems in the Source Code: A Grounded Theory","authors":"L. Sousa, Anderson Oliveira, W. Oizumi, Simone Diniz Junqueira Barbosa, Alessandro F. Garcia, Jaejoon Lee, Marcos Kalinowski, R. Mello, B. Neto, R. Oliveira, C. Lucena, R. Paes","doi":"10.1145/3180155.3180239","DOIUrl":"https://doi.org/10.1145/3180155.3180239","url":null,"abstract":"The prevalence of design problems may cause re-engineering or even discontinuation of the system. Due to missing, informal or outdated design documentation, developers often have to rely on the source code to identify design problems. Therefore, developers have to analyze different symptoms that manifest in several code elements, which may quickly turn into a complex task. Although researchers have been investigating techniques to help developers in identifying design problems, there is little knowledge on how developers actually proceed to identify design problems. In order to tackle this problem, we conducted a multi-trial industrial experiment with professionals from 5 software companies to build a grounded theory. The resulting theory offers explanations on how developers identify design problems in practice. For instance, it reveals the characteristics of symptoms that developers consider helpful. Moreover, developers often combine different types of symptoms to identify a single design problem. This knowledge serves as a basis to further understand the phenomena and advance towards more effective identification techniques.","PeriodicalId":6560,"journal":{"name":"2018 IEEE/ACM 40th International Conference on Software Engineering (ICSE)","volume":"14 2 1","pages":"921-931"},"PeriodicalIF":0.0,"publicationDate":"2018-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90266951","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Testing Vision-Based Control Systems Using Learnable Evolutionary Algorithms
Raja Ben Abdessalem, S. Nejati, L. Briand, Thomas Stifter
ICSE 2018, pp. 1016–1026. DOI: 10.1145/3180155.3180160
Vision-based control systems are key enablers of many autonomous vehicular systems, including self-driving cars. Testing such systems is complicated by their complex and multidimensional input spaces. We propose an automated testing algorithm that builds on learnable evolutionary algorithms, which rely on machine learning, or a combination of machine learning and Darwinian genetic operators, to guide the generation of new solutions (test scenarios in our context). Our approach combines multiobjective population-based search algorithms and decision-tree classification models to achieve two goals: first, the classification models guide the search-based generation of tests faster towards critical test scenarios (i.e., test scenarios leading to failures); second, the search algorithms refine the classification models so that they accurately characterize critical regions (i.e., the regions of a test input space that are likely to contain the most critical test scenarios). Our evaluation on an industrial automotive system shows that: (1) our algorithm outperforms a baseline evolutionary search algorithm, generating 78% more distinct, critical test scenarios; and (2) our algorithm accurately characterizes critical regions of the system under test, thus identifying the conditions that are likely to lead to system failures.
{"title":"Testing Vision-Based Control Systems Using Learnable Evolutionary Algorithms","authors":"Raja Ben Abdessalem, S. Nejati, L. Briand, Thomas Stifter","doi":"10.1145/3180155.3180160","DOIUrl":"https://doi.org/10.1145/3180155.3180160","url":null,"abstract":"Vision-based control systems are key enablers of many autonomous vehicular systems, including self-driving cars. Testing such systems is complicated by complex and multidimensional input spaces. We propose an automated testing algorithm that builds on learnable evolutionary algorithms. These algorithms rely on machine learning or a combination of machine learning and Darwinian genetic operators to guide the generation of new solutions (test scenarios in our context). Our approach combines multiobjective population-based search algorithms and decision tree classification models to achieve the following goals: First, classification models guide the search-based generation of tests faster towards critical test scenarios (i.e., test scenarios leading to failures). Second, search algorithms refine classification models so that the models can accurately characterize critical regions (i.e., the regions of a test input space that are likely to contain most critical test scenarios). Our evaluation performed on an industrial automotive automotive system shows that: (1) Our algorithm outperforms a baseline evolutionary search algorithm and generates 78% more distinct, critical test scenarios compared to the baseline algorithm. (2) Our algorithm accurately characterizes critical regions of the system under test, thus identifying the conditions that are likely to lead to system failures.","PeriodicalId":6560,"journal":{"name":"2018 IEEE/ACM 40th International Conference on Software Engineering (ICSE)","volume":"22 1","pages":"1016-1026"},"PeriodicalIF":0.0,"publicationDate":"2018-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89139603","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Does the Propagation of Artifact Changes Across Tasks Reflect Work Dependencies?
Christoph Mayr-Dorn, Alexander Egyed
ICSE 2018, pp. 397–407. DOI: 10.1145/3180155.3180185

Developers commonly define tasks to help coordinate software development efforts, whether feature implementation, refactoring, or bug fixes. Developers establish links between tasks to express implicit dependencies that need explicit handling: dependencies that often require the developers responsible for a given task to assess how changes in a linked task affect their own work, and vice versa (i.e., change propagation). While seemingly useful, it is unknown whether change propagation indeed coincides with task links. No study has investigated to what extent change propagation actually occurs between task pairs and whether it can serve as a metric for characterizing the underlying task dependency. In this paper, we study the temporal relationship between developers' reading and changing of source code in relation to task links. We identify seven situations that explain the varying correlation of change propagation with linked task pairs, and six motifs describing when change propagation occurs between non-linked task pairs. Our paper demonstrates that task links are indeed useful for recommending which artifacts to monitor for changes, which developers to involve in a task, or which tasks to inspect.
{"title":"Does the Propagation of Artifact Changes Across Tasks Reflect Work Dependencies?","authors":"Christoph Mayr-Dorn, Alexander Egyed","doi":"10.1145/3180155.3180185","DOIUrl":"https://doi.org/10.1145/3180155.3180185","url":null,"abstract":"Developers commonly define tasks to help coordinate software development efforts---whether they be feature implementation, refactoring, or bug fixes. Developers establish links between tasks to express implicit dependencies that needs explicit handling---dependencies that often require the developers responsible for a given task to assess how changes in a linked task affect their own work and vice versa (i.e., change propagation). While seemingly useful, it is unknown if change propagation indeed coincides with task links. No study has investigated to what extent change propagation actually occurs between task pairs and whether it is able to serve as a metric for characterizing the underlying task dependency. In this paper, we study the temporal relationship between developer reading and changing of source code in relationship to task links We identify seven situations that explain the varying correlation of change propagation with linked task pairs and find six motifs describing when change propagation occurs between non-linked task pairs. Our paper demonstrates that task links are indeed useful for recommending which artifacts to monitor for changes, which developers to involve in a task, or which tasks to inspect.","PeriodicalId":6560,"journal":{"name":"2018 IEEE/ACM 40th International Conference on Software Engineering (ICSE)","volume":"116 1","pages":"397-407"},"PeriodicalIF":0.0,"publicationDate":"2018-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78230861","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Self-Hiding Behavior in Android Apps: Detection and Characterization
Zhiyong Shan, Iulian Neamtiu, Raina Samuel
ICSE 2018, pp. 728–739. DOI: 10.1145/3180155.3180214

Applications (apps) that conceal their activities are fundamentally deceptive; app marketplaces and end users should treat such apps as suspicious. However, by its nature and intent, activity concealing is not disclosed up front, which puts users at risk. In this paper, we focus on the characterization and detection of such techniques, e.g., hiding the app or removing traces, which we call "self-hiding behavior" (SHB). SHB has not been studied per se; rather, it has been reported only as a byproduct of malware investigations. We address this gap via a study and a suite of static analyses targeted at SHB in Android apps. Specifically, we present (1) a detailed characterization of SHB, (2) a suite of static analyses to detect such behavior, and (3) a set of detectors that employ SHB to distinguish between benign and malicious apps. We show that SHB ranges from hiding the app's presence or activity to covering an app's traces, e.g., by blocking phone calls/text messages or removing calls and messages from logs. Using our static analysis tools on a large dataset of 9,452 Android apps (benign as well as malicious), we expose the frequency of 12 such SH behaviors. Our approach is effective: it has revealed that malicious apps employ 1.5 SHBs per app on average. Surprisingly, SH behavior is also employed by legitimate ("benign") apps, which can affect users negatively in multiple ways. When separating malicious from benign apps, our approach has high precision and recall (combined F-measure = 87.19%). It is also efficient, with analysis typically taking just 37 seconds per app. We believe that our findings and analysis tool are beneficial to both app marketplaces and end users.
{"title":"Self-Hiding Behavior in Android Apps: Detection and Characterization","authors":"Zhiyong Shan, Iulian Neamtiu, Raina Samuel","doi":"10.1145/3180155.3180214","DOIUrl":"https://doi.org/10.1145/3180155.3180214","url":null,"abstract":"Applications (apps) that conceal their activities are fundamentally deceptive; app marketplaces and end-users should treat such apps as suspicious. However, due to its nature and intent, activity concealing is not disclosed up-front, which puts users at risk. In this paper, we focus on characterization and detection of such techniques, e.g., hiding the app or removing traces, which we call \"self hiding behavior\" (SHB). SHB has not been studied per se – rather it has been reported on only as a byproduct of malware investigations. We address this gap via a study and suite of static analyses targeted at SH in Android apps. Specifically, we present (1) a detailed characterization of SHB, (2) a suite of static analyses to detect such behavior, and (3) a set of detectors that employ SHB to distinguish between benign and malicious apps. We show that SHB ranges from hiding the app's presence or activity to covering an app's traces, e.g., by blocking phone calls/text messages or removing calls and messages from logs. Using our static analysis tools on a large dataset of 9,452 Android apps (benign as well as malicious) we expose the frequency of 12 such SH behaviors. Our approach is effective: it has revealed that malicious apps employ 1.5 SHBs per app on average. Surprisingly, SH behavior is also employed by legitimate (\"benign\") apps, which can affect users negatively in multiple ways. When using our approach for separating malicious from benign apps, our approach has high precision and recall (combined F-measure = 87.19%). Our approach is also efficient, with analysis typically taking just 37 seconds per app. We believe that our findings and analysis tool are beneficial to both app marketplaces and end-users.","PeriodicalId":6560,"journal":{"name":"2018 IEEE/ACM 40th International Conference on Software Engineering (ICSE)","volume":"43 1","pages":"728-739"},"PeriodicalIF":0.0,"publicationDate":"2018-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77118010","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Deep Code Search
Xiaodong Gu, Hongyu Zhang, Sunghun Kim
ICSE 2018, pp. 933–944. DOI: 10.1145/3180155.3180167

To implement a desired functionality, developers can reuse previously written code snippets by searching through a large-scale codebase. Over the years, many code search tools have been proposed to help developers. The existing approaches often treat source code as textual documents and utilize information retrieval models to retrieve relevant code snippets that match a given query. These approaches mainly rely on the textual similarity between source code and natural language queries; they lack a deep understanding of the semantics of queries and source code. In this paper, we propose a novel deep neural network named CODEnn (Code-Description Embedding Neural Network). Instead of matching textual similarity, CODEnn jointly embeds code snippets and natural language descriptions into a high-dimensional vector space, in such a way that a code snippet and its corresponding description have similar vectors. Using this unified vector representation, code snippets related to a natural language query can be retrieved according to their vectors, semantically related words can be recognized, and irrelevant or noisy keywords in queries can be handled. As a proof-of-concept application, we implement a code search tool named DeepCS using the proposed CODEnn model. We empirically evaluate DeepCS on a large-scale codebase collected from GitHub. The experimental results show that our approach effectively retrieves relevant code snippets and outperforms previous techniques.
{"title":"Deep Code Search","authors":"Xiaodong Gu, Hongyu Zhang, Sunghun Kim","doi":"10.1145/3180155.3180167","DOIUrl":"https://doi.org/10.1145/3180155.3180167","url":null,"abstract":"To implement a program functionality, developers can reuse previously written code snippets by searching through a large-scale codebase. Over the years, many code search tools have been proposed to help developers. The existing approaches often treat source code as textual documents and utilize information retrieval models to retrieve relevant code snippets that match a given query. These approaches mainly rely on the textual similarity between source code and natural language query. They lack a deep understanding of the semantics of queries and source code. In this paper, we propose a novel deep neural network named CODEnn (Code-Description Embedding Neural Network). Instead of matching text similarity, CODEnn jointly embeds code snippets and natural language descriptions into a high-dimensional vector space, in such a way that code snippet and its corresponding description have similar vectors. Using the unified vector representation, code snippets related to a natural language query can be retrieved according to their vectors. Semantically related words can also be recognized and irrelevant/noisy keywords in queries can be handled. As a proof-of-concept application, we implement a code search tool named DeepCS using the proposed CODEnn model. We empirically evaluate DeepCS on a large scale codebase collected from GitHub. The experimental results show that our approach can effectively retrieve relevant code snippets and outperforms previous techniques.","PeriodicalId":6560,"journal":{"name":"2018 IEEE/ACM 40th International Conference on Software Engineering (ICSE)","volume":"71 1","pages":"933-944"},"PeriodicalIF":0.0,"publicationDate":"2018-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81848528","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}