2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE): Latest Papers, Page 6

Mining Software Defects: Should We Consider Affected Releases?

2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE)

Pub Date : 2019-05-01 DOI: 10.1109/ICSE.2019.00075

S. Yatish, Jirayus Jiarpakdee, Patanamon Thongtanunam, C. Tantithamthavorn

With the rise of the Mining Software Repositories (MSR) field, defect datasets extracted from software repositories play a foundational role in many empirical studies related to software quality. At the core of defect data preparation is the identification of post-release defects. Prior studies leverage many heuristics (e.g., keywords and issue IDs) to identify post-release defects. However, such a heuristic approach is based on several assumptions, which pose common threats to the validity of many studies. In this paper, we set out to investigate the nature of the differences between defect datasets generated by the heuristic approach and those generated by the realistic approach, which leverages the earliest affected release that is realistically estimated by a software development team for a given defect. In addition, we investigate the impact of defect identification approaches on the predictive accuracy and the ranking of defective modules that are produced by defect models. Through a case study of defect datasets of 32 releases, we find that the heuristic approach has a large impact on both defect count datasets and binary defect datasets. Surprisingly, we find that the heuristic approach has a minimal impact on defect count models, suggesting that future work should not be too concerned about defect count models that are constructed using heuristic defect datasets. On the other hand, using defect datasets generated by the realistic approach leads to an improvement in the predictive accuracy of defect classification models.
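
The two identification approaches named in the abstract can be contrasted with a minimal sketch. Everything here is an assumption made for illustration: the `Commit` and `Issue` records, the keyword and issue-ID patterns, and the rule that a defect counts against its earliest affected release only. It is not the authors' implementation.

```python
import re
from dataclasses import dataclass

# Hypothetical records; the field names are illustrative, not from the paper.
@dataclass
class Commit:
    message: str
    files: tuple

@dataclass
class Issue:
    issue_id: str
    affected_releases: tuple  # releases recorded by the development team
    fixed_files: tuple

# Assumed patterns for the "keywords and issue IDs" heuristics.
BUG_KEYWORDS = re.compile(r"\b(fix(es|ed)?|bug|defect)\b", re.IGNORECASE)
ISSUE_ID = re.compile(r"\b[A-Z]+-\d+\b")

def heuristic_defective_files(commits):
    """Files touched by commits whose message has a bug keyword or an issue ID."""
    defective = set()
    for commit in commits:
        if BUG_KEYWORDS.search(commit.message) or ISSUE_ID.search(commit.message):
            defective.update(commit.files)
    return defective

def release_key(release):
    """Order dotted release names numerically, so that 1.2 sorts before 1.10."""
    return tuple(int(part) for part in release.split("."))

def realistic_defective_files(issues, release):
    """Files fixed for issues whose earliest affected release is `release`."""
    defective = set()
    for issue in issues:
        if not issue.affected_releases:
            continue
        earliest = min(issue.affected_releases, key=release_key)
        if earliest == release:
            defective.update(issue.fixed_files)
    return defective
```

The heuristic labels a file from commit text alone, while the realistic variant needs the affected-release field that the team filled in on the issue report, which is why the two can produce different datasets for the same release.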

Citations: 70

When Code Completion Fails: A Case Study on Real-World Completions

2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE)

Pub Date : 2019-05-01 DOI: 10.1109/ICSE.2019.00101

V. Hellendoorn, Sebastian Proksch, H. Gall, Alberto Bacchelli

Code completion is commonly used by software developers and is integrated into all major IDEs. Good completion tools can not only save time and effort but may also help avoid incorrect API usage. Many proposed completion tools have shown promising results on synthetic benchmarks, but these benchmarks make no claims about the realism of the completions they test. This lack of grounding in real-world data could hinder our scientific understanding of developer needs and of the efficacy of completion models. This paper presents a case study on 15,000 code completions that were applied by 66 real developers, which we study and contrast with artificial completions to inform future research and tools in this area. We find that synthetic benchmarks misrepresent many aspects of real-world completions; tested completion tools were far less accurate on real-world data. Worse, on the few completions that consumed most of the developers' time, prediction accuracy was less than 20% -- an effect that is invisible in synthetic benchmarks. Our findings have ramifications for future benchmarks, tool design and real-world efficacy: benchmarks must account for completions that developers use most, such as intra-project APIs; models should be designed to be amenable to intra-project data; and real-world developer trials are essential to quantifying performance on the least predictable completions, which are both most time-consuming and far more typical than artificial data suggests. We publicly release our preprint [https://doi.org/10.5281/zenodo.2565673] and replication data and materials [https://doi.org/10.5281/zenodo.2562249].

Citations: 55

Hunting for Bugs in Code Coverage Tools via Randomized Differential Testing

2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE)

Pub Date : 2019-05-01 DOI: 10.1109/ICSE.2019.00061

Yibiao Yang, Yuming Zhou, Hao Sun, Z. Su, Zhiqiang Zuo, Lei Xu, Baowen Xu

Reliable code coverage tools are critically important as they are heavily used to facilitate many quality assurance activities, such as software testing, fuzzing, and debugging. However, little attention has been devoted to assessing the reliability of code coverage tools. In this study, we propose a randomized differential testing approach to hunting for bugs in the most widely used C code coverage tools. Specifically, by generating random input programs, our approach looks for inconsistencies in code coverage reports produced by different code coverage tools, and then identifies inconsistencies as potential code coverage bugs. To effectively report code coverage bugs, we addressed three specific challenges: (1) how to filter out duplicate test programs, since many of them trigger the same bugs in code coverage tools; (2) how to automatically reduce large test programs to much smaller ones that have the same properties; and (3) how to determine which code coverage tools have bugs. The extensive evaluations validate the effectiveness of our approach, resulting in 42 and 28 confirmed/fixed bugs for gcov and llvm-cov, respectively. This case study indicates that code coverage tools are not as reliable as might have been envisaged. It not only demonstrates the effectiveness of our approach, but also highlights the need to continue improving the reliability of code coverage tools. This work opens up a new direction in code coverage validation which calls for more attention in this area.
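
The core comparison step of differential testing can be sketched as follows. This is a simplified illustration, not the authors' tool: a coverage report is assumed to be a mapping from line number to execution count, and `run_tool_a` / `run_tool_b` are hypothetical callables standing in for running two coverage tools on the same program. Program generation, deduplication, and test-case reduction are not shown.

```python
def coverage_inconsistencies(report_a, report_b):
    """Return {line: (count_a, count_b)} for lines both tools report but disagree on.

    Lines that only one tool reports are skipped, on the assumption that the
    tools may legitimately differ in which lines they treat as executable.
    """
    diffs = {}
    for line in sorted(report_a.keys() & report_b.keys()):
        count_a, count_b = report_a[line], report_b[line]
        if count_a != count_b:
            diffs[line] = (count_a, count_b)
    return diffs

def differential_test(programs, run_tool_a, run_tool_b):
    """Collect the programs on which the two tools produce conflicting counts."""
    suspicious = []
    for program in programs:
        diffs = coverage_inconsistencies(run_tool_a(program), run_tool_b(program))
        if diffs:
            suspicious.append((program, diffs))
    return suspicious
```

A disagreement only says that at least one tool is wrong for that program; deciding which one is challenge (3) in the abstract and needs further evidence.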

Citations: 18

Safe Automated Refactoring for Intelligent Parallelization of Java 8 Streams

2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE)

Pub Date : 2019-05-01 DOI: 10.1109/ICSE.2019.00072

Raffi Khatchadourian, Yiming Tang, M. Bagherzadeh, Syed Ahmed

Streaming APIs are becoming more pervasive in mainstream Object-Oriented programming languages. For example, the Stream API introduced in Java 8 allows for functional-like, MapReduce-style operations in processing both finite and infinite data structures. However, using this API efficiently involves subtle considerations like determining when it is best for stream operations to run in parallel, when running operations in parallel can be less efficient, and when it is safe to run in parallel given possible lambda expression side-effects. In this paper, we present an automated refactoring approach that assists developers in writing efficient stream code in a semantics-preserving fashion. The approach, based on a novel data ordering and typestate analysis, consists of preconditions for automatically determining when it is safe and possibly advantageous to convert sequential streams to parallel and unorder or de-parallelize already parallel streams. The approach was implemented as a plug-in to the Eclipse IDE, uses the WALA and SAFE analysis frameworks, and was evaluated on 11 Java projects consisting of ~642K lines of code. We found that 57 of 157 candidate streams (36.31%) were refactorable, and an average speedup of 3.49 on performance tests was observed. The results indicate that the approach is useful in optimizing stream code to its full potential.

Citations: 35

Practical GUI Testing of Android Applications Via Model Abstraction and Refinement

2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE)

Pub Date : 2019-05-01 DOI: 10.1109/ICSE.2019.00042

Tianxiao Gu, Chengnian Sun, Xiaoxing Ma, Chun Cao, Chang Xu, Yuan Yao, Qirun Zhang, Jian Lu, Z. Su

This paper introduces a new, fully automated model-based approach for effective testing of Android apps. Different from existing model-based approaches that guide testing with a static GUI model (i.e., the model does not evolve its abstraction during testing, and is thus often imprecise), our approach dynamically optimizes the model by leveraging the runtime information during testing. This capability of model evolution significantly improves model precision, and thus dramatically enhances the testing effectiveness compared to existing approaches, which our evaluation confirms. We have realized our technique in a practical tool, APE. On 15 large, widely-used apps from the Google Play Store, APE outperforms the state-of-the-art Android GUI testing tools in terms of both testing coverage and the number of detected unique crashes. To further demonstrate APE’s effectiveness and usability, we conduct another evaluation of APE on 1,316 popular apps, where it found 537 unique crashes. Out of the 38 reported crashes, 13 have been fixed and 5 have been confirmed.

Citations: 93

AutoTap: Synthesizing and Repairing Trigger-Action Programs Using LTL Properties

2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE)

Pub Date : 2019-05-01 DOI: 10.1109/ICSE.2019.00043

Lefan Zhang, Weijia He, Jesse Martinez, Noah Brackenbury, Shan Lu, Blase Ur

End-user programming, particularly trigger-action programming (TAP), is a popular method of letting users express their intent for how smart devices and cloud services interact. Unfortunately, sometimes it can be challenging for users to correctly express their desires through TAP. This paper presents AutoTap, a system that lets novice users easily specify desired properties for devices and services. AutoTap translates these properties to linear temporal logic (LTL) and both automatically synthesizes property-satisfying TAP rules from scratch and repairs existing TAP rules. We designed AutoTap based on a user study about properties users wish to express. Through a second user study, we show that novice users made significantly fewer mistakes when expressing desired behaviors using AutoTap than using TAP rules. Our experiments show that AutoTap is a simple and effective option for expressive end-user programming.

Citations: 48

How Practitioners Perceive Coding Proficiency

2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE)

Pub Date : 2019-05-01 DOI: 10.1109/ICSE.2019.00098

Xin Xia, Zhiyuan Wan, Pavneet Singh Kochhar, D. Lo

Coding proficiency is essential to software practitioners. Unfortunately, our understanding of coding proficiency often translates to vague stereotypes, e.g., "able to write good code". The lack of specificity hinders employers from measuring a software engineer's coding proficiency, and software engineers from improving their coding proficiency skills. This raises an important question: what skills matter to improve one's coding proficiency? To answer this question, we perform an empirical study by surveying 340 software practitioners from 33 countries across 5 continents. We first identify 38 coding proficiency skills grouped into nine categories by interviewing 15 developers from three companies. We then ask our survey respondents to rate the level of importance of these skills, and to provide rationales for their ratings. Our study highlights a total of 21 important skills that receive an average rating of 4.0 and above (important and very important), along with rationales given by proponents and dissenters. We discuss implications of our findings for researchers, educators, and practitioners.

Citations: 18

ActionNet: Vision-Based Workflow Action Recognition From Programming Screencasts

2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE)

Pub Date : 2019-05-01 DOI: 10.1109/ICSE.2019.00049

Dehai Zhao, Zhenchang Xing, Chunyang Chen, Xin Xia, Guoqiang Li

Programming screencasts have two important applications in the software engineering context: studying developer behaviors and information needs, and disseminating software engineering knowledge. Although programming screencasts are easy to produce, they are not easy to analyze or index due to the image nature of the data. Existing techniques extract only content from screencasts, but ignore the workflow actions by which developers accomplish programming tasks. This significantly limits the effective use of programming screencasts in downstream applications. In this paper, we are the first to present a novel technique for recognizing workflow actions in programming screencasts. Our technique exploits image differencing and a Convolutional Neural Network (CNN) to analyze the correspondence and change of consecutive frames, based on which nine classes of frequent developer actions can be recognized from programming screencasts. Using programming screencasts from YouTube, we evaluate different configurations of our CNN model and the performance of our technique for developer action recognition across developers, working environments and programming languages. Using screencasts of developers’ real work, we demonstrate the usefulness of our technique in a practical application for action-aware extraction of key-code frames in developers’ work.
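
The image-differencing step mentioned in the abstract can be illustrated with a minimal sketch. It assumes frames are grayscale images stored as equally sized lists of rows of integers, and it returns the bounding box of the pixels that changed between two consecutive frames. The function name and the `threshold` parameter are hypothetical, and the CNN that classifies the changed region is not shown.

```python
def changed_region(prev_frame, curr_frame, threshold=0):
    """Bounding box (top, left, bottom, right) of changed pixels, or None.

    A pixel counts as changed when its absolute intensity difference
    between the two frames exceeds `threshold`.
    """
    top = left = bottom = right = None
    for r, (prev_row, curr_row) in enumerate(zip(prev_frame, curr_frame)):
        for c, (p, q) in enumerate(zip(prev_row, curr_row)):
            if abs(p - q) > threshold:
                top = r if top is None else min(top, r)
                bottom = r if bottom is None else max(bottom, r)
                left = c if left is None else min(left, c)
                right = c if right is None else max(right, c)
    if top is None:
        return None  # the two frames are identical up to the threshold
    return (top, left, bottom, right)
```

Cropping both frames to this box gives a compact before/after pair, which is one plausible input for a classifier that has to tell actions such as typing or scrolling apart.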

Citations: 39

Detection and Repair of Architectural Inconsistencies in Java

2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE)

Pub Date : 2019-05-01 DOI: 10.1109/ICSE.2019.00067

Negar Ghorbani, Joshua Garcia, S. Malek

Java is one of the most widely used programming languages. However, the absence of explicit support for architectural constructs, such as software components, in the programming language itself has prevented software developers from achieving the many benefits that come with architecture-based development. To address this issue, Java 9 has introduced the Java Platform Module System (JPMS), resulting in the first instance of encapsulation of modules with rich software architectural interfaces added to a mainstream programming language. The primary goal of JPMS is to construct and maintain large applications efficiently, as well as to improve the encapsulation, security, and maintainability of Java applications in general and the JDK itself. A challenge, however, is that module declarations do not necessarily reflect actual usage of modules in an application, allowing developers to mistakenly specify inconsistent dependencies among the modules. In this paper, we formally define 8 inconsistent modular dependencies that may arise in Java-9 applications. We also present DARCY, an approach that leverages these definitions and static program analyses to automatically (1) detect the specified inconsistent dependencies within Java applications and (2) repair those identified inconsistencies. The results of our experiments, conducted over 38 open-source Java-9 applications, indicate that architectural inconsistencies are widespread and demonstrate the benefits of DARCY in automated detection and repair of these inconsistencies.
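
The gap between declared and actual module usage can be illustrated with one simple case: a `requires` directive naming a module that the code never uses. The sketch below is not DARCY and does not claim to reproduce any of the paper's eight definitions. It assumes the set of modules actually used has already been obtained by some static analysis, and the function names are hypothetical.

```python
import re

# Matches `requires [transitive] [static] <module>;` directives in module-info.java.
REQUIRES = re.compile(
    r"^\s*requires\s+(?:(?:transitive|static)\s+)*([\w.]+)\s*;",
    re.MULTILINE,
)

def declared_requires(module_info_source):
    """Module names listed in `requires` directives of a module declaration."""
    return set(REQUIRES.findall(module_info_source))

def unused_requires(module_info_source, used_modules):
    """Declared dependencies that the (assumed) usage analysis never observed."""
    return sorted(declared_requires(module_info_source) - set(used_modules))
```

A real check would have to be more careful than this: for example, a `requires transitive` directive may exist for the benefit of downstream modules even when the declaring module itself does not use the dependency.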

Pages: 560-571

Citations: 11

Distance-Based Sampling of Software Configuration Spaces

2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE)

Pub Date : 2019-05-01 DOI: 10.1109/ICSE.2019.00112

Christian Kaltenecker, A. Grebhahn, Norbert Siegmund, Jianmei Guo, S. Apel

Configurable software systems provide a multitude of configuration options to adjust and optimize their functional and non-functional properties. For instance, to find the fastest configuration for a given setting, a brute-force strategy measures the performance of all configurations, which is typically intractable. Addressing this challenge, state-of-the-art strategies rely on machine learning, analyzing only a few configurations (i.e., a sample set) to predict the performance of other configurations. However, to obtain accurate performance predictions, a representative sample set of configurations is required. Addressing this task, different sampling strategies have been proposed, which come with different advantages (e.g., covering the configuration space systematically) and disadvantages (e.g., the need to enumerate all configurations). In our experiments, we found that most sampling strategies do not achieve good coverage of the configuration space with respect to relevant performance values. That is, they miss important configurations with distinct performance behavior. Based on this observation, we devise a new sampling strategy, called distance-based sampling, which uses a distance metric and a given probability distribution to spread the configurations of the sample set across the configuration space. This way, we cover different kinds of interactions among configuration options in the sample set. To demonstrate the merits of distance-based sampling, we compare it to state-of-the-art sampling strategies, such as t-wise sampling, on 10 real-world configurable software systems. Our results show that distance-based sampling leads to more accurate performance models for medium to large sample sets.
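
The idea of combining a distance metric with a probability distribution can be sketched as follows. This is a minimal illustration under assumptions the abstract does not state: options are binary and unconstrained, the distance of a configuration is the number of enabled options, and distances are drawn uniformly. The class `DistanceSampling` and its methods are hypothetical names, and the sketch does not filter duplicate configurations.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical sketch; the distance metric and the uniform distribution are assumptions.
class DistanceSampling {

    // Distance from the all-disabled configuration: the number of enabled options.
    static int distanceFromOrigin(boolean[] config) {
        int count = 0;
        for (boolean enabled : config) {
            if (enabled) count++;
        }
        return count;
    }

    // Draws one configuration: first pick a distance, then enable that many random options.
    static boolean[] sampleOne(int numOptions, Random rng) {
        int distance = rng.nextInt(numOptions + 1); // uniform over 0..numOptions
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < numOptions; i++) indices.add(i);
        Collections.shuffle(indices, rng);
        boolean[] config = new boolean[numOptions];
        for (int i = 0; i < distance; i++) config[indices.get(i)] = true;
        return config;
    }

    // Builds a sample set of the requested size; duplicates are not removed in this sketch.
    static List<boolean[]> sample(int numOptions, int size, long seed) {
        Random rng = new Random(seed);
        List<boolean[]> result = new ArrayList<>();
        for (int i = 0; i < size; i++) result.add(sampleOne(numOptions, rng));
        return result;
    }

    public static void main(String[] args) {
        for (boolean[] config : sample(6, 5, 42L)) {
            System.out.println("distance = " + distanceFromOrigin(config));
        }
    }
}
```

Picking the distance first matters because drawing each binary option independently and uniformly concentrates the sample around configurations with about half of the options enabled, whereas drawing the distance uniformly gives every distance the same chance of appearing.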

Pages: 1084-1094

Citations: 81