2018 33rd IEEE/ACM International Conference on Automated Software Engineering (ASE): Latest Articles

Effectiveness and Challenges in Generating Concurrent Tests for Thread-Safe Classes

2018 33rd IEEE/ACM International Conference on Automated Software Engineering (ASE)

Pub Date : 2018-09-01 DOI: 10.1145/3238147.3238224

Valerio Terragni, M. Pezzè

Developing correct and efficient concurrent programs is difficult and error-prone because of the complexity of thread synchronization. Developers often alleviate this problem by relying on thread-safe classes, which encapsulate most synchronization-related challenges. Thus, testing such classes is crucial to ensure the reliability of the concurrency aspects of programs. Some recent techniques and corresponding tools tackle the problem of testing thread-safe classes by automatically generating concurrent tests. In this paper, we present a comprehensive study of the state-of-the-art techniques and an independent empirical evaluation of the publicly available tools. We conducted the study by executing all tools on the JaConTeBe benchmark, which contains 47 well-documented concurrency faults. Our results show that 8 out of 47 faults (17%) were detected by at least one tool. By studying the issues of the tools and the generated tests, we derive insights to guide future research on improving the effectiveness of automated concurrent test generation.

Citations: 15

An Automated Approach to Estimating Code Coverage Measures via Execution Logs

2018 33rd IEEE/ACM International Conference on Automated Software Engineering (ASE)

Pub Date : 2018-09-01 DOI: 10.1145/3238147.3238214

Boyuan Chen, Jian Song, Peng Xu, Xing Hu, Z. Jiang

Software testing is a widely used technique to ensure the quality of software systems. Code coverage measures are commonly used to evaluate and improve the existing test suites. Based on our industrial and open source studies, existing state-of-the-art code coverage tools are only used during unit and integration testing due to issues like engineering challenges, performance overhead, and incomplete results. To resolve these issues, in this paper we propose an automated approach, called LogCoCo, for estimating code coverage measures using the readily available execution logs. Using program analysis techniques, LogCoCo matches the execution logs with their corresponding code paths and estimates three different code coverage criteria: method coverage, statement coverage, and branch coverage. Case studies on one open source system (HBase) and five commercial systems from Baidu show that: (1) the results of LogCoCo are highly accurate (> 96% in seven out of nine experiments) under a variety of testing activities (unit testing, integration testing, and benchmarking); and (2) the results of LogCoCo can be used to evaluate and improve the existing test suites. Our collaborators at Baidu are currently considering adopting LogCoCo and using it on a daily basis.
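
The idea of matching execution logs to code in order to estimate method coverage can be illustrated with a minimal sketch. This is not the LogCoCo implementation: the pattern-to-method table `LOG_TO_METHOD`, the method names, and the log lines are all hypothetical, and a real tool would derive the mapping through program analysis rather than a hand-written table.

```python
import re

# Hypothetical mapping from log-statement patterns to the methods that
# contain them; a real tool would obtain this from program analysis.
LOG_TO_METHOD = {
    r"Opening region \w+": "Region.open",
    r"Flushing memstore": "Region.flush",
    r"Closing region \w+": "Region.close",
}

# Hypothetical universe of methods in the system under test.
ALL_METHODS = {"Region.open", "Region.flush", "Region.close", "Region.compact"}


def estimate_method_coverage(log_lines):
    """Return (covered_methods, coverage_ratio) inferred from log lines."""
    covered = set()
    for line in log_lines:
        for pattern, method in LOG_TO_METHOD.items():
            if re.search(pattern, line):
                covered.add(method)
    return covered, len(covered) / len(ALL_METHODS)
```

Methods that emit no log statement (here `Region.compact`) can never be marked as covered by this simplified matching, which is one reason a log-based figure is an estimate rather than an exact measurement.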

Citations: 45

An Empirical Study of Android Test Generation Tools in Industrial Cases

2018 33rd IEEE/ACM International Conference on Automated Software Engineering (ASE)

Pub Date : 2018-09-01 DOI: 10.1145/3238147.3240465

Wenyu Wang, Dengfeng Li, Wei Yang, Yurui Cao, Zhenwen Zhang, Yuetang Deng, Tao Xie

User Interface (UI) testing is a popular approach to ensure the quality of mobile apps. Numerous test generation tools have been developed to support UI testing on mobile apps, especially for Android apps. Previous work evaluates and compares different test generation tools using only relatively simple open-source apps, while real-world industrial apps tend to have more complex functionalities and implementations. There is no direct comparison among test generation tools with regard to effectiveness and ease of use on these industrial apps. To address this limitation, we study existing state-of-the-art or state-of-the-practice test generation tools on 68 widely used industrial apps. We directly compare the tools with regard to code coverage and fault-detection ability. According to our results, Monkey, a state-of-the-practice tool from Google, achieves the highest method coverage on 22 of the 41 apps whose method coverage data can be obtained. Of all 68 apps under study, Monkey also achieves the highest activity coverage on 35 apps, while Stoat, a state-of-the-art tool, is able to trigger the highest number of unique crashes on 23 apps. By analyzing the experimental results, we provide suggestions for combining different test generation tools to achieve better performance. We also report our experience in applying these tools to the industrial apps under study. Our study results give insights on how Android UI test generation tools could be improved to better handle complex industrial apps.

Citations: 78

Automatic Mining of Constraints for Monitoring Systems of Systems

2018 33rd IEEE/ACM International Conference on Automated Software Engineering (ASE)

Pub Date : 2018-09-01 DOI: 10.1145/3238147.3241532

Thomas Krismayer

The behavior of complex software-intensive systems of systems often only fully emerges during operation, when all systems interact with each other and with their environment. Runtime monitoring approaches are thus used to detect deviations from the expected behavior, which is commonly defined by engineers, e.g., using temporal logic or domain-specific languages. However, the deep domain knowledge required to specify constraints is often not available during the development of systems of systems with multiple teams independently working on heterogeneous components. In this paper, we thus describe our ongoing PhD research to automatically mine constraints for runtime monitoring from recorded events. Our approach mines constraints on event occurrence, timing, data, and combinations of these properties. The approach further presents the mined constraints to users offering multiple ranking strategies and can also be used to support users in system evolution scenarios.
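
As a rough illustration of mining occurrence and timing constraints from recorded events, the sketch below looks for pairs of event types where one is always followed by the other, and records the largest delay observed. The trace format, the function name `mine_follow_constraints`, and the constraint shape are assumptions made for this example; the abstract does not describe the actual mining algorithm, and data constraints and ranking are not covered here.

```python
def mine_follow_constraints(traces):
    """Mine constraints of the form 'every a is later followed by b'.

    traces: list of traces, each a list of (timestamp, event_type) tuples
    sorted by timestamp. Returns {(a, b): max_delay} for every pair that
    holds for all occurrences of a in all traces; max_delay is the largest
    observed gap between a and the first subsequent b.
    """
    all_types = {event for trace in traces for _, event in trace}
    max_delay = {}
    violated = set()
    for trace in traces:
        for i, (t_a, a) in enumerate(trace):
            # Delay to the first later occurrence of each other event type.
            first_after = {}
            for t_b, b in trace[i + 1:]:
                if b != a and b not in first_after:
                    first_after[b] = t_b - t_a
            for b in all_types - {a}:
                if b in first_after:
                    previous = max_delay.get((a, b), 0)
                    max_delay[(a, b)] = max(previous, first_after[b])
                else:
                    violated.add((a, b))
    return {pair: d for pair, d in max_delay.items() if pair not in violated}
```

A runtime monitor could then flag any new trace in which `a` occurs but `b` does not follow within the mined delay; whether such a bound is meaningful is for a domain expert to judge, which matches the abstract's point that mined constraints are presented to users.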

Citations: 1

Neural-Machine-Translation-Based Commit Message Generation: How Far Are We?

2018 33rd IEEE/ACM International Conference on Automated Software Engineering (ASE)

Pub Date : 2018-09-01 DOI: 10.1145/3238147.3238190

Zhongxin Liu, Xin Xia, A. Hassan, D. Lo, Zhenchang Xing, Xinyu Wang

Commit messages can be regarded as the documentation of software changes. These messages describe the content and purposes of changes, and hence are useful for program comprehension and software maintenance. However, due to the lack of time and direct motivation, commit messages are sometimes neglected by developers. To address this problem, Jiang et al. proposed an approach (we refer to it as NMT), which leverages a neural machine translation algorithm to automatically generate short commit messages from code. The reported performance of their approach is promising; however, they did not explore why their approach performs well. Thus, in this paper, we first perform an in-depth analysis of their experimental results. We find that (1) most of the test diffs from which NMT can generate high-quality messages are similar to one or more training diffs at the token level; (2) about 16% of the commit messages in Jiang et al.'s dataset are noisy because they are automatically generated or describe repetitive trivial changes; and (3) the performance of NMT declines by a large amount after removing such noisy commit messages. In addition, NMT is complicated and time-consuming. Inspired by our first finding, we propose a simpler and faster approach, named NNGen (Nearest Neighbor Generator), to generate concise commit messages using the nearest neighbor algorithm. Our experimental results show that NNGen is over 2,600 times faster than NMT, and outperforms NMT in terms of BLEU (an accuracy measure that is widely used to evaluate machine translation systems) by 21%. Finally, we also discuss some observations for the road ahead for automated commit message generation to inspire other researchers.

Citations: 168

DeepRoad: GAN-Based Metamorphic Testing and Input Validation Framework for Autonomous Driving Systems

2018 33rd IEEE/ACM International Conference on Automated Software Engineering (ASE)

Pub Date : 2018-09-01 DOI: 10.1145/3238147.3238187

Mengshi Zhang, Yuqun Zhang, Lingming Zhang, Cong Liu, S. Khurshid

While Deep Neural Networks (DNNs) have established the fundamentals of image-based autonomous driving systems, they may exhibit erroneous behaviors and cause fatal accidents. To address the safety issues in autonomous driving systems, a recent set of testing techniques has been designed to automatically generate artificial driving scenes to enrich test suites, e.g., generating new input images transformed from the original ones. However, these techniques are insufficient due to two limitations. First, many such synthetic images lack diversity of driving scenes, and hence compromise the resulting efficacy and reliability. Second, for machine-learning-based systems, a mismatch between training and application domain can dramatically degrade system accuracy, such that it is necessary to validate inputs for improving system robustness. In this paper, we propose DeepRoad, an unsupervised DNN-based framework for automatically testing the consistency of DNN-based autonomous driving systems and online validation. First, DeepRoad automatically synthesizes large amounts of diverse driving scenes without using image transformation rules (e.g., scale, shear, and rotation). In particular, DeepRoad is able to produce driving scenes with various weather conditions (including those with rather extreme conditions) by applying Generative Adversarial Networks (GANs) along with the corresponding real-world weather scenes. Second, DeepRoad utilizes metamorphic testing techniques to check the consistency of such systems using synthetic images. Third, DeepRoad validates input images for DNN-based systems by measuring the distance between the input and training images using their VGGNet features. We implement DeepRoad to test three well-recognized DNN-based autonomous driving systems in the Udacity self-driving car challenge. The experimental results demonstrate that DeepRoad can detect thousands of inconsistent behaviors for these systems, and effectively validate input images to potentially enhance the system robustness as well.
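
The metamorphic consistency check and the distance-based input validation described above can be sketched with toy stand-ins. This is a minimal sketch, not the DeepRoad implementation: `model`, `transform`, `epsilon`, and the feature vectors are hypothetical placeholders for a steering-angle DNN, a GAN-based scene synthesizer, a tolerance threshold, and VGGNet features, none of which are reproduced here.

```python
import math


def count_inconsistencies(model, originals, transform, epsilon):
    """Count inputs whose prediction shifts by more than epsilon.

    The metamorphic relation assumed here: a scene and its synthesized
    variant (e.g., the same road in different weather) should yield
    nearly the same steering prediction.
    """
    inconsistent = 0
    for image in originals:
        if abs(model(image) - model(transform(image))) > epsilon:
            inconsistent += 1
    return inconsistent


def min_distance(features, training_features):
    """Euclidean distance from an input to its closest training sample."""
    return min(math.dist(features, t) for t in training_features)
```

For input validation, an input whose `min_distance` to the training set exceeds some chosen threshold could be rejected as out of domain; how that threshold is picked is not stated in the abstract and is left open here.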

Citations: 416

Continuous Code Quality: Are We (Really) Doing That?

2018 33rd IEEE/ACM International Conference on Automated Software Engineering (ASE)

Pub Date : 2018-09-01 DOI: 10.1145/3238147.3240729

Carmine Vassallo, Fabio Palomba, Alberto Bacchelli, H. Gall

Continuous Integration (CI) is a software engineering practice where developers constantly integrate their changes to a project through an automated build process. The goal of CI is to provide developers with prompt feedback on several quality dimensions after each change. Indeed, previous studies provided empirical evidence on a positive association between properly following CI principles and source code quality. A core principle behind CI, Continuous Code Quality (also known as CCQ, which includes automated testing and automated code inspection), may appear simple and effective, yet we know little about its practical adoption. In this paper, we propose a preliminary empirical investigation aimed at understanding how rigorously practitioners follow CCQ. Our study reveals a strong dichotomy between theory and practice: developers do not perform continuous inspection but rather control for quality only at the end of a sprint and most of the time only on the release branch. Preprint [https://doi.org/10.5281/zenodo.1341036]. Data and Materials [http://doi.org/10.5281/zenodo.1341015].

Citations: 32

Understanding and Detecting Evolution-Induced Compatibility Issues in Android Apps

2018 33rd IEEE/ACM International Conference on Automated Software Engineering (ASE)

Pub Date : 2018-09-01 DOI: 10.1145/3238147.3238185

Dongjie He, Lian Li, Lei Wang, Hengjie Zheng, Guangwei Li, Jingling Xue

The frequent release of Android OS and its various versions brings many compatibility issues to Android apps. This paper studies and addresses such evolution-induced compatibility problems. We conduct an extensive empirical study over 11 different Android versions and 4,936 Android apps. Our study shows that there are drastic API changes between adjacent Android versions, with on average 140.8 new types, 1,505.6 new methods, and 979.2 new fields being introduced in each release. However, the Android Support Library (provided by the Android OS) supports less than 23% of the newly added methods, with much less support for new types and fields. As a result, 91.84% of Android apps write additional code to support different OS versions. Furthermore, 88.65% of the supporting code shares a common pattern, which directly compares the variable android.os.Build.VERSION.SDK_INT with a constant version number in order to use an API of particular versions. Based on our findings, we develop a new tool called IctApiFinder to detect incompatible API usages in Android applications. IctApiFinder effectively computes the OS versions on which an API may be invoked, using an inter-procedural data-flow analysis framework. It detects numerous incompatible API usages in 361 out of 1,425 apps. Compared to Android Lint, IctApiFinder is sound and able to reduce the false positives by 82.1%. We have reported the issues to 13 app developers. At present, 5 of them have already been confirmed by the original developers and 3 of them have already been fixed.
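
The common guard pattern mentioned above, a comparison of `Build.VERSION.SDK_INT` against a constant, can be located in Java source with a simple textual scan. The sketch below is only an illustration of what the pattern looks like; it is a regular-expression match, not the inter-procedural data-flow analysis that IctApiFinder performs, and the function name `find_sdk_checks` and the sample snippet are hypothetical.

```python
import re

# Matches comparisons such as `Build.VERSION.SDK_INT >= 23` or
# `Build.VERSION.SDK_INT < Build.VERSION_CODES.LOLLIPOP`.
SDK_CHECK = re.compile(
    r"Build\.VERSION\.SDK_INT\s*(>=|<=|==|!=|>|<)\s*"
    r"(\d+|Build\.VERSION_CODES\.\w+)"
)


def find_sdk_checks(java_source):
    """Return (operator, version) pairs for each SDK_INT comparison found."""
    return SDK_CHECK.findall(java_source)
```

A purely textual scan like this cannot tell which API calls a guard actually protects, for example when the check happens in a caller; that limitation is the kind of gap a data-flow analysis is meant to close.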

Citations: 63

On Adopting Linters to Deal with Performance Concerns in Android Apps

2018 33rd IEEE/ACM International Conference on Automated Software Engineering (ASE)

Pub Date : 2018-09-01 DOI: 10.1145/3238147.3238197

Sarra Habchi, Xavier Blanc, Romain Rouvoy

With millions of applications (apps) distributed through mobile markets, engaging and retaining end-users challenges Android developers to deliver a nearly perfect user experience. As mobile apps run on resource-limited devices, performance is a critical criterion for the quality of experience. Therefore, developers are expected to pay close attention to limiting performance bad practices. On the one hand, many studies have already identified such performance bad practices and showed that they can heavily impact app performance. Hence, many static analysers, a.k.a. linters, have been proposed to detect and fix these bad practices. On the other hand, other studies have shown that Android developers tend to deal with performance reactively and rarely build on linters to detect and fix performance bad practices. In this paper, we therefore perform a qualitative study to investigate this gap between the research and development communities. In particular, we performed interviews with 14 experienced Android developers to identify the perceived benefits and constraints of using linters to identify performance bad practices in Android apps. Our observations can have a direct impact on developers and the research community. Specifically, we describe why and how developers leverage static source code analysers to improve the performance of their apps. On top of that, we bring to light important challenges faced by developers when it comes to adopting static analysis for performance purposes.

Citations: 27

Break the Dead End of Dynamic Slicing: Localizing Data and Control Omission Bug

2018 33rd IEEE/ACM International Conference on Automated Software Engineering (ASE)

Pub Date : 2018-09-01 DOI: 10.1145/3238147.3238163

Yun Lin, Jun Sun, Lyly Tran, Guangdong Bai, Haijun Wang, J. Dong

Dynamic slicing is a common way of identifying the root cause when a program fault is revealed. With the dynamic slicing technique, programmers can follow data and control flow along the program execution trace to the root cause. However, the technique usually fails to work on omission bugs, i.e., faults caused by failing to execute some code. In many cases, dynamic slicing skips over the root cause when an omission bug happens, leading the debugging process to a dead end. In this work, we conduct an empirical study on the omission bugs in the Defects4J bug repository. Our study shows that (1) omission bugs are prevalent (46.4%) among all the studied bugs; (2) there are recurring patterns in the causes and fixes of omission bugs; (3) the patterns of fixing omission bugs serve as a strong hint for breaking the slicing dead end. Based on our findings, we train a neural network model on the omission bugs in the Defects4J repository to recommend where to look when slicing can no longer work. We conduct an experiment by applying our approach to 3193 mutated omission bugs which slicing fails to locate. The results show that our approach outperforms a random benchmark in breaking the dead end and localizing the mutated omission bugs (63.8% versus 2.8%).
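To make the "dead end" concrete, the following is a minimal sketch of why a backward dynamic slice can miss an omission bug. It is not taken from the paper: the toy program, the `Step` record, and the `run_buggy` and `backward_slice` helpers are all hypothetical, and Python is used only for brevity (the Defects4J subjects are Java programs). The trace is recorded by hand, and the slicer follows only dynamic data dependences and the control dependence of statements that actually executed.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Step:
    """One executed statement in a hand-recorded trace (hypothetical format)."""
    label: str
    defs: set = field(default_factory=set)   # variables written
    uses: set = field(default_factory=set)   # variables read
    ctrl: Optional[int] = None               # index of the branch step it executed under


def run_buggy(price, is_member):
    """Toy program with an omission bug; returns the result and its trace."""
    trace = []
    discount = 0.0
    trace.append(Step("S1: discount = 0.0", defs={"discount"}))
    # Bug (assumed for illustration): the guard should be `is_member` alone,
    # so for cheap purchases the assignment S3 is wrongly skipped.
    taken = is_member and price > 100
    trace.append(Step("S2: if is_member and price > 100",
                      uses={"is_member", "price"}))
    guard = len(trace) - 1
    if taken:
        discount = 0.1
        trace.append(Step("S3: discount = 0.1", defs={"discount"}, ctrl=guard))
    total = price * (1 - discount)
    trace.append(Step("S4: total = price * (1 - discount)",
                      defs={"total"}, uses={"price", "discount"}))
    return total, trace


def backward_slice(trace, criterion):
    """Collect the steps reachable from `criterion` via data/control dependences."""
    in_slice, worklist = set(), [criterion]
    while worklist:
        i = worklist.pop()
        if i in in_slice:
            continue
        in_slice.add(i)
        for var in trace[i].uses:
            # Data dependence: most recent earlier step that defined `var`.
            for j in range(i - 1, -1, -1):
                if var in trace[j].defs:
                    worklist.append(j)
                    break
        if trace[i].ctrl is not None:
            worklist.append(trace[i].ctrl)  # control dependence
    return [trace[i].label for i in sorted(in_slice)]


total, trace = run_buggy(price=80, is_member=True)
print(total)  # 80.0, although a member should pay 72.0
print(backward_slice(trace, len(trace) - 1))
# ['S1: discount = 0.0', 'S4: total = price * (1 - discount)']
```

In this sketch the faulty guard S2 never appears in the slice of the wrong output, because the statement it should have enabled (S3) did not run and therefore left no dependence to follow. When the branch is taken, as in `run_buggy(200, True)`, S2 does enter the slice through S3's control dependence.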

Citations: 22