Title: Debugging Flaky Tests using Spectrum-based Fault Localization
Authors: Martin Gruber, G. Fraser
Pub Date: 2023-05-01 | DOI: 10.1109/AST58925.2023.00017
Abstract: Non-deterministically behaving (i.e., flaky) tests hamper regression testing, as they destroy trust and waste computational and human resources. Eradicating flakiness from test suites is therefore an important goal, but automated debugging tools are needed to support developers in understanding the causes of flakiness. A popular example of an automated approach to support regular debugging is spectrum-based fault localization (SFL), a technique that identifies the software components most likely to be the causes of failures. While SFL can, in principle, also be applied to locate likely sources of flakiness in code, the flakiness itself makes SFL both imprecise and non-deterministic. In this paper we introduce SFFL (Spectrum-based Flaky Fault Localization), an extension of traditional coverage-based SFL that exploits our observation that 80% of flaky tests exhibit varying coverage behavior between different runs. By distinguishing between stable and flaky coverage, SFFL is able to locate the sources of flakiness more precisely and keeps the localization itself deterministic. An evaluation on 101 flaky tests taken from 48 open-source Python projects demonstrates that SFFL is effective: of five prominent SFL formulas, DStar, Ochiai, and Op2 yield the best overall performance. On average, they narrow down the fault's location to 3.5% of the project's code base, which is 18.7% better than traditional SFL (for DStar). SFFL's effectiveness, however, depends on the root cause of flakiness: the sources of non-order-dependent flaky tests can be located far more precisely than those of order-dependent ones.
Title: Message from AST 2023 Chairs
Pub Date: 2023-05-01 | DOI: 10.1109/ast58925.2023.00025
Title: Test Case Prioritization using Transfer Learning in Continuous Integration Environments
Authors: Rezwana Mamata, Akramul Azim, R. Liscano, Kevin Smith, Yee-Kang Chang, Gkerta Seferi, Qasim Tauseef
Pub Date: 2023-05-01 | DOI: 10.1109/AST58925.2023.00023
Abstract: The Continuous Integration (CI) process runs a large set of automated test cases to verify software builds. The testing phase in CI systems has timing constraints to ensure software quality without significantly delaying builds. CI therefore requires efficient testing techniques such as Test Case Prioritization (TCP) to run fault-revealing test cases with priority. Recent research on TCP applies different Machine Learning (ML) methods to cope with the dynamic and complex nature of CI. However, ML performance for TCP may degrade when data volume and failure rates are low, whereas existing data with similar patterns from other domains can be valuable. We formulate this as a transfer learning (TL) problem. TL has proven beneficial for many real-world applications where source domains have plenty of data but target domains have a scarcity of it. This research therefore investigates leveraging transfer learning for test case prioritization. However, few industrial CI datasets are publicly available due to data privacy regulations. In such cases, model-based transfer learning is a potential solution for sharing knowledge among different projects without revealing data to other stakeholders. This paper applies TransBoost, a tree-kernel-based TL algorithm, to evaluate the TL approach on 24 study subjects and to identify potential source datasets.
Title: SourceWarp: A scalable, SCM-driven testing and benchmarking approach to support data-driven and agile decision making for CI/CD tools and DevOps platforms
Authors: Julian Thomé, James Johnson, Isaac Dawson, Dinesh Bolkensteyn, Michael Henriksen, Mark Art
Pub Date: 2023-05-01 | DOI: 10.1109/AST58925.2023.00011
Abstract: The rising popularity and adoption of source-code management systems, in combination with Continuous Integration and Continuous Delivery (CI/CD) processes, have contributed to the adoption of agile software development with short release and feedback cycles between software producers and their customers. DevOps platforms streamline and enhance automation around source-code management systems by providing a uniform interface for managing all aspects of the software development lifecycle, from development through deployment, and by integrating and orchestrating tools that automate development processes such as bug detection, security testing, and dependency scanning. Applying changes to the DevOps platform, or to one of the integrated tools, without data about their real-world impact increases the risk of having to remove or revert the change; this could lead to service disruption or loss of confidence in the platform if it does not perform as expected. In addition, integrating alpha or beta features, which may not meet the robustness of a finalised feature, may pose security or stability risks to the entire platform. Hence, short release cycles require testing and benchmarking approaches that make it possible to prototype, test, and benchmark ideas quickly and at scale, supporting data-driven decision making about the features that are about to be integrated into the platform. In this paper, we propose a scalable testing and benchmarking approach called SourceWarp that targets DevOps platforms and supports both testing and benchmarking in a cost-effective and reproducible manner. We have implemented the approach in the publicly available SourceWarp tool and evaluated it in a real-world industrial case study. We successfully applied SourceWarp to test and benchmark a newly developed feature at GitLab that has since been integrated into the product. The case study demonstrates that SourceWarp is scalable and highly effective in supporting agile, data-driven decision making by automating the testing and benchmarking of proof-of-concept ideas for CI/CD tools, chained CI/CD tools (also referred to as pipelines), the DevOps platform, or a combination of them, without having to deploy features to staging or production environments.
Title: MuTCR: Test Case Recommendation via Multi-Level Signature Matching
Authors: Weisong Sun, Weidong Qian, Bin Luo, Zhenyu Chen
Pub Date: 2023-05-01 | DOI: 10.1109/AST58925.2023.00022
Abstract: Off-the-shelf test cases provide developers with testing knowledge for reference or reuse, which can reduce the effort of creating new test cases. Test case recommendation, a major way of achieving test case reuse, has been receiving attention from researchers. The basic idea behind test case recommendation is that two similar test targets (methods under test) can reuse each other's test cases. However, existing test case recommendation techniques either cannot be used in cross-project scenarios or perform poorly in terms of effectiveness and efficiency. In this paper, we propose a novel test case recommendation technique based on multi-level signature matching. The proposed matching consists of three strategies of different strictness: level-0 exact matching, level-1 fuzzy matching, and level-2 fuzzy matching. For a query test target given by the developer, level-0 exact matching retrieves exact recommendations (test cases), while level-1 and level-2 fuzzy matching discover richer relevant recommendations. We further develop a prototype called MuTCR for test case recommendation and conduct comprehensive experiments to evaluate its effectiveness and efficiency. The experimental results demonstrate that, compared with the state of the art, MuTCR recommends accurate test cases for more test targets, and in terms of time cost it is three times faster than the best baseline. A user study further shows that the test cases recommended by MuTCR are useful in practice.
Title: Detecting Potential User-data Save & Export Losses due to Android App Termination
Authors: Sydur Rahaman, Umar Farooq, Iulian Neamtiu, Zhijia Zhao
Pub Date: 2023-05-01 | DOI: 10.1109/AST58925.2023.00019
Abstract: A common feature in Android apps is saving, or exporting, a user's work (e.g., a drawing) as well as data (e.g., a spreadsheet) onto local storage as a file. Due to the volatile nature of the OS and the mobile environment in general, the system can terminate apps without notice, which prevents the execution of file write operations; consequently, user data that was supposed to be saved or exported is lost instead. Testing apps for such potential losses raises several challenges: how to identify data originating from user input or resulting from user action (and then check whether it is saved), and how to reproduce a potential error by terminating the app at the exact moment when unsaved changes are pending. We address these challenges via an approach that finds potential "lost writes", i.e., user data that was supposed to be written to a file but whose file write does not take place due to system-initiated termination. Our approach consists of two phases: a static analysis that finds potential losses, and a dynamic loss verification phase in which we compare lossy and lossless system-level file write traces to confirm errors. We ran our analysis on 2,182 apps from Google Play and 38 apps from F-Droid. It found 163 apps where termination caused losses, including loss of app-specific data, notes, photos, user work, and settings. In contrast, two state-of-the-art tools aimed at finding volatility errors in Android apps failed to discover the issues we found.
Title: A Reinforcement Learning Approach to Generating Test Cases for Web Applications
Authors: Xiaoning Chang, Zheheng Liang, Yifei Zhang, Lei Cui, Zhenyue Long, Guoquan Wu, Yu Gao, W. Chen, Jun Wei, Tao Huang
Pub Date: 2023-05-01 | DOI: 10.1109/AST58925.2023.00006
Abstract: Web applications play an important role in modern society, and their quality assurance requires substantial manual effort. In this paper, we propose WebQT, an automatic test case generator for web applications based on reinforcement learning. To increase testing efficiency, we design a new reward model that encourages the agent to mimic human testers when interacting with web applications. To alleviate the problem of state redundancy, we further propose a novel state abstraction technique that identifies different web pages with the same functionality as the same state, yielding a simplified state space. We evaluate WebQT on seven open-source web applications. The experimental results show that WebQT achieves 45.4% more code coverage, along with higher efficiency, than the state-of-the-art technique. In addition, WebQT reveals 69 exceptions in 11 real-world web applications.
Title: AST 2023 Program Committee
Pub Date: 2023-05-01 | DOI: 10.1109/ast58925.2023.00027
Title: An Intelligent Duplicate Bug Report Detection Method Based on Technical Term Extraction
Authors: Xiaoxue Wu, Wenjing Shan, Wei Zheng, Zhiguo Chen, Tao Ren, Xiaobing Sun
Pub Date: 2023-05-01 | DOI: 10.1109/AST58925.2023.00005
Abstract: Bug reports, the bug-description data generated during the software maintenance cycle, are usually written hastily by different users, resulting in many redundant and duplicate bug reports (DBRs). When DBRs are repeatedly assigned to developers, they inevitably waste human resources, especially in large-scale open-source projects. Many researchers have studied DBR detection and proposed a series of detection methods, but there is still much room for improving the performance of DBR prediction. This paper therefore proposes a new DBR detection method based on technical term extraction, CTEDB (Combination of Term Extraction and DeBERTaV3) for short. The method first extracts technical terms from the textual information of bug reports using the Word2Vec and TextRank algorithms. It then computes the semantic similarity of technical terms between different bug reports by combining Word2Vec and SBERT models. Finally, it completes the DBR detection task using the DeBERTaV3 model. The experimental results show that CTEDB achieves good DBR detection results and clearly improves accuracy, F1-score, recall, and precision compared with the baseline approaches.
Title: On Comparing Mutation Testing Tools through Learning-based Mutant Selection
Authors: Miloš Ojdanić, Ahmed Khanfir, Aayush Garg, Renzo Degiovanni, Mike Papadakis, Y. Le Traon
Pub Date: 2023-05-01 | DOI: 10.1109/AST58925.2023.00008
Abstract: Recently, many mutation testing tools have been proposed that rely on bug-fix patterns and natural language models trained on large code corpora. As these tools operate fundamentally differently from the traditional grammar-based approaches, a question arises of how they compare in terms of 1) fault detection and 2) cost-effectiveness. Simultaneously, mutation testing research proposes machine-learning-based mutant selection approaches to mitigate the application cost of mutation testing. This raises another question: how do the existing mutation testing tools compare when guided by mutant selection approaches? To answer these questions, we compare four existing tools, namely μBERT (which uses a pre-trained language model for fault seeding), IBIR (which relies on inverted fix-patterns), DeepMutation (which generates mutants by employing neural machine translation), and PIT (which applies standard grammar-based rules), in terms of fault detection capability and cost-effectiveness, in conjunction with standard and deep-learning-based mutant selection strategies. Our results show that IBIR has the highest fault detection capability of the four tools; however, it is not the most cost-effective when considering different selection strategies. μBERT, on the other hand, has a relatively lower fault detection capability but is the most cost-effective of the four tools. Our results also indicate that comparing mutation testing tools under deep-learning-based mutant selection strategies can lead to conclusions that differ from those obtained under standard mutant selection. For instance, combining μBERT with deep-learning-based mutant selection yields 12% higher fault detection than the other considered tools.