Title: Debugging Flaky Tests using Spectrum-based Fault Localization
Authors: Martin Gruber, G. Fraser
Pub Date: 2023-05-01 | DOI: 10.1109/AST58925.2023.00017
Abstract: Non-deterministically behaving (i.e., flaky) tests hamper regression testing, as they destroy trust and waste computational and human resources. Eradicating flakiness from test suites is therefore an important goal, but automated debugging tools are needed to support developers in understanding the causes of flakiness. A popular example of an automated approach to support regular debugging is spectrum-based fault localization (SFL), a technique that identifies the software components most likely to be the causes of failures. While SFL can, in principle, also be applied to locate likely sources of flakiness in code, the flakiness itself makes SFL both imprecise and non-deterministic. In this paper we introduce SFFL (Spectrum-based Flaky Fault Localization), an extension of traditional coverage-based SFL that exploits our observation that 80% of flaky tests exhibit varying coverage behavior between different runs. By distinguishing between stable and flaky coverage, SFFL is able to locate the sources of flakiness more precisely and keeps the localization itself deterministic. An evaluation on 101 flaky tests taken from 48 open-source Python projects demonstrates that SFFL is effective: of five prominent SFL formulas, DStar, Ochiai, and Op2 yield the best overall performance. On average, they narrow down the fault's location to 3.5% of the project's code base, which is 18.7% better than traditional SFL (for DStar). SFFL's effectiveness, however, depends on the root cause of flakiness: the sources of non-order-dependent flaky tests can be located far more precisely than those of order-dependent ones.
Title: Message from AST 2023 Chairs
Pub Date: 2023-05-01 | DOI: 10.1109/ast58925.2023.00025
Title: Test Case Prioritization using Transfer Learning in Continuous Integration Environments
Authors: Rezwana Mamata, Akramul Azim, R. Liscano, Kevin Smith, Yee-Kang Chang, Gkerta Seferi, Qasim Tauseef
Pub Date: 2023-05-01 | DOI: 10.1109/AST58925.2023.00023
Abstract: The Continuous Integration (CI) process runs a large set of automated test cases to verify software builds. The testing phase in CI systems has timing constraints to ensure software quality without significantly delaying builds. CI therefore requires efficient testing techniques such as Test Case Prioritization (TCP) to run fault-revealing test cases with priority. Recent research on TCP applies different Machine Learning (ML) methods to cope with the dynamic and complex nature of CI. However, ML performance for TCP may degrade when data volume and failure rates are low, whereas existing data with similar patterns from other domains can be valuable. We formulate this as a transfer learning (TL) problem. TL has proven beneficial for many real-world applications where source domains have plenty of data but target domains have a scarcity of it. This research therefore investigates leveraging transfer learning for test case prioritization. However, few industrial CI datasets are publicly available due to data privacy regulations. In such cases, model-based transfer learning is a potential solution for sharing knowledge among different projects without revealing data to other stakeholders. This paper applies TransBoost, a tree-kernel-based TL algorithm, to evaluate the TL approach on 24 study subjects and to identify potential source datasets.
Title: SourceWarp: A scalable, SCM-driven testing and benchmarking approach to support data-driven and agile decision making for CI/CD tools and DevOps platforms
Authors: Julian Thomé, James Johnson, Isaac Dawson, Dinesh Bolkensteyn, Michael Henriksen, Mark Art
Pub Date: 2023-05-01 | DOI: 10.1109/AST58925.2023.00011
Abstract: The rising popularity and adoption of source-code management systems, in combination with Continuous Integration and Continuous Delivery (CI/CD) processes, have contributed to the adoption of agile software development with short release and feedback cycles between software producers and their customers. DevOps platforms streamline and enhance automation around source-code management systems by providing a uniform interface for managing all aspects of the software development lifecycle, from development through deployment, and by integrating and orchestrating tools that automate development processes such as bug detection, security testing, and dependency scanning. Applying changes to the DevOps platform, or to one of the integrated tools, without data about their real-world impact increases the risk of having to remove or revert the change; this could lead to service disruption or loss of confidence in the platform if it does not perform as expected. In addition, integrating alpha or beta features, which may not meet the robustness of a finalised feature, may pose security or stability risks to the entire platform. Hence, short release cycles require testing and benchmarking approaches that make it possible to prototype, test, and benchmark ideas quickly and at scale, supporting data-driven decision making about the features that are about to be integrated into the platform. In this paper, we propose a scalable testing and benchmarking approach called SourceWarp that targets DevOps platforms and supports both testing and benchmarking in a cost-effective and reproducible manner. We have implemented the approach in the publicly available SourceWarp tool and evaluated it in a real-world industrial case study. We successfully applied SourceWarp to test and benchmark a newly developed feature at GitLab that has since been integrated into the product. The case study demonstrates that SourceWarp is scalable and highly effective in supporting agile, data-driven decision making by automating the testing and benchmarking of proof-of-concept ideas for CI/CD tools, chained CI/CD tools (also referred to as pipelines), the DevOps platform, or a combination of them, without having to deploy features to staging or production environments.
Title: MuTCR: Test Case Recommendation via Multi-Level Signature Matching
Authors: Weisong Sun, Weidong Qian, Bin Luo, Zhenyu Chen
Pub Date: 2023-05-01 | DOI: 10.1109/AST58925.2023.00022
Abstract: Off-the-shelf test cases provide developers with testing knowledge for reference or reuse, which can reduce the effort of creating new test cases. Test case recommendation, a major way of achieving test case reuse, has been receiving attention from researchers. The basic idea behind test case recommendation is that two similar test targets (methods under test) can reuse each other's test cases. However, existing test case recommendation techniques either cannot be used in cross-project scenarios or perform poorly in terms of effectiveness and efficiency. In this paper, we propose a novel test case recommendation technique based on multi-level signature matching. The proposed matching consists of three strategies of different strictness: level-0 exact matching, level-1 fuzzy matching, and level-2 fuzzy matching. For a query test target given by the developer, level-0 exact matching retrieves exact recommendations (test cases), while level-1 and level-2 fuzzy matching discover richer relevant recommendations. We further develop a prototype called MuTCR for test case recommendation and conduct comprehensive experiments to evaluate its effectiveness and efficiency. The experimental results demonstrate that, compared with the state of the art, MuTCR recommends accurate test cases for more test targets, and in terms of time cost it is three times faster than the best baseline. A user study further shows that the test cases recommended by MuTCR are useful in practice.
Title: Detecting Potential User-data Save & Export Losses due to Android App Termination
Authors: Sydur Rahaman, Umar Farooq, Iulian Neamtiu, Zhijia Zhao
Pub Date: 2023-05-01 | DOI: 10.1109/AST58925.2023.00019
Abstract: A common feature in Android apps is saving, or exporting, a user's work (e.g., a drawing) as well as data (e.g., a spreadsheet) onto local storage as a file. Due to the volatile nature of the OS and the mobile environment in general, the system can terminate apps without notice, which prevents the execution of file write operations; consequently, user data that was supposed to be saved or exported is lost instead. Testing apps for such potential losses raises several challenges: how to identify data originating from user input or resulting from user action (and then check whether it is saved), and how to reproduce a potential error by terminating the app at the exact moment when unsaved changes are pending. We address these challenges via an approach that finds potential "lost writes", i.e., user data that was supposed to be written to a file but whose file write does not take place due to system-initiated termination. Our approach consists of two phases: a static analysis that finds potential losses, and a dynamic loss verification phase in which we compare lossy and lossless system-level file write traces to confirm errors. We ran our analysis on 2,182 apps from Google Play and 38 apps from F-Droid. It found 163 apps where termination caused losses, including loss of app-specific data, notes, photos, user work, and settings. In contrast, two state-of-the-art tools aimed at finding volatility errors in Android apps failed to discover the issues we found.
Title: A Reinforcement Learning Approach to Generating Test Cases for Web Applications
Authors: Xiaoning Chang, Zheheng Liang, Yifei Zhang, Lei Cui, Zhenyue Long, Guoquan Wu, Yu Gao, W. Chen, Jun Wei, Tao Huang
Pub Date: 2023-05-01 | DOI: 10.1109/AST58925.2023.00006
Abstract: Web applications play an important role in modern society, and their quality assurance requires substantial manual effort. In this paper, we propose WebQT, an automatic test case generator for web applications based on reinforcement learning. To increase testing efficiency, we design a new reward model that encourages the agent to mimic human testers when interacting with web applications. To alleviate the problem of state redundancy, we further propose a novel state abstraction technique that identifies different web pages with the same functionality as the same state, yielding a simplified state space. We evaluate WebQT on seven open-source web applications. The experimental results show that WebQT achieves 45.4% more code coverage, along with higher efficiency, than the state-of-the-art technique. In addition, WebQT reveals 69 exceptions in 11 real-world web applications.
Title: AST 2023 Program Committee
Pub Date: 2023-05-01 | DOI: 10.1109/ast58925.2023.00027
Title: An Intelligent Duplicate Bug Report Detection Method Based on Technical Term Extraction
Authors: Xiaoxue Wu, Wenjing Shan, Wei Zheng, Zhiguo Chen, Tao Ren, Xiaobing Sun
Pub Date: 2023-05-01 | DOI: 10.1109/AST58925.2023.00005
Abstract: Bug reports, the bug-description data generated during the software maintenance cycle, are usually written hastily by different users, resulting in many redundant and duplicate bug reports (DBRs). When DBRs are repeatedly assigned to developers, they inevitably waste human resources, especially in large-scale open-source projects. Many researchers have studied DBR detection and proposed a series of detection methods, but there is still much room for improving the performance of DBR prediction. This paper therefore proposes a new DBR detection method based on technical term extraction, CTEDB (Combination of Term Extraction and DeBERTaV3) for short. The method first extracts technical terms from the textual information of bug reports using the Word2Vec and TextRank algorithms. It then computes the semantic similarity of technical terms between different bug reports by combining Word2Vec and SBERT models. Finally, it completes the DBR detection task using the DeBERTaV3 model. The experimental results show that CTEDB achieves good DBR detection results and clearly improves accuracy, F1-score, recall, and precision compared with the baseline approaches.
Title: On Comparing Mutation Testing Tools through Learning-based Mutant Selection
Authors: Miloš Ojdanić, Ahmed Khanfir, Aayush Garg, Renzo Degiovanni, Mike Papadakis, Y. Le Traon
Pub Date: 2023-05-01 | DOI: 10.1109/AST58925.2023.00008
Abstract: Recently, many mutation testing tools have been proposed that rely on bug-fix patterns and natural language models trained on large code corpora. As these tools operate fundamentally differently from the traditional grammar-based approaches, a question arises of how they compare in terms of 1) fault detection and 2) cost-effectiveness. Simultaneously, mutation testing research proposes machine-learning-based mutant selection approaches to mitigate the application cost of mutation testing. This raises another question: how do the existing mutation testing tools compare when guided by mutant selection approaches? To answer these questions, we compare four existing tools, namely μBERT (which uses a pre-trained language model for fault seeding), IBIR (which relies on inverted fix-patterns), DeepMutation (which generates mutants by employing neural machine translation), and PIT (which applies standard grammar-based rules), in terms of fault detection capability and cost-effectiveness, in conjunction with standard and deep-learning-based mutant selection strategies. Our results show that IBIR has the highest fault detection capability of the four tools; however, it is not the most cost-effective when considering different selection strategies. μBERT, on the other hand, has a relatively lower fault detection capability but is the most cost-effective of the four tools. Our results also indicate that comparing mutation testing tools under deep-learning-based mutant selection strategies can lead to conclusions that differ from those obtained under standard mutant selection. For instance, combining μBERT with deep-learning-based mutant selection yields 12% higher fault detection than the other considered tools.