Combinatorial methods for testing Internet of Things smart home systems
Bernhard Garn, Dominik Schreiber, D. Simos, D. R. Kuhn, J. Voas, R. Kacker
In this paper, we report on applying combinatorial testing to Internet of Things (IoT) home automation hub systems. We detail how to create a dedicated input parameter model of an IoT home automation hub system for use with combinatorial test case generation strategies. Further, we developed an automated test execution framework and two test oracles for evaluation purposes. We applied and evaluated our proposed methodological approach on a real-world IoT system and analysed the results obtained from various combinatorial test sets with different properties, generated from the derived input model. Additionally, we compared these results against a random testing approach. Our empirical testing evaluations revealed multiple errors in the tested devices and also showed that all considered approaches performed nearly equally well.
{"title":"Combinatorial methods for testing Internet of Things smart home systems","authors":"Bernhard Garn, Dominik Schreiber, D. Simos, D. R. Kuhn, J. Voas, R. Kacker","doi":"10.1002/stvr.1805","DOIUrl":"https://doi.org/10.1002/stvr.1805","url":null,"abstract":"In this paper, we report on applying combinatorial testing to Internet of Things (IoT) home automation hub systems. We detail how to create a dedicated input parameter model of an IoT home automation hub system for use with combinatorial test case generation strategies. Further, we developed an automated test execution framework and two test oracles for evaluation purposes. We applied and evaluated our proposed methodological approach to a real‐world IoT system and analysed the obtained results of various combinatorial test sets with different properties generated based on the derived input model. Additionally, we compare these results to a random testing approach. Our empirical testing evaluations revealed multiple errors in the tested devices and also showed that all considered approaches performed nearly equally well.","PeriodicalId":49506,"journal":{"name":"Software Testing Verification & Reliability","volume":"32 1","pages":""},"PeriodicalIF":1.5,"publicationDate":"2021-12-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79475363","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A mapping study on mutation testing for mobile applications
Henrique Neves da Silva, Jackson A. Prado Lima, S. Vergilio, A. T. Endo
The use of mutation testing for mobile applications (apps for short) is still a challenge. Mobile apps are usually event-driven, encompass graphical user interfaces (GUIs) and run in a complex execution environment. Consequently, they require mutation operators that describe app-specific faults, and automating phases of the mutation process, such as mutant execution and analysis, is not an easy task. To encourage research addressing such challenges, this paper presents results from a mapping study on mutation testing for mobile apps. Following a systematic plan, we found 16 primary studies that were analysed according to three aspects: (i) trends and statistics about the field; (ii) study characteristics such as focus, proposed operators and automated support for the mutation testing phases; and (iii) evaluation aspects. The great majority of studies (98%) have been published in the last 3 years. The most addressed language is Java, and Android is the only operating system considered. Mutation operators of GUI and configuration types are prevalent among the 138 operators found. Most studies implement a supporting tool, but few tools support mutant execution and analysis. The evaluations conducted by the studies include apps mainly from the finance and utility domains. Nevertheless, there is a lack of benchmarks and more rigorous experiments. Future research should address other specific fault types, languages and operating systems; it should also offer support for mutant execution and analysis and reduce the cost and limitations of mutation testing in the mobile context.
{"title":"A mapping study on mutation testing for mobile applications","authors":"Henrique Neves da Silva, Jackson A. Prado Lima, S. Vergilio, A. T. Endo","doi":"10.1002/stvr.1801","DOIUrl":"https://doi.org/10.1002/stvr.1801","url":null,"abstract":"The use of mutation testing for mobile applications (apps for short) is still a challenge. Mobile apps are usually event‐driven and encompass graphical user interfaces (GUIs) and a complex execution environment. Then, they require mutant operators to describe specific apps faults, and the automation of the mutation process phases like execution and analysis of the mutants is not an easy task. To encourage research addressing such challenges, this paper presents results from a mapping study on mutation testing for mobile apps. Following a systematic plan, we found 16 primary studies that were analysed according to three aspects: (i) trends and statistics about the field; (ii) study characteristics such as focus, proposed operators and automated support for the mutation testing phases; and (iii) evaluation aspects. The great majority of studies (98%) have been published in the last 3 years. The most addressed language is Java, and Android is the only operating system considered. Mutant operators of GUI and configuration types are prevalent in a total of 138 operators found. Most studies implement a supporting tool, but few tools support mutant execution and analysis. The evaluation conducted by the studies includes apps mainly from the finance and utility domain. Nevertheless, there is a lack of benchmarks and more rigorous experiments. Future research should address other specific types of faults, languages, and operating systems. They should offer support for mutant execution and analysis, as well as to reduce the mutation testing cost and limitations in the mobile context.","PeriodicalId":49506,"journal":{"name":"Software Testing Verification & Reliability","volume":"31 1","pages":""},"PeriodicalIF":1.5,"publicationDate":"2021-12-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85928361","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
HOTFUZ: Cost-effective higher-order mutation-based fault localization
Jong-In Jang, Duksan Ryu, Jong-Chan Baik
Fault localization techniques are used to deduce the exact source of a failure from a set of failure indications while debugging software, and they play a crucial role in improving software quality. Mutation-based fault localization (MBFL) techniques have been proposed to localize faults at a finer granularity and with higher accuracy than traditional fault localization techniques. Despite their effectiveness, the immense cost of mutation analysis hinders MBFL's practical application in industry. Various mutation alternative strategies are used to lower the cost of MBFL, but they sacrifice the accuracy of localization results. Higher-order mutation testing was proposed to search for valuable mutants that drive testing harder and reduce the overall test effort. However, to the best of our knowledge, higher-order mutants (HOMs) have never been used to address the cost problem of MBFL. This paper proposes a novel, cost-effective MBFL technique called HOTFUZ (Higher-Order muTation-based FaUlt localiZation) that employs HOMs to reduce cost while minimizing the accuracy degradation. HOTFUZ combines mutants of a program under test into HOMs, decreasing the number of mutants by more than half, depending on the order of the HOMs. An experimental study is conducted using 65 real-world faults of CoREBench to assess the proposed approach's cost-effectiveness. The experimental results show that HOTFUZ outperforms the extant mutation alternative strategies by localizing faults more accurately using the same number of executed mutants. HOTFUZ has three main benefits over existing mutant reduction techniques for MBFL: (a) it keeps the advantage of using the whole set of mutation operators; (b) it does not discard generated mutants randomly for the sake of efficiency; and, finally, (c) it significantly decreases the proportion of equivalent mutants.
{"title":"HOTFUZ: Cost‐effective higher‐order mutation‐based fault localization","authors":"Jong-In Jang, Duksan Ryu, Jong-Chan Baik","doi":"10.1002/stvr.1802","DOIUrl":"https://doi.org/10.1002/stvr.1802","url":null,"abstract":"Fault localization techniques are used to deduce the exact source of a failure from a set of failure indications while debugging software and play a crucial role in improving software quality. Mutation‐based fault localization (MBFL) techniques are proposed to localize faults at a finer granularity and with higher accuracy than traditional fault localization techniques. Despite the technique's effectiveness, the immense cost of mutation analysis hinders MBFL's practical application in the industry. Various mutation alternative strategies are utilized to lower the cost of MBFL, but they sacrifice the accuracy of localization results. Higher‐order mutation testing was proposed to search for valuable mutants that drive testing harder and reduce the overall test effort. However, higher‐order mutants (HOMs) never have been used to address the cost problem of MBFL to the extent of our knowledge. This paper proposes a novel, cost‐effective MBFL technique called HOTFUZ, Higher‐Order muTation‐based FaUlt localiZation, that employs HOMs to reduce the cost while minimizing the accuracy degradation. HOTFUZ combines mutants of a program under test into HOMs to decrease the number of mutants by more than half, depending on the order of HOMs. An experimental study is conducted using 65 real‐world faults of CoREBench to assess the proposed approach's cost‐effectiveness. The experimental results show that HOTFUZ outperforms the extant mutation alternative strategies by localizing faults more accurately using the same number of mutants executed. HOTFUZ has three main benefits over existing mutant reduction techniques for MBFL: (a) It keeps the advantage of using the whole set of mutation operators; (b) it does not discard generated mutants randomly for the sake of efficiency; and, finally, (c) it significantly decreases the proportion of equivalent mutants.","PeriodicalId":49506,"journal":{"name":"Software Testing Verification & Reliability","volume":"70 1","pages":""},"PeriodicalIF":1.5,"publicationDate":"2021-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82148627","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Model checking, testing and debugging
R. Hierons, Tao Xie
This issue contains four papers. The first paper focuses on model checking, the second and third papers focus on testing and the last paper focuses on debugging. The first paper, ‘Model checking C++ programs’ by Felipe R. Monteiro, Mikhail R. Gadelha and Lucas C. Cordeiro, is motivated by memory safety issues and how these have proved to be a source of security vulnerabilities. The authors devised a novel bounded model checking approach. The first step was to encode a number of C++ features in a decidable fragment of first-order logic. SMT solvers were then used to carry out verification. In experiments, the proposed approach was found to outperform state-of-the-art verifiers. The prototype tool also found arithmetic-overflow errors in a commercial application. (Recommended by Professor Pretschner) The second paper, ‘GPU acceleration of finite state machine input execution: Improving scale and performance’, by Vanya Yaneva, Ajitha Rajan and Christophe Dubach, looks at the problem of executing a large number of tests on a finite state machine (FSM). The motivation for this work is model validation. The approach devised uses GPUs to allow multiple tests to be run in parallel. The authors built on their previous work, which showed how FSM execution can be performed on a GPU, by addressing a number of limitations. In particular, the authors addressed the data transfer overhead, and they also performed experiments with FSMs that were too large to fit into GPU memory. In the experiments, the novel optimisations led to further improvements, with the GPU being over four times faster, on average, than a 16-core CPU. (Recommended by Professor Pretschner) The third paper, ‘Survey on test case generation, selection and prioritization for cyber-physical systems’, by Zahra Sadri-Moshkenani, Justin Bradley and Gregg Rothermel, presents a survey of approaches that generate, select or prioritise test cases for cyber-physical systems. The authors identified 34 related papers (26 papers on test generation, 6 papers on test selection and 7 papers on test prioritisation) and classified them according to 8 properties distilled by the authors from past experience. From the survey results, the authors identified a number of open challenges. To address some of these challenges, existing approaches may be adapted or new approaches may be developed. (Recommended by Professor Phil McMinn) The fourth paper, ‘Effective fault localization and context-aware debugging for concurrent programs’, by Justin Chu, Tingting Yu, Jane Huffman Hayes, Xue Han and Yu Zhao, presents Coadec, an approach for automatically generating interthread control flow paths to diagnose concurrency bugs. Coadec consists of two phases: concurrency fault localization and context-aware debugging. The authors evaluated Coadec on 10 real-world multithreaded Java applications and showed that Coadec outperforms state-of-the-art approaches for localising concurrency faults and that Coadec's context debugging can help developers understand concurrency faults by inspecting a small percentage of code. (Recommended by Marc Roper)
{"title":"Model checking, testing and debugging","authors":"R. Hierons, Tao Xie","doi":"10.1002/stvr.1803","DOIUrl":"https://doi.org/10.1002/stvr.1803","url":null,"abstract":"This issue contains four papers. The first paper focuses on model checking, the second and third papers focus on testing and the last paper focuses on debugging. The first paper, ‘Model checking C++ programs’ by Felipe R. Monteiro, Mikhail R. Gadelha and Lucas C. Cordeiro, is motivated by memory safety issues and how these have proved to be a source of security vulnerabilities. The authors devised a novel bounded model checking approach. The first step was to encode a number of C++ features in a decidable fragment of first-order logic. SMT solvers were then used to carry out verification. In experiments, the proposed approach was found to outperform state-of-the-art verifiers. The prototype tool also found arithmeticoverflow errors in a commercial application. (Recommended by Professor Pretscher) The second paper, ‘GPU acceleration of finite state machine input execution: Improving scale and performance’, by Vanya Yaneva, Ajitha Rajan and Christophe Dubach looks at the problem of executing a large number of tests on a finite state machine (FSM). The motivation for this work is model validation. The approach devised uses GPUs to allow multiple tests to be run in parallel. The authors built on their previous work, which showed how FSM execution can be performed on a GPU, by addressing a number of limitations. In particular, the authors addressed the data transfer overhead and they also performed experiments with FSMs that were too large to fit into GPU memory. In the experiments, the novel optimisations led to further improvements, with the GPU being over four times faster, on average, than a 16-core CPU. (Recommended by Professor Pretscher) The third paper, ‘Survey on test case generation, selection and prioritization for cyber-physical systems’, by Zahra Sadri-Moshkenani, Justin Bradley and Gregg Rothermel, presents a survey of approaches that generate, select or prioritise test cases for cyber-physical systems. The authors identified 34 related papers (26 papers on test generation, 6 papers on test selection and 7 papers on test prioritisation) and classified them according to 8 properties distilled by the authors from past experience. From the survey results, the authors identified a number of open challenges. To address some of these challenges, existing approaches may be adapted or new approaches may be developed. (Recommended by Professor Phil McMinn) The fourth paper, ‘Effective fault localization and context-aware debugging for concurrent programs’, by Justin Chu, Tingting Yu, Jane Huffman Hayes, Xue Han and Yu Zhao, presents Coadec, an approach for automatically generating interthread control flow paths to diagnose concurrency bugs. Coadec consists of two phases: concurrency fault localization and context-aware debugging. 
The authors evaluated Coadec on 10 real-world multithreaded Java applications and showed that Coadec outperforms state-of-the-art approaches for localising concurrency faults and that Coadec’s context debugging can help ","PeriodicalId":49506,"journal":{"name":"Software Testing Verification & Reliability","volume":"12 1","pages":""},"PeriodicalIF":1.5,"publicationDate":"2021-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88219290","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Property generation/verification and empirical studies
R. Hierons, Tao Xie
This issue contains four papers. The first and third papers focus on property generation and property verification, respectively, while the second and fourth papers focus on empirical studies of a fault prediction algorithm and test flakiness, respectively. The first paper, “Documentation-based functional constraint generation for library methods,” by Renhe Jiang, Zhengzhao Chen, Yu Pei, Minxue Pan, Tian Zhang, and Xuandong Li, proposes DOC2SMT, an approach that generates functional constraints for library methods based on their documentation. DOC2SMT first translates a method’s documentation into candidate constraint clauses, which are then filtered based on static and dynamic validations. The experimental results show the effectiveness and efficiency of DOC2SMT and also show the benefits of the generated constraints for symbolic-execution-based test generation (recommended by Peter Müller). The second paper, “An empirical study of Linespots: A novel past-fault algorithm,” by Maximilian Scholz and Richard Torkar, proposes a new fault prediction algorithm called Linespots. The authors focus on fault prediction based on past faults and refine a previous algorithm (Bugspots). Interestingly, they used a different granularity, line as opposed to file, and this necessitated the development of a benchmark set of experimental subjects. In experiments, Linespots was found to outperform Bugspots (recommended by Xiaoyin Wang). The third paper, “Integrating pattern matching and abstract interpretation for verifying cautions of microcontrollers,” by Thuy Nguyen, Takashi Tomita, Junpei Endo, and Toshiaki Aoki, proposes a semi-automatic approach for verifying cautions, which are hardware-dependent properties described in microcontroller hardware manuals. For this approach, the authors integrate pattern matching and abstract interpretation, two static program analysis techniques. The experimental results show the feasibility and applicability of the approach (recommended by Marcio Delamaro). The fourth paper, “Empirical analysis of practitioners’ perceptions of test flakiness factors,” by Azeem Ahmad, Ola Leifler, and Kristian Sandahl, concerns flaky tests. A flaky test is one where different executions of the same test can lead to different outcomes/verdicts. The authors explore developer perceptions regarding factors that affect flakiness, concentrating on developers of closed-source software. They also examine two test suites and identify the test smells that lead to flakiness (recommended by Mike Papadakis).
{"title":"Property generation/verification and empirical studies","authors":"R. Hierons, Tao Xie","doi":"10.1002/stvr.1800","DOIUrl":"https://doi.org/10.1002/stvr.1800","url":null,"abstract":"This issue contains four papers. The first and third papers focus on property generation and property verification, respectively, while the second and fourth papers focus on empirical studies of a fault prediction algorithm and test flakiness, respectively. The first paper, “Documentation-based functional constraint generation for library methods,” by Renhe Jiang, Zhengzhao Chen, Yu Pei, Minxue Pan, Tian Zhang, and Xuandong Li, proposes DOC2SMT, an approach that generates functional constraints for library methods based on their documentations. DOC2SMT first translates a method’s documentation into candidate constraint clauses, which are then filtered based on static and dynamic validations. The experimental results show the effectiveness and efficiency of DOC2SMT and also show the benefits of the generated constraints for symbolic-execution-based test generation (recommended by Peter Müller). The second paper, “An empirical study of Linespots: A novel past-fault algorithm,” by Maximilian Scholz and Richard Torkar, proposes a new fault prediction algorithm called Linespots. The authors focus on fault prediction based on past faults and refine a previous algorithm (Bugspots). Interestingly, they used a different granularity: line as opposed to file, and this necessitated the development of a benchmark set of experimental subjects. In experiments, Linespots was found to outperform Bugspots (recommended by Xiaoyin Wang). The third paper, “Integrating pattern matching and abstract interpretation for verifying cautions of microcontrollers,” by Thuy Nguyen, Takashi Tomita, Junpei Endo, and Toshiaki Aoki, proposes a semi-automatic approach for verifying cautions, which are hardware-dependent properties described in microcontrollers hardware manuals. For this approach, the authors integrate pattern matching and abstract interpretation, two static program analysis techniques. The experimental results show the feasibility and applicability of the approach (recommended by Marcio Delamaro). The fourth paper, “Empirical analysis of practitioners’ perceptions of test flakiness factors,” by Azeem Ahmad, Ola Leifler, and Kristian Sandahl, concerns flaky tests. A flaky test is one where different executions with the same test can lead to different outcomes/verdicts. The authors explore developer perception regarding factors that affect flakiness, concentrating on developers of closed-source software. They also examine two test suites and identify the test smells that lead to flakiness (recommended by Mike Papadakis).","PeriodicalId":49506,"journal":{"name":"Software Testing Verification & Reliability","volume":"21 1","pages":""},"PeriodicalIF":1.5,"publicationDate":"2021-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83656010","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Effective fault localization and context-aware debugging for concurrent programs
J. Chu, Tingting Yu, J. Hayes, Xue Han, Yu Zhao
Concurrent programs are difficult to debug because concurrency faults usually occur under specific inputs and thread interleavings. Fault localization techniques for sequential programs are often ineffective because the root causes of concurrency faults involve memory accesses across multiple threads rather than single statements. Previous research has proposed techniques that analyse passing and failing executions obtained from running a set of test cases in order to identify faulty memory access patterns. However, stand-alone access patterns do not provide enough contextual information, such as the path leading to the failure, for developers to understand the bug. We present an approach, Coadec, that automatically generates interthread control flow paths linking the memory access patterns that occur most frequently in failing executions, to better diagnose concurrency bugs. Coadec consists of two phases. In the first phase, we use feature selection techniques from machine learning to localize suspicious memory access patterns based on failing and passing executions; the patterns with maximum feature diversity information point to the most suspicious pattern. We then apply a data mining technique to identify the memory access patterns that occurred most frequently in the failing executions. Finally, Coadec identifies faulty program paths by connecting the frequent patterns with the suspicious pattern. We also evaluate the effectiveness of fault localization using test suites generated from different test adequacy criteria. We evaluated Coadec on 10 real-world multithreaded Java applications. Results indicate that Coadec outperforms state-of-the-art approaches for localizing concurrency faults and that Coadec's context debugging can help developers understand concurrency faults by inspecting a small percentage of code.
{"title":"Effective fault localization and context‐aware debugging for concurrent programs","authors":"J. Chu, Tingting Yu, J. Hayes, Xue Han, Yu Zhao","doi":"10.1002/stvr.1797","DOIUrl":"https://doi.org/10.1002/stvr.1797","url":null,"abstract":"Concurrent programs are difficult to debug because concurrency faults usually occur under specific inputs and thread interleavings. Fault localization techniques for sequential programs are often ineffective because the root causes of concurrency faults involve memory accesses across multiple threads rather than single statements. Previous research has proposed techniques to analyse passing and failing executions obtained from running a set of test cases for identifying faulty memory access patterns. However, stand‐alone access patterns do not provide enough contextual information, such as the path leading to the failure, for developers to understand the bug. We present an approach, Coadec, to automatically generate interthread control flow paths that can link memory access patterns that occurred most frequently in the failing executions to better diagnose concurrency bugs. Coadec consists of two phases. In the first phase, we use feature selection techniques from machine learning to localize suspicious memory access patterns based on failing and passing executions. The patterns with maximum feature diversity information can point to the most suspicious pattern. We then apply a data mining technique and identify the memory access patterns that occurred most frequently in the failing executions. Finally, Coadec identifies faulty program paths by connecting both the frequent patterns and the suspicious pattern. We also evaluate the effectiveness of fault localization using test suites generated from different test adequacy criteria. We introduce and have evaluated Coadec on 10 real‐world multithreaded Java applications. Results indicate that Coadec outperforms state‐of‐the‐art approaches for localizing concurrency faults and that Coadec's context debugging can help developers understand concurrency fault by inspecting a small percentage of code.","PeriodicalId":49506,"journal":{"name":"Software Testing Verification & Reliability","volume":"46 1","pages":""},"PeriodicalIF":1.5,"publicationDate":"2021-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80645927","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Adaptive or embedded software testing and mutation testing
R. Hierons, Tao Xie
This issue contains four papers. The first paper provides a survey of work on testing adaptive and context-aware systems, while the second one concerns testing embedded systems. The remaining two papers explore particular problems associated with an area well known to most STVR readers: mutation testing. The first paper, ‘Testing of adaptive and context-aware systems: Approaches and challenges’, by Bento R. Siqueira, Fabiano C. Ferrari, Kathiani E. Souza, Valter V. Camargo and Rogério de Lemos, introduces a systematic literature review and a thematic analysis of studies to characterize the state of the art in testing adaptive systems (ASs) and context-aware systems (CASs) and discuss approaches, challenges, observed trends and research limitations and directions. The authors discover recurring research concerns related to AS and CAS testing (such as generation of test cases and built-in tests), recurring testing challenges (such as context monitoring and runtime decisions), some trends (such as model-based testing and hybrid techniques) and some little-investigated issues (such as uncertainty and prediction of changes). (Recommended by T.Y. Chen) The second paper, ‘Remote embedded devices test framework on the cloud’, by Il-Seok (Benjamin) Choi and Chang-Sung Jeong, introduces a remote embedded device test framework on the cloud named RED-TFC, whose reliability test manager component can automatically perform various tests for evaluating the reliability and performance of distributed shared devices by utilizing the cloud concept. RED-TFC includes two major techniques: the adaptive sample scale for reliability test (ASRT) and the mass sample reliability test (MSRT). The authors analyse two Android smartphone models that include many embedded components and show that RED-TFC can help detect a high number of reliability problems in smartphones. (Recommended by Tanja Vos) The third paper, ‘Analysing the combination of cost reduction techniques in Android mutation testing’, by Macario Polo-Usaola and Isyed Rodríguez-Trujillo, concerns the use of mutation testing when testing mobile apps. As the authors note, when testing an app, one typically deploys the app and its mutants on mobile devices or executes them on an emulator. Doing so increases the test execution time. Naturally, it can also significantly increase the cost of mutation testing, especially when there are many mutants. The authors investigate several techniques that have been devised for reducing execution time in mutation testing and produce a mathematical model with the aim of predicting the time taken when some combination of these techniques is used. (Recommended by Mike Papadakis) The final paper is ‘An ensemble-based predictive mutation testing approach that considers impact of unreached mutants’ by Alireza Aghamohammadi and Seyed-Hassan Mirian-Hosseinabadi. This paper also concerns both mutation testing and prediction. However, the authors look at a different prediction problem: that of predicting whether a mutant will be killed. The authors note that previous work did not consider the impact of unreached mutants: mutants whose mutation point is not covered by any of the test cases used. It has been argued that, since many mutation tools exclude unreachable mutants, such mutants should also be removed from any empirical evaluation. The authors report the results of replicating a previous study while also removing unreached mutants, finding that the resulting performance of the prediction technique is far lower than previously reported. They then propose an alternative prediction model that is shown to be effective when unreached mutants are removed. (Recommended by Tanja Vos)
{"title":"Adaptive or embedded software testing and mutation testing","authors":"R. Hierons, Tao Xie","doi":"10.1002/stvr.1798","DOIUrl":"https://doi.org/10.1002/stvr.1798","url":null,"abstract":"This issue contains four papers. The first paper provides a survey of work on testing adaptive and context-aware systems, while the second one concerns testing embedded systems. The remaining two papers explore particular problems associated with an area well known to most STVR readers: mutation testing. The first paper, ‘Testing of adaptive and context-aware systems: Approaches and challenges’, by Bento R. Siqueira, Fabiano C. Ferrari, Kathiani E. Souza, Valter V. Camargo and Rogério de Lemos, introduces a systematic literature review and a thematic analysis of studies to characterize the state of the art in testing adaptive systems (ASs) and context-aware systems (CASs) and discuss approaches, challenges, observed trends and research limitations and directions. The authors discover recurring research concerns related to AS and CAS testing (such as generation of test cases and built-in tests), recurring testing challenges (such as context monitoring and runtime decisions), some trends (such as model-based testing and hybrid techniques) and some little investigated issues (such as uncertainty and prediction of changes). (Recommended by T.Y. Chen) The second paper, ‘Remote embedded devices test framework on the cloud’, by Il-Seok (Benjamin) Choi and Chang-Sung Jeong, introduces a remote embedded device test framework on the cloud named RED-TFC, whose reliability test manager component can automatically perform various tests for evaluating reliability and performance of distributed shared devices by utilizing the cloud concept. RED-TFC includes two major techniques: the adaptive sample scale for reliability test (ASRT) and the mass sample reliability test (MSRT). The authors analyse two Android smartphone models that include many embedded components and show that RED-TFC can help detect a high number of reliability problems in smartphones. (Recommended by Tanja Vos) The third paper, ‘Analysing the combination of cost reduction techniques in Android mutation testing’, by Macario Polo-Usaola and Isyed Rodríguez-Trujillo, concerns the use of mutation testing when testing mobile apps. As the authors note, when testing an app, one typically deploys the app and its mutants on mobile devices or executes them on an emulator. Doing so increases the test execution time. Naturally, it can also significantly increase the cost of mutation testing, especially when there are many mutants. The authors investigate several techniques that have been devised for reducing execution time in mutation testing and produce a mathematical model with the aim of predicting the time taken when some combination of these techniques is used. (Recommended by Mike Papadakis) The final paper is ‘An ensemble-based predictive mutation testing approach that considers impact of unreached mutants’ by Alireza Aghamohammadi and Seyed-Hassan Mirian-Hosseinabadi. This paper also concerns both mutation testing and prediction. 
However, the authors look at a different prediction problem: that of","PeriodicalId":49506,"journal":{"name":"Software Testing Verification & Reliability","volume":"113 1","pages":""},"PeriodicalIF":1.5,"publicationDate":"2021-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84905434","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
GPU acceleration of finite state machine input execution: Improving scale and performance
Vanya Yaneva, A. Rajan, Christophe Dubach
Model-based development is a popular development approach in which software is implemented and verified based on a model of the required system. Finite state machines (FSMs) are widely used as models for systems in several domains. Validating that a model accurately represents the required behaviour involves the generation and execution of a large number of input sequences, which is often an expensive and time-consuming process. In this paper, we speed up the execution of input sequences for FSM validation by leveraging the high degree of parallelism of modern graphics processing units (GPUs) to automatically execute FSM input sequences in parallel on the GPU threads. We expand our existing work by providing techniques that improve the performance and scalability of this approach. We conduct an extensive empirical evaluation using 15 large FSMs from the networking domain and measure GPU speed-up over a 16-core CPU, taking into account total GPU time, which includes both data transfer and kernel execution time. We found that GPUs execute FSM input sequences up to 9.28× faster than a 16-core CPU, with an average speed-up of 4.53× across all subjects. Our optimizations achieve an average improvement of 58.95% over our existing work in terms of speed-up, and they scale to large FSMs with over 2K states and 500K transitions. We also found that techniques aimed at reducing the number of required input sequences for large FSMs with high density were ineffective when applied to all-transition-pair coverage, thus emphasizing the need for approaches like ours that speed up input execution.
{"title":"GPU acceleration of finite state machine input execution: Improving scale and performance","authors":"Vanya Yaneva, A. Rajan, Christophe Dubach","doi":"10.1002/stvr.1796","DOIUrl":"https://doi.org/10.1002/stvr.1796","url":null,"abstract":"Model‐based development is a popular development approach in which software is implemented and verified based on a model of the required system. Finite state machines (FSMs) are widely used as models for systems in several domains. Validating that a model accurately represents the required behaviour involves the generation and execution of a large number of input sequences, which is often an expensive and time‐consuming process. In this paper, we speed up the execution of input sequences for FSM validation, by leveraging the high degree of parallelism of modern graphics processing units (GPUs) for the automatic execution of FSM input sequences in parallel on the GPU threads. We expand our existing work by providing techniques that improve the performance and scalability of this approach. We conduct extensive empirical evaluation using 15 large FSMs from the networking domain and measure GPU speed‐up over a 16‐core CPU, taking into account total GPU time, which includes both data transfer and kernel execution time. We found that GPUs execute FSM input sequences up to 9.28× faster than a 16‐core CPU, with an average speed‐up of 4.53× across all subjects. Our optimizations achieve an average improvement over existing work of 58.95% for speed‐up and scalability to large FSMs with over 2K states and 500K transitions. We also found that techniques aimed at reducing the number of required input sequences for large FSMs with high density were ineffective when applied to all‐transition pair coverage, thus emphasizing the need for approaches like ours that speed up input execution.","PeriodicalId":49506,"journal":{"name":"Software Testing Verification & Reliability","volume":"9 1","pages":""},"PeriodicalIF":1.5,"publicationDate":"2021-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90833737","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Metamorphic relation prioritization for effective regression testing
Madhusudan Srinivasan, Upulee Kanewala
Metamorphic testing (MT) is widely used for testing programs that face the oracle problem. It uses a set of metamorphic relations (MRs), which are relations among multiple inputs and their corresponding outputs, to determine whether the program under test is faulty. Typically, MRs vary in their ability to detect faults in the program under test, and some MRs tend to detect the same set of faults. In this paper, we propose approaches to prioritize MRs to improve the efficiency and effectiveness of MT for regression testing. We present two MR prioritization approaches: (i) fault-based and (ii) coverage-based. To evaluate these approaches, we conduct experiments on three complex open-source software systems. Our results show that our MR prioritization approaches significantly outperform the current practice of executing the source and follow-up test cases of the MRs in an ad hoc manner, in terms of fault detection effectiveness. Further, fault-based MR prioritization reduces the number of source and follow-up test cases that need to be executed, as well as the average time taken to detect a fault, saving time and cost during the testing process.
{"title":"Metamorphic relation prioritization for effective regression testing","authors":"Madhusudan Srinivasan, Upulee Kanewala","doi":"10.1002/stvr.1807","DOIUrl":"https://doi.org/10.1002/stvr.1807","url":null,"abstract":"Metamorphic testing (MT) is widely used for testing programs that face the oracle problem. It uses a set of metamorphic relations (MRs), which are relations among multiple inputs and their corresponding outputs to determine whether the program under test is faulty. Typically, MRs vary in their ability to detect faults in the program under test, and some MRs tend to detect the same set of faults. In this paper, we propose approaches to prioritize MRs to improve the efficiency and effectiveness of MT for regression testing. We present two MR prioritization approaches: (i) fault‐based and (ii) coverage‐based. To evaluate these MR prioritization approaches, we conduct experiments on three complex open‐source software systems. Our results show that the MR prioritization approaches developed by us significantly outperform the current practice of executing the source and follow‐up test cases of the MRs in an ad‐hoc manner in terms of fault detection effectiveness. Further, fault‐based MR prioritization leads to reducing the number of source and follow‐up test cases that needs to be executed as well as reducing the average time taken to detect a fault, which would result in saving time and cost during the testing process.","PeriodicalId":49506,"journal":{"name":"Software Testing Verification & Reliability","volume":"42 1","pages":""},"PeriodicalIF":1.5,"publicationDate":"2021-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73771014","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Survey on test case generation, selection and prioritization for cyber-physical systems
Zahra Sadri-Moshkenani, Justin Bradley, G. Rothermel
A cyber‐physical system (CPS) is a collection of computing devices that communicate with each other, operate in the target environment via actuators and interact with the physical world through sensors in a feedback loop. CPSs need to be safe and reliable and function in accordance with their requirements. Testing, focusing on a CPS model and/or its code, is the primary approach used by engineers to achieve this. Generating, selecting and prioritizing test cases that can reveal faults in CPSs, from the wide range of possible input values and stimuli that affect their operation, are of central importance in this process. To date, however, in our search of the literature, we have found no comprehensive survey of research on test case generation, selection and prioritization for CPSs. In this article, therefore, we report the results of a survey of approaches for generating, selecting and prioritizing test cases for CPSs; the results illustrate the progress that has been made on these approaches to date, the properties that characterize the approaches and the challenges that remain open in these areas of research.
{"title":"Survey on test case generation, selection and prioritization for cyber‐physical systems","authors":"Zahra Sadri‐Moshkenani, Justin Bradley, G. Rothermel","doi":"10.1002/stvr.1794","DOIUrl":"https://doi.org/10.1002/stvr.1794","url":null,"abstract":"A cyber‐physical system (CPS) is a collection of computing devices that communicate with each other, operate in the target environment via actuators and interact with the physical world through sensors in a feedback loop. CPSs need to be safe and reliable and function in accordance with their requirements. Testing, focusing on a CPS model and/or its code, is the primary approach used by engineers to achieve this. Generating, selecting and prioritizing test cases that can reveal faults in CPSs, from the wide range of possible input values and stimuli that affect their operation, are of central importance in this process. To date, however, in our search of the literature, we have found no comprehensive survey of research on test case generation, selection and prioritization for CPSs. In this article, therefore, we report the results of a survey of approaches for generating, selecting and prioritizing test cases for CPSs; the results illustrate the progress that has been made on these approaches to date, the properties that characterize the approaches and the challenges that remain open in these areas of research.","PeriodicalId":49506,"journal":{"name":"Software Testing Verification & Reliability","volume":"3 1","pages":""},"PeriodicalIF":1.5,"publicationDate":"2021-09-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84500814","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}