Handling hardware‐dependent properties at a low level is usually required in developing microcontroller‐based applications. One of these hardware‐dependent properties is cautions, which are described in microcontrollers' hardware manuals. The process of verifying these cautions is performed manually, as there is currently no single tool that can directly handle this task. This research aims to automate the verification of these cautions. To obtain the typical cautions of microcontrollers, we investigate two sections of the hardware manual of a popular microcontroller that contain a considerable number of required cautions. Subsequently, we analyse these cautions and categorize them into several groups. Based on this analysis, we propose a semi‐automatic approach for verifying the cautions, which integrates two static programme analysis techniques (i.e., pattern matching and abstract interpretation). To evaluate our approach, we conducted experiments with generated source code, benchmark source code, and industrial source code. The generated source code, which was created automatically based on several aspects of C programmes, was used to evaluate the performance of the approach with respect to these aspects. The benchmark and the industrial source code, which were provided by Aisin Software Co., Ltd., were used to assess the feasibility and applicability of the approach. The results show that all expected violations in the benchmark source code were detected; unexpected but real violations in the benchmark programme were also detected. For the industrial source code, the approach successfully handled and detected most of the expected violations. These results show that the approach is promising for verifying the cautions.
{"title":"Integrating pattern matching and abstract interpretation for verifying cautions of microcontrollers","authors":"Thuy Nguyen, Takashi Tomita, Junpei Endo, Toshiaki Aoki","doi":"10.1002/stvr.1788","DOIUrl":"https://doi.org/10.1002/stvr.1788","url":null,"abstract":"Handling hardware‐dependent properties at a low level is usually required in developing microcontroller‐based applications. One of these hardware‐dependent properties is cautions, which are described in microcontrollers hardware manuals. The process of verifying these cautions is performed manually, as there is currently no single tool that can directly handle this task. This research aims at automating the verification of these cautions. To obtain the typical cautions of microcontrollers, we investigate two sections which have a considerable number of required cautions in the hardware manual of a popular microcontroller. Subsequently, we analyse these cautions and categorize them into several groups. Based on this analysis, we propose a semi‐automatic approach for verifying the cautions which integrates two static programme analysis techniques (i.e., pattern matching and abstract interpretation). To evaluate our approach, we conducted experiments with generated source code, benchmark source code, and industrial source code. The generated source code, which was created automatically based on several aspects of the C programme, was used to evaluate the performance of the approach based on these aspects. The benchmark and the industrial source code, which were provided by Aisin Software Co., Ltd., were used to assess the feasibility and applicability of the approach. The results show that all expected violations in the benchmark source code were detected. Unexpected but real violations in the benchmark programme were also detected. For the industrial source code, the approach successfully handled and detected most of the expected violations. These results show that the approach is promising in verifying the cautions.","PeriodicalId":49506,"journal":{"name":"Software Testing Verification & Reliability","volume":"44 1","pages":""},"PeriodicalIF":1.5,"publicationDate":"2021-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85489349","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Tests can be derived from extended finite state machine (EFSM) specifications considering the coverage of single‐transfer faults, all transitions using a transition tour, all‐uses, edge‐pair, and prime path with side trips. We provide novel empirical assessments of the effectiveness of these test suites. The first assessment determines, for each pair of test suites, whether there is a difference between the pair in covering EFSM faults of six EFSM specifications. If the difference is found to be significant, we determine which test suite outperforms the other. The second assessment is similar to the first but is carried out against code faults of 12 Java implementations of the specifications. In addition, two assessments are provided to determine whether test suites have better coverage of certain classes of EFSM (or code) faults than others. The evaluation uses proper data transformation of mutation scores and p‐value adjustments for controlling Type I error due to multiple tests. Furthermore, we show that subsuming mutants have an impact on mutation scores of both EFSM and code faults, and accordingly, we use a score that removes them in order not to invalidate the obtained results. The assessments show that all‐uses tests were outperformed by all other tests; transition tours outperformed both edge‐pair and prime path with side trips; and single‐transfer fault tests outperformed all other test suites. Similar results are obtained over the considered EFSM and code fault domains, and there were no significant differences between the test suites' coverage of different classes of EFSM and code faults.
{"title":"Assessing test suites of extended finite state machines against model‐ and code‐based faults","authors":"K. El-Fakih, A. Alzaatreh, Uraz Cengiz Türker","doi":"10.1002/stvr.1789","DOIUrl":"https://doi.org/10.1002/stvr.1789","url":null,"abstract":"Tests can be derived from extended finite state machine (EFSM) specifications considering the coverage of single‐transfer faults, all transitions using a transition tour, all‐uses, edge‐pair, and prime path with side trip. We provide novel empirical assessments of the effectiveness of these test suites. The first assessment determines for each pair of test suites if there is a difference between the pair in covering EFSM faults of six EFSM specifications. If the difference is found significant, we determine which test suite outperforms the other. The second assessment is similar to the first; yet, it is carried out against code faults of 12 Java implementations of the specifications. Besides, two assessments are provided to determine whether test suites have better coverage of certain classes of EFSM (or code) faults than others. The evaluation uses proper data transformation of mutation scores and p‐value adjustments for controlling Type I error due to multiple tests. Furthermore, we show that subsuming mutants have an impact on mutation scores of both EFSM and code faults; and accordingly, we use a score that removes them in order not to invalidate the obtained results. The assessments show that all‐uses tests were outperformed by all other tests; transition tours outperformed both edge‐pair and prime path with side trips; and single‐transfer fault tests outperformed all other test suites. Similar results are obtained over the considered EFSM and code fault domains, and there were no significant differences between the test suites coverage of different classes of EFSM and code faults.","PeriodicalId":49506,"journal":{"name":"Software Testing Verification & Reliability","volume":"22 1","pages":""},"PeriodicalIF":1.5,"publicationDate":"2021-08-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77635654","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This issue includes three papers, covering software verification, software reliability modelling and performance assessment, respectively. The first paper, ‘Verification algebra for multi-tenant applications in VaaS architecture’, by Kai Hu, Ji Wan, Kan Luo, Yuzhuang Xu, Zijing Cheng and Wei-Tek Tsai, concerns verification in multi-tenant architectures. Multi-tenant architectures support composition of services and so the rapid development of applications. The issue addressed is the potentially massive number of possible applications formed by composing a given set of services. The authors propose a verification algebra that can determine the results of verification of new combinations of property/application on the basis of different combinations of services already verified and/or the verification of different, but related, properties. The overall approach was evaluated through simulations. (Recommended by Professor Paul Strooper) The second paper, ‘Entropy based enhanced particle swarm optimization on multi-objective software reliability modelling for optimal testing resources allocation’, by Pooja Rani and G. S. Mahapatra, concerns the optimum resource allocation problem to obtain the maximum reliability and minimum total cost under the testing effort constraint. The authors formulate a multi-objective software reliability model of testing resources for a new generalized exponential reliability function to characterize dynamic allocation of total expected cost and testing effort. The authors further propose an enhanced particle swarm optimization (EPSO) to maximize software reliability and minimize allocation cost. The authors conduct experiments to demonstrate the potential of the proposed approach to predict software reliability with greater accuracy. (Recommended by Professor Moonzoo Kim) The third paper, ‘Performance assessment based on stochastic differential equation and effort data for edge computing’, by Yoshinobu Tamura and Shigeru Yamada, concerns performance assessment based on the relationship between the cloud and edge services operated by using open-source software. The authors propose a two-dimensional stochastic differential equation model that considers the unique features with uncertainty from big data under the operation of cloud and edge services. The authors analyse actual data to show numerical examples of performance assessments considering the network connectivity as characteristics of cloud and edge services and compare the noise terms of the proposed model for actual data. (Recommended by Professor Min Xie)
{"title":"Editorial: Verification, reliability and performance","authors":"R. Hierons, Tao Xie","doi":"10.1002/stvr.1790","DOIUrl":"https://doi.org/10.1002/stvr.1790","url":null,"abstract":"This issue includes three papers, covering software verification, software reliability modelling and performance assessment, respectively. The first paper, ‘Verification algebra for multi-tenant applications in VaaS architecture’, by Kai Hu, Ji Wan, Kan Luo, Yuzhuang Xu, Zijing Cheng and Wei-Tek Tsai, concerns verification in multi-tenant architectures. Multi-tenant architectures support composition of services and so the rapid development of applications. The issue addressed is the potentially massive number of possible applications formed by composing a given set of services. The authors propose a verification algebra that can determine the results of verification of new combinations of property/application on the basis of different combinations of services already verified and/or the verification of different, but related, properties. The overall approach was evaluated through simulations. (Recommended by Professor Paul Strooper) The second paper, ‘Entropy based enhanced particle swarm optimization on multi-objective software reliability modelling for optimal testing resources allocation’, by Pooja Rani and G. S. Mahapatra, concerns the optimum resource allocation problem to obtain the maximum reliability and minimum total cost under the testing effort constraint. The authors formulate a multi-objective software reliability model of testing resources for a new generalized exponential reliability function to characterize dynamic allocation of total expected cost and testing effort. The authors further propose an enhanced particle swarm optimization (EPSO) to maximize software reliability and minimize allocation cost. The authors conduct experiments to demonstrate the potential of the proposed approach to predict software reliability with greater accuracy. (Recommended by Professor Moonzoo Kim) The third paper, ‘Performance assessment based on stochastic differential equation and effort data for edge computing’, by Yoshinobu Tamura and Shigeru Yamada, concerns performance assessment based on the relationship between the cloud and edge services operated by using open-source software. The authors propose a two-dimensional stochastic differential equation model that considers the unique features with uncertainty from big data under the operation of cloud and edge services. The authors analyse actual data to show numerical examples of performance assessments considering the network connectivity as characteristics of cloud and edge services and compare the noise terms of the proposed model for actual data. (Recommended by Professor Min Xie)","PeriodicalId":49506,"journal":{"name":"Software Testing Verification & Reliability","volume":"7 1","pages":""},"PeriodicalIF":1.5,"publicationDate":"2021-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87148288","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
J. Neto, A. Moreira, Genoveva Vargas-Solar, M. A. Musicante
This paper proposes TRANSMUT‐Spark for automating mutation testing of big data processing code within Spark programs. Apache Spark is an engine for big data analytics/processing that hides the inherent complexity of parallel big data programming. Nonetheless, programmers must cleverly combine Spark built‐in functions within programs and guide the engine to use the right data management strategies to exploit the computational resources required by big data processing and avoid substantial production losses. Many programming details in Spark data processing code are error‐prone and must be correctly and automatically tested. This paper explores the application of mutation testing to Spark programs; mutation testing is a fault‐based testing technique that relies on fault simulation to evaluate and design test sets. The paper introduces TRANSMUT‐Spark for testing Spark programs by automating the most laborious steps of the process and fully executing the mutation testing process. The paper describes how TRANSMUT‐Spark automates the mutant generation, test execution and adequacy analysis phases of mutation testing. It also discusses the results of experiments conducted to validate the tool, as well as its scope and limitations.
{"title":"TRANSMUT‐Spark: Transformation mutation for Apache Spark","authors":"J. Neto, A. Moreira, Genoveva Vargas-Solar, M. A. Musicante","doi":"10.1002/stvr.1809","DOIUrl":"https://doi.org/10.1002/stvr.1809","url":null,"abstract":"This paper proposes TRANSMUT‐Spark for automating mutation testing of big data processing code within Spark programs. Apache Spark is an engine for big data analytics/processing that hides the inherent complexity of parallel big data programming. Nonetheless, programmers must cleverly combine Spark built‐in functions within programs and guide the engine to use the right data management strategies to exploit the computational resources required by big data processing and avoid substantial production losses. Many programming details in Spark data processing code are prone to false statements that must be correctly and automatically tested. This paper explores the application of mutation testing in Spark programs, a fault‐based testing technique that relies on fault simulation to evaluate and design test sets. The paper introduces TRANSMUT‐Spark for testing Spark programs by automating the most laborious steps of the process and fully executing the mutation testing process. The paper describes how the TRANSMUT‐Spark automates the mutant generation, test execution and adequacy analysis phases of mutation testing. It also discusses the results of experiments to validate the tool and argues its scope and limitations.","PeriodicalId":49506,"journal":{"name":"Software Testing Verification & Reliability","volume":"93 1","pages":""},"PeriodicalIF":1.5,"publicationDate":"2021-08-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87502544","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This issue includes four papers, covering performance mutation testing, performance regression localization, fault detection and localization, and defect prediction, respectively. The first paper, by Pedro Delgado-Pérez, Ana Belén Sánchez, Sergio Segura and Inmaculada Medina-Bulo, concerns the feasibility of applying performance mutation testing (i.e., applying mutation testing to assess performance tests) at the source-code level in general-purpose languages. To successfully apply performance mutation testing, the authors find it necessary to design specific mutation operators and mechanisms to evaluate the outputs. The authors define and evaluate seven new performance mutation operators to model known bug-inducing patterns. The authors report the results of experimental evaluation on open-source C++ programs. (Recommended by Professor Hyunsook Do) The second paper, by Frolin S. Ocariza Jr. and Boyang Zhao, considers the problem of finding the causes of performance regression in software. Here, a performance regression is an increase in response time as a result of changes to the software. The paper describes a design, called ZAM, that automates the process of comparing execution timelines collected from web applications. Such timelines are used as the basis for finding the causes of performance regression. A number of challenges are introduced by the context in which, for example, timing information is typically noisy. The authors report the results of experimental evaluation and also experience in using the approach. (Recommended by Professor T. H. Tse) The third paper, by Rawad Abou Assi, Wes Masri and Chadi Trad, concerns coincidental correctness and its impact on fault detection and localization. The authors consider weak coincidental correctness, in which a faulty statement is executed but this does not lead to an infected state. They also consider strong coincidental correctness, in which the execution of a faulty statement leads to an infected state but does not lead to incorrect output. The authors empirically investigated the effect of coincidental correctness on three classes of technique: spectrum-based fault localization (SBFL), test suite reduction (TSR) and test case prioritization (TCP). Interestingly, there was significant variation with, for example, evidence that coincidental correctness has a greater impact on TSR and TCP than on SBFL. (Recommended by Professor Hyunsook Do) The fourth paper, by Zeinab Eivazpour and Mohammad Reza Keyvanpour, concerns the cost issue when handling the class imbalance problem over the training dataset in software defect prediction. The authors propose the cost-sensitive stacked generalization (CSSG) approach. This approach combines the stacking ensemble learning method with cost-sensitive learning, which aims to reduce misclassification costs. In the CSSG approach, the logistic regression classifier and extra randomized trees ensemble method in cost-sensitive learning and cost-insensitive conditions ar
{"title":"Editorial: Testing, Debugging, and Defect Prediction","authors":"R. Hierons, Tao Xie","doi":"10.1002/stvr.1775","DOIUrl":"https://doi.org/10.1002/stvr.1775","url":null,"abstract":"This issue includes four papers, covering performance mutation testing, performance regression localization, fault detection and localization, and defect prediction, respectively. The first paper, by Pedro Delgado-Pérez, Ana Belén Sánchez, Sergio Segura and Inmaculada Medina-Bulo, concerns feasibility of applying performance mutation testing (i.e. applying mutation testing to assess performance tests) at the source-code level in general-purpose languages. To successfully apply performance mutation testing, the authors find it necessary to design specific mutation operators and mechanisms to evaluate the outputs. The authors define and evaluate seven new performance mutation operators to model known bug-inducing patterns. The authors report the results of experimental evaluation on open-source C++ programs. (Recommended by Professor Hyunsook Do) The second paper, by Frolin S. Ocariza Jr. and Boyang Zhao, considers the problem of finding the causes of performance regression in software. Here, a performance regression is an increase in response time as a result of changes to the software. The paper describes a design, called ZAM, that automates the process of comparing execution timelines collected from web applications. Such timelines are used as the basis for finding the causes of performance regression. A number of challenges are introduced by the context in which, for example, timing information is typically noisy. The authors report the results of experimental evaluation and also experience in using the approach. (Recommended by Professor T. H. Tse) The third paper, by Rawad Abou Assi, Wes Masri and Chadi Trad, concerns coincidental correctness and its impact on fault detection and localization. The authors consider weak coincidental correctness, in which a faulty statement is executed but this does not lead to an infected state. They also consider strong coincidental correctness, in which the execution of a faulty statement leads to an infected state but does not lead to incorrect output. The authors empirically investigated the effect of coincidental correctness on three classes of technique: spectrum-based fault localization (SBFL), test suite reduction (TSR) and test case prioritization (TCP). Interestingly, there was significant variation with, for example, evidence that coincidental correctness has a greater impact on TSR and TCP than on SBFL. (Recommended by Professor Hyunsook Do) The fourth paper, by Zeinab Eivazpour and Mohammad Reza Keyvanpour, concerns the cost issue when handling the class imbalance problem over the training dataset in software defect prediction. The authors propose the cost-sensitive stacked generalization (CSSG) approach. This approach combines the staking ensemble learning method with cost-sensitive learning, which aims to reduce misclassification costs. 
In the CSSG approach, the logistic regression classifier and extra randomized trees ensemble method in cost-sensitive learning and cost-insensitive conditions ar","PeriodicalId":49506,"journal":{"name":"Software Testing Verification & Reliability","volume":"6 1","pages":""},"PeriodicalIF":1.5,"publicationDate":"2021-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90430547","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In the last three decades, memory safety issues in system programming languages such as C or C++ have been one of the most significant sources of security vulnerabilities. However, there exist only a few attempts with limited success to cope with the complexity of C++ program verification. We describe and evaluate a novel verification approach based on bounded model checking (BMC) and satisfiability modulo theories (SMT) to verify C++ programs. Our verification approach analyses bounded C++ programs by encoding into SMT various sophisticated features that the C++ programming language offers, such as templates, inheritance, polymorphism, exception handling, and the Standard Template Libraries. We formalize these features within our formal verification framework using a decidable fragment of first‐order logic and then show how state‐of‐the‐art SMT solvers can efficiently handle that. We implemented our verification approach on top of ESBMC. We compare ESBMC to LLBMC and DIVINE, which are state‐of‐the‐art verifiers to check C++ programs directly from the LLVM bitcode. Experimental results show that ESBMC can handle a wide range of C++ programs, presenting a higher number of correct verification results. Additionally, ESBMC has been applied to a commercial C++ application in the telecommunication domain and successfully detected arithmetic‐overflow errors, which could potentially lead to security vulnerabilities.
{"title":"Model checking C++ programs","authors":"Felipe R. Monteiro, M. R. Gadelha, L. Cordeiro","doi":"10.1002/stvr.1793","DOIUrl":"https://doi.org/10.1002/stvr.1793","url":null,"abstract":"In the last three decades, memory safety issues in system programming languages such as C or C++ have been one of the most significant sources of security vulnerabilities. However, there exist only a few attempts with limited success to cope with the complexity of C++ program verification. We describe and evaluate a novel verification approach based on bounded model checking (BMC) and satisfiability modulo theories (SMT) to verify C++ programs. Our verification approach analyses bounded C++ programs by encoding into SMT various sophisticated features that the C++ programming language offers, such as templates, inheritance, polymorphism, exception handling, and the Standard Template Libraries. We formalize these features within our formal verification framework using a decidable fragment of first‐order logic and then show how state‐of‐the‐art SMT solvers can efficiently handle that. We implemented our verification approach on top of ESBMC. We compare ESBMC to LLBMC and DIVINE, which are state‐of‐the‐art verifiers to check C++ programs directly from the LLVM bitcode. Experimental results show that ESBMC can handle a wide range of C++ programs, presenting a higher number of correct verification results. Additionally, ESBMC has been applied to a commercial C++ application in the telecommunication domain and successfully detected arithmetic‐overflow errors, which could potentially lead to security vulnerabilities.","PeriodicalId":49506,"journal":{"name":"Software Testing Verification & Reliability","volume":"77 1","pages":""},"PeriodicalIF":1.5,"publicationDate":"2021-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85732473","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
For partial, nondeterministic, finite state machines, a new conformance relation called strong reduction is presented. It complements other existing conformance relations in the sense that the new relation is well suited for model‐based testing of systems whose inputs are enabled or disabled, depending on the actual system state. Examples of such systems are graphical user interfaces and systems with interfaces that can be enabled or disabled in a mechanical way. We present a new test generation algorithm producing complete test suites for strong reduction. The suites are executed according to the grey‐box testing paradigm: it is assumed that the state‐dependent sets of enabled inputs can be identified during test execution, while the implementation states remain hidden, as in black‐box testing. We show that this grey‐box information is exploited by the generation algorithm in such a way that the resulting best‐case test suite size is only linear in the state space size of the reference model. Moreover, examples show that this may lead to significant reductions of test suite size in comparison to true black‐box testing for strong reduction.
{"title":"Effective grey‐box testing with partial FSM models","authors":"Robert Sachtleben, J. Peleska","doi":"10.1002/stvr.1806","DOIUrl":"https://doi.org/10.1002/stvr.1806","url":null,"abstract":"For partial, nondeterministic, finite state machines, a new conformance relation called strong reduction is presented. It complements other existing conformance relations in the sense that the new relation is well suited for model‐based testing of systems whose inputs are enabled or disabled, depending on the actual system state. Examples of such systems are graphical user interfaces and systems with interfaces that can be enabled or disabled in a mechanical way. We present a new test generation algorithm producing complete test suites for strong reduction. The suites are executed according to the grey‐box testing paradigm: it is assumed that the state‐dependent sets of enabled inputs can be identified during test execution, while the implementation states remain hidden, as in black‐box testing. We show that this grey‐box information is exploited by the generation algorithm in such a way that the resulting best‐case test suite size is only linear in the state space size of the reference model. Moreover, examples show that this may lead to significant reductions of test suite size in comparison to true black‐box testing for strong reduction.","PeriodicalId":49506,"journal":{"name":"Software Testing Verification & Reliability","volume":"15 13 1","pages":""},"PeriodicalIF":1.5,"publicationDate":"2021-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81637401","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Xavier Devroey, Alessio Gambi, Juan P. Galeotti, René Just, Fitsum Meshesha Kifetew, Annibale Panichella, Sebastiano Panichella
Researchers and practitioners have designed and implemented various automated test case generators to support effective software testing. Such generators exist for various languages (e.g., Java, C#, or Python) and various platforms (e.g., desktop, web, or mobile applications). The generators exhibit varying effectiveness and efficiency, depending on the testing goals they aim to satisfy (e.g., unit‐testing of libraries versus system‐testing of entire applications) and the underlying techniques they implement. In this context, practitioners need to be able to compare different generators to identify the most suited one for their requirements, while researchers seek to identify future research directions. This can be achieved by systematically executing large‐scale evaluations of different generators. However, executing such empirical evaluations is not trivial and requires substantial effort to select appropriate benchmarks, set up the evaluation infrastructure, and collect and analyse the results. In this Software Note, we present our JUnit Generation Benchmarking Infrastructure (JUGE) supporting generators (search‐based, random‐based, symbolic execution, etc.) seeking to automate the production of unit tests for various purposes (validation, regression testing, fault localization, etc.). The primary goal is to reduce the overall benchmarking effort, ease the comparison of several generators, and enhance the knowledge transfer between academia and industry by standardizing the evaluation and comparison process. Since 2013, several editions of a unit testing tool competition, co‐located with the Search‐Based Software Testing Workshop, have taken place in which JUGE was used and evolved. As a result, an increasing number of tools (over 10) from academia and industry have been evaluated on JUGE, matured over the years, and allowed the identification of future research directions. Based on the experience gained from the competitions, we discuss the expected impact of JUGE in improving the knowledge transfer on tools and approaches for test generation between academia and industry. Indeed, the JUGE infrastructure demonstrated an implementation design that is flexible enough to enable the integration of additional unit test generation tools, which is practical for developers and allows researchers to experiment with new and advanced unit testing tools and approaches.
{"title":"JUGE: An infrastructure for benchmarking Java unit test generators","authors":"Xavier Devroey, Alessio Gambi, Juan P. Galeotti, René Just, Fitsum Meshesha Kifetew, Annibale Panichella, Sebastiano Panichella","doi":"10.1002/stvr.1838","DOIUrl":"https://doi.org/10.1002/stvr.1838","url":null,"abstract":"Researchers and practitioners have designed and implemented various automated test case generators to support effective software testing. Such generators exist for various languages (e.g., Java, C#, or Python) and various platforms (e.g., desktop, web, or mobile applications). The generators exhibit varying effectiveness and efficiency, depending on the testing goals they aim to satisfy (e.g., unit‐testing of libraries versus system‐testing of entire applications) and the underlying techniques they implement. In this context, practitioners need to be able to compare different generators to identify the most suited one for their requirements, while researchers seek to identify future research directions. This can be achieved by systematically executing large‐scale evaluations of different generators. However, executing such empirical evaluations is not trivial and requires substantial effort to select appropriate benchmarks, setup the evaluation infrastructure, and collect and analyse the results. In this Software Note, we present our JUnit Generation Benchmarking Infrastructure (JUGE) supporting generators (search‐based, random‐based, symbolic execution, etc.) seeking to automate the production of unit tests for various purposes (validation, regression testing, fault localization, etc.). The primary goal is to reduce the overall benchmarking effort, ease the comparison of several generators, and enhance the knowledge transfer between academia and industry by standardizing the evaluation and comparison process. Since 2013, several editions of a unit testing tool competition, co‐located with the Search‐Based Software Testing Workshop, have taken place where JUGE was used and evolved. As a result, an increasing amount of tools (over 10) from academia and industry have been evaluated on JUGE, matured over the years, and allowed the identification of future research directions. Based on the experience gained from the competitions, we discuss the expected impact of JUGE in improving the knowledge transfer on tools and approaches for test generation between academia and industry. Indeed, the JUGE infrastructure demonstrated an implementation design that is flexible enough to enable the integration of additional unit test generation tools, which is practical for developers and allows researchers to experiment with new and advanced unit testing tools and approaches.","PeriodicalId":49506,"journal":{"name":"Software Testing Verification & Reliability","volume":"120 1","pages":""},"PeriodicalIF":1.5,"publicationDate":"2021-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75786178","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Predictive mutation testing (PMT) is a technique to predict whether a mutant is killed, using machine learning approaches. Researchers have proposed various methods for PMT over the years. However, the impact of unreached mutants on PMT is not fully addressed. A mutant is unreached if the statement on which the mutant is generated is not executed by any test cases. We aim to show that unreached mutants can inflate PMT results. Moreover, we propose an alternative approach to PMT, suggesting a different interpretation for PMT. To this end, we replicated the previous PMT research. We empirically evaluated the suggested approach on 654 Java projects provided by prior literature. Our results indicate that the performance of PMT drastically decreases in terms of the area under the receiver operating characteristic curve (AUC), from 0.833 to 0.517. Furthermore, PMT performs worse than random guesses on 27% of the projects. The proposed approach improves the PMT results, achieving an average AUC value of 0.613. As a result, we recommend that researchers remove unreached mutants when reporting the results.
{"title":"An ensemble‐based predictive mutation testing approach that considers impact of unreached mutants","authors":"Alireza Aghamohammadi, S. Mirian-Hosseinabadi","doi":"10.1002/stvr.1784","DOIUrl":"https://doi.org/10.1002/stvr.1784","url":null,"abstract":"Predictive mutation testing (PMT) is a technique to predict whether a mutant is killed, using machine learning approaches. Researchers have proposed various methods for PMT over the years. However, the impact of unreached mutants on PMT is not fully addressed. A mutant is unreached if the statement on which the mutant is generated is not executed by any test cases. We aim at showing that unreached mutants can inflate PMT results. Moreover, we propose an alternative approach to PMT, suggesting a different interpretation for PMT. To this end, we replicated the previous PMT research. We empirically evaluated the suggested approach on 654 Java projects provided by prior literature. Our results indicate that the performance of PMT drastically decreases in terms of area under a receiver operating characteristic curve (AUC) from 0.833 to 0.517. Furthermore, PMT performs worse than random guesses on 27% of the projects. The proposed approach improves the PMT results, achieving the average AUC value of 0.613. As a result, we recommend researchers to remove unreached mutants when reporting the results.","PeriodicalId":49506,"journal":{"name":"Software Testing Verification & Reliability","volume":"75 1","pages":""},"PeriodicalIF":1.5,"publicationDate":"2021-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86146196","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The IEEE 12th International Conference on Software Testing, Verification & Validation (ICST 2019) was held in Xi'an, China. The aim of the ICST conference is to bring together researchers and practitioners who study the theory, techniques, technologies, and applications that concern all aspects of software testing, verification, and validation of software systems. The program committee rigorously reviewed 110 full papers using a double-blind reviewing policy. Each paper received at least three regular reviews and went through a discussion phase where the reviewers made final decisions on each paper, each discussion being led by a meta-reviewer. Out of this process, the committee selected 31 full-length papers that appeared in the conference. These were presented during the main conference track over nine sessions ranging from classical topics such as test generation and test coverage to emerging topics such as machine learning and security. Based on the original reviewers' feedback, we selected five papers for consideration for this special issue of STVR. These papers were extended from their conference versions by the authors and were reviewed according to the standard STVR reviewing process. We thank all the ICST and STVR reviewers for their hard work. Three papers successfully completed the review process and are contained in this special issue. The rest of this editorial provides a brief overview of these three papers. The first paper, Automated Visual Classification of DOM-based Presentation Failure Reports for Responsive Web Pages, by Ibrahim Althomali, Gregory Kapfhammer, and Phil McMinn, introduces VERVE, a tool that automatically classifies all hard-to-detect responsive layout failures (RLFs) in web applications. An empirical study reveals that VERVE's classification of all five types of RLFs frequently agrees with classifications produced manually by humans. The second paper, BugsJS: A Benchmark and Taxonomy of JavaScript Bugs, by Péter Gyimesi, Béla Vancsics, Andrea Stocco, Davood Mazinanian, Árpád Beszédes, Rudolf Ferenc, and Ali Mesbah, presents BugsJS, a benchmark of 453 real, manually validated JavaScript bugs from 10 popular JavaScript server-side programs, comprising 444 k LOC in total. Each bug is accompanied by its bug report, the test cases that expose it, as well as the patch that fixes it. BugsJS can help facilitate reproducible empirical studies and comparisons of JavaScript analysis and testing tools. The third paper, Statically Driven Generation of Concurrent Tests for Thread-Safe Classes, by Valerio Terragni and Mauro Pezzè, presents DEPCON, a novel approach that reduces the search space of concurrent tests by leveraging statically computed dependencies among public methods. DEPCON exploits the intuition that concurrent tests can expose thread-safety violations that manifest as exceptions or deadlocks only if they exercise some specific method dependencies. The results show that DEPCON is more effective than state-of-the-art approaches in exposing concurrency faults.
{"title":"The IEEE 12th International Conference on Software Testing, Verification & Validation","authors":"A. Memon, Myra B. Cohen","doi":"10.1002/stvr.1773","DOIUrl":"https://doi.org/10.1002/stvr.1773","url":null,"abstract":"The IEEE 12th International Conference on Software Testing, Verification & Validation (ICST 2019) was held in Xi’an, China. The aim of the ICST conference is to bring together researchers and practitioners who study the theory, techniques, technologies, and applications that concern all aspects of software testing, verification, and validation of software systems. The program committee rigorously reviewed 110 full papers using a double-blind reviewing policy. Each paper received at least three regular reviews and went through a discussion phase where the reviewers made final decisions on each paper, each discussion being led by a meta-reviewer. Out of this process, the committee selected 31 full-length papers that appeared in the conference. These were presented over nine sessions ranging from classical topics such as test generation and test coverage to emerging topics such as machine learning and security during the main conference track. Based on the original reviewers’ feedback, we selected five papers for consideration for this special issue of STVR. These papers were extended from their conference version by the authors and were reviewed according to the standard STVR reviewing process. We thank all the ICST and STVR reviewers for their hardwork. Three papers successfully completed the reviewprocess and are contained in this special issue. The rest of this editorial provides a brief overview of these three papers. The first paper, Automated Visual Classification of DOM-based Presentation Failure Reports for Responsive Web Pages, by Ibrahim Althomali, Gregory Kapfhammer, and Phil McMinn, introduces VERVE, a tool that automatically classifies all hard to detect response layout failures (RLFs) in web applications. An empirical study reveals that VERVE’s classification of all five types of RLFs frequently agrees with classifications produced manually by humans. The second paper, BugsJS: A Benchmark and Taxonomy of JavaScript Bugs, by Péter Gyimesi, Béla Vancsics, Andrea Stocco, Davood Mazinanian, Árpád Beszédes, Rudolf Ferenc, and Ali Mesbah, presents, BugsJS, a benchmark of 453 real, manually validated JavaScript bugs from 10 popular JavaScript server-side programs, comprising 444 k LOC in total. Each bug is accompanied by its bug report, the test cases that expose it, as well as the patch that fixes it. BugJS can help facilitate reproducible empirical studies and comparisons of JavaScript analysis and testing tools. The third paper, Statically Driven Generation of Concurrent Tests for Thread-Safe Classes, by Valerio Terragni andMauro Pezzè presentsDEPCON, a novel approach that reduces the search space ofconcurrent tests by leveraging statically computeddependencies amongpublicmethods.DEPCON exploits the intuition that concurrent tests can expose thread-safety violations thatmanifest exceptions or deadlocks, only if they exercise some specific method dependencies. 
The results show that DEPCON is more effective than state-of-the-art approaches ","PeriodicalId":49506,"journal":{"name":"Software Testing Verification & Reliability","volume":"28 1","pages":""},"PeriodicalIF":1.5,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81871843","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}