In this issue, we are pleased to present three papers on model-based testing, test case prioritization and testing of virtual reality applications. The first paper, ‘On transforming model-based tests into code: A systematic literature review’ by Fabiano C. Ferrari, Vinicius H. S. Durelli, Sten F. Andler, Jeff Offutt, Mehrdad Saadatmand and Nils Müllner, presents a systematic literature review, based on 30 selected primary studies, on computing source code coverage from test sets generated via model-based testing (MBT) approaches. The authors identify common characteristics and limitations that may impact MBT research and practice, and discuss their implications for future research. They find increasing adoption of MBT in industry, increasing application of model-to-code transformations and a corresponding growing need to understand how test cases designed for models achieve coverage on the code. (Recommended by Dan Hao). The second paper, ‘Research on hyper-level of hyper-heuristic framework for MOTCP’ by Junxia Guo, Rui Wang, Jinjin Han and Zheng Li, presents three evaluation strategies for the hyper-level of the hyper-heuristic framework for multi-objective test case prioritization (HH-MOTCP). The experimental results show that the selection method proposed by the authors performs best. In addition, the authors apply 18 selection strategies to dynamically select low-level heuristics during the evolution process of HH-MOTCP, and the results identify the best-performing strategy across all test objects. Moreover, using the new strategies at the hyper-level makes HH-MOTCP more effective. (Recommended by Hyunsook Do). The third paper, ‘Exploiting deep reinforcement learning and metamorphic testing to automatically test virtual reality applications’ by Stevao Alves de Andrade, Fatima L. S. Nunes and Marcio Eduardo Delamaro, presents an approach to testing virtual reality (VR) applications. The experimental results show that it is feasible to adopt an automated test generation approach that combines metamorphic testing and deep reinforcement learning for testing VR applications, and that it serves as an effective alternative for identifying crashes related to collision and camera objects in VR applications. (Recommended by Yves Le Traon). We hope that these papers will inspire further research in related directions.
{"title":"Model‐based testing, test case prioritization and testing of virtual reality applications","authors":"Yves Le Traon, Tao Xie","doi":"10.1002/stvr.1868","DOIUrl":"https://doi.org/10.1002/stvr.1868","url":null,"abstract":"In this issue, we are pleased to present three papers on model-based testing, test case prioritization and testing of virtual reality applications. The first paper, ‘On transforming model-based tests into code: A systematic literature review’ by Fabiano C. Ferrari, Vinicius H. S. Durelli, Sten F. Andler, Jeff Offutt, Mehrdad Saadatmand and Nils Müllner, presents a systematic literature review based on 30 selected primary studies for computing source code coverage from test sets generated via model-based testing (MBT) approaches. The authors identify some common characteristics and limitations that may impact on MBT research and practice. The authors also discuss implications for future research related to these limitations. The authors find increasing adoption of MBT in industry, increasing application of model-to-code transformations and a complementary increasing need to understand how test cases designed for models achieve coverage on the code. (Recommended by Dan Hao). The second paper, ‘Research on hyper-level of hyper-heuristic framework for MOTCP’ by Junxia Guo, Rui Wang, Jinjin Han and Zheng Li, presents three evaluation strategies for the hyper-level of the hyper-heuristic framework for multi-objective test case prioritization (HH-MOTCP). The experimental results show that the selection method proposed by the authors performs best. In addition, the authors apply 18 selection strategies to dynamically select low-level heuristics during the evolution process of the HH-MOTCP. The results identify the best performing strategy for all test objects. Moreover, using the new strategies at the hyper-level makes HH-MOTCP more effective. (Recommended by Hyunsook Do). The third paper, ‘Exploiting deep reinforcement learning and metamorphic testing to automatically test virtual reality applications’ by Stevao Alves de Andrade, Fatima L. S. Nunes and Marcio Eduardo Delamaro, presents an approach to testing virtual reality (VR) applications. The experimental results show that it is feasible to adopting an automated approach of test generation with metamorphic testing and deep reinforcement learning for testing VR applications, especially serving as an effective alternative to identifying crashes related to collision and camera objects in VR applications. (Recommended by Yves Le Traon). We hope that these papers will inspire further research in related directions.","PeriodicalId":49506,"journal":{"name":"Software Testing Verification & Reliability","volume":"125 41","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136351706","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this issue, we are pleased to present two papers, one on in vivo testing and the other on the integration of proving and testing. The first paper, ‘In vivo test and rollback of Java applications as they are’ by Antonia Bertolino, Guglielmo De Angelis, Breno Miranda and Paolo Tonella, presents the Groucho approach for in vivo testing, a specific kind of field software testing where testing activities are launched directly in the production environment during actual end-user sessions. The Groucho approach conducts in vivo testing of Java applications transparently, without necessarily requiring any source code modification or even source code availability. Being an unobtrusive field testing framework, Groucho adopts a fully automated ‘test and rollback’ strategy. The empirical evaluations of Groucho show that its performance overhead can be kept to a negligible level by activating in vivo testing with low probability; they also show the existence of faults that are unlikely to be exposed in-house but are easy to expose in the field, and quantify the coverage increase gained when in vivo testing is added to complement in-house testing. (Recommended by Xiaoyin Wang). The second paper, ‘A failed proof can yield a useful test’ by Li Huang and Bertrand Meyer, presents the Proof2Test tool, which takes advantage of the rich information that some automatic provers internally collect about the programme when attempting a proof. When the proof fails, Proof2Test uses the counterexample generated by the prover to produce a failed test, which provides the programmer with immediately exploitable information to correct the programme. The key assumption behind Proof2Test is that programme proofs (static) and programme tests (dynamic) are complementary rather than exclusive: proofs bring the absolute certainties that tests lack but are abstract and hard to get right; tests cannot guarantee correctness but, when they fail, bring the concreteness of counterexamples, immediately understandable to the programmer. (Recommended by Marcelo d'Amorim). We hope that these papers will inspire further research in related directions.
{"title":"In vivo testing and integration of proving and testing","authors":"Yves Le Traon, Tao Xie","doi":"10.1002/stvr.1866","DOIUrl":"https://doi.org/10.1002/stvr.1866","url":null,"abstract":"In this issue, we are pleased to present two papers, one for in vivo testing and the other for integration of proving and testing. The first paper, ‘In vivo test and rollback of Java applications as they are’ by Antonia Bertolino, Guglielmo De Angelis, Breno Miranda and Paolo Tonella, presents the Groucho approach for in vivo testing, a specific kind of field software testing where testing activities are launched directly in the production environment during actual end-user sessions. The Groucho approach conducts in vivo testing of Java applications transparently, not necessarily requiring any source code modification nor even source code availability. Being an unobtrusive field testing framework, Groucho adopts a fully automated ‘test and rollback’ strategy. The empirical evaluations of Groucho show that its performance overhead can be kept to a negligible level by activating in vivo testing with low probability, along with showing the existence of faults that are unlikely exposed in-house and become easy to expose in the field and showing the quantified coverage increase gained when in vivo testing is added to complement in house testing. (Recommended by Xiaoyin Wang). The second paper, ‘A failed proof can yield a useful test’ by Li Huang and Bertrand Meyer, presents the Proof2Test tool, which takes advantage of the rich information that some automatic provers internally collect about the programme when attempting a proof. When the proof fails, Proof2Test uses the counterexample generated by the prover to produce a failed test, which provides the programmer with immediately exploitable information to correct the programme. The key assumption behind Proof2Test is that programme proofs (static) and programme tests (dynamic) are complementary rather than exclusive: proofs bring the absolute certainties that tests lack but are abstract and hard to get right; tests cannot guarantee correctness but, when they fail, bring the concreteness of counterexamples, immediately understandable to the programmer. (Recommended by Marcelo d'Amorim). We hope that these papers will inspire further research in related directions.","PeriodicalId":49506,"journal":{"name":"Software Testing Verification & Reliability","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135889909","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sten Vercammen, Serge Demeyer, Markus Borg, Niklas Pettersson, Görel Hedin
Mutation testing is the state-of-the-art technique for assessing the fault detection capacity of a test suite. Unfortunately, a full mutation analysis is often prohibitively expensive. The CppCheck project, for instance, demands a build time of 5.8 min and a test execution time of 17 s on our desktop computer. An unoptimised mutation analysis of 55,000 generated mutants took 11.8 days in total, of which 4.3 days were spent on (re)compiling the project. In this paper, we present a feasibility study investigating how a number of optimisation strategies can be implemented based on the Clang front-end. These optimisation strategies eliminate the compilation and execution overhead in order to support efficient mutation testing for the C language family. We provide a proof-of-concept tool that achieves a speedup of between 2 and 30. We make a detailed analysis of the speedup induced by the optimisations, elaborate on the lessons learned and point out avenues for further improvements.
{"title":"Mutation testing optimisations using the Clang front‐end","authors":"Sten Vercammen, Serge Demeyer, Markus Borg, Niklas Pettersson, Görel Hedin","doi":"10.1002/stvr.1865","DOIUrl":"https://doi.org/10.1002/stvr.1865","url":null,"abstract":"Abstract Mutation testing is the state‐of‐the‐art technique for assessing the fault detection capacity of a test suite. Unfortunately, a full mutation analysis is often prohibitively expensive. The CppCheck project for instance, demands a build time of 5.8 min and a test execution time of 17 s on our desktop computer. An unoptimised mutation analysis, for 55,000 generated mutants took 11.8 days in total, of which 4.3 days is spent on (re)compiling the project. In this paper, we present a feasibility study, investigating how a number of optimisation strategies can be implemented based on the Clang front‐end. These optimisation strategies allow to eliminate the compilation and execution overhead in order to support efficient mutation testing for the C language family. We provide a proof‐of‐concept tool that achieves a speedup of between 2 and 30. We make a detailed analysis of the speedup induced by the optimisations, elaborate on the lessons learned and point out avenues for further improvements.","PeriodicalId":49506,"journal":{"name":"Software Testing Verification & Reliability","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135992678","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yingling Li, Ziao Wang, Junjie Wang, Jie Chen, Rui Mou, Guibing Li
Continuous integration (CI) is a widely applied development practice that allows frequent integration of software changes and early detection of faults. However, the extremely frequent builds in such a scenario consume large amounts of time and resources. Existing test case prioritization (TCP) techniques struggle to address this issue, either because of time-consuming information collection (e.g. test coverage) or because inaccurate modelling of code semantics leads to unsatisfactory prioritization. In this paper, we propose a semantic-aware two-phase TCP framework, named SatTCP, which combines coarse-grained filtering and fine-grained prioritization to perform precise TCP with low time costs for CI. It consists of three parts: (1) code representation, parsing the programme changes and test cases to obtain code change and test case representations; (2) coarse-grained filtering, conducting a preliminary ranking and filtering of test cases based on information retrieval; and (3) fine-grained prioritization, training a pretrained Siamese language model on the filtered test set to further sort the test cases by semantic similarity. We evaluate SatTCP on a large-scale, real-world dataset with cross-project validation, in terms of fault detection efficiency and time costs, and compare it with five baselines. The results show that SatTCP outperforms all baselines by 6.3%–45.6% in mean average percentage of faults detected per cost (APFDc), with an obvious upward trend as the project scale increases. Meanwhile, SatTCP can reduce real CI testing by 71.4%, outperforming the best baseline by 17.2% in time costs on average. Furthermore, we discuss the impact of different configurations, flaky tests and hybrid techniques on the performance of SatTCP.
{"title":"Semantic‐aware two‐phase test case prioritization for continuous integration","authors":"Yingling Li, Ziao Wang, Junjie Wang, Jie Chen, Rui Mou, Guibing Li","doi":"10.1002/stvr.1864","DOIUrl":"https://doi.org/10.1002/stvr.1864","url":null,"abstract":"Summary Continuous integration (CI) is a widely applied development practice to allow frequent integration of software changes, detecting early faults. However, extremely frequent builds consume amounts of time and resources in such a scenario. It is quite challenging for existing test case prioritization (TCP) to address this issue due to the time‐consuming information collection (e.g. test coverage) or inaccurately modelling code semantics to result in the unsatisfied prioritization. In this paper, we propose a semantic‐aware two‐phase TCP framework, named SatTCP, which combines the coarse‐grained filtering and fine‐grained prioritization to perform the precise TCP with low time costs for CI. It consists of three parts: (1) code representation, parsing the programme changes and test cases to obtain the code change and test case representations; (2) coarse‐grained filtering, conducting the preliminary ranking and filtering of test cases based on information retrieval; and (3) fine‐grained prioritization, training a pretrained Siamese language model based on the filtered test set to further sort the test cases via semantic similarity. We evaluate SatTCP on a large‐scale, real‐world dataset with cross‐project validation from fault detection efficiency and time costs and compare it with five baselines. The results show that SatTCP outperforms all baselines by 6.3%–45.6% for mean average percentage of fault detected per cost (APFDc), representing an obvious upward trend as the project scale increases. Meanwhile, SatTCP can reduce the real CI testing by 71.4%, outperforming the best baseline by 17.2% for time costs on average. Furthermore, we discuss the impact of different configurations, flaky tests and hybrid techniques on the performance of SatTCP, respectively.","PeriodicalId":49506,"journal":{"name":"Software Testing Verification & Reliability","volume":"119 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134885310","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Stevão Alves de Andrade, Fatima L. S. Nunes, Márcio Eduardo Delamaro
Despite the rapid growth and popularization of virtual reality (VR) applications, which have enabled new concepts for handling and solving existing problems through VR in various domains, software engineering practices have not kept up with this growth. Recent studies indicate that software testing is one of the topics still little explored in this area, as VR applications can be built for practically any purpose, making it difficult to generalize knowledge to be applied. In this paper, we present an approach that combines metamorphic testing, agent-based testing and machine learning to test VR applications, focusing on finding collision- and camera-related faults. Our approach uses metamorphic relations to detect faults in collision and camera components of VR applications, and intelligent agents for the automatic generation of test data. To evaluate the proposed approach, we conducted an experimental study on four VR applications; the results for the solution ranged from 93% to 69%, depending on the complexity of the application tested. We also discuss the feasibility of extending the approach to identify other types of faults in VR applications. In conclusion, we discuss important trends and opportunities that can benefit both academics and practitioners.
{"title":"Exploiting deep reinforcement learning and metamorphic testing to automatically test virtual reality applications","authors":"Stevão Alves de Andrade, Fatima L. S. Nunes, Márcio Eduardo Delamaro","doi":"10.1002/stvr.1863","DOIUrl":"https://doi.org/10.1002/stvr.1863","url":null,"abstract":"Summary Despite the rapid growth and popularization of virtual reality (VR) applications, which have enabled new concepts for handling and solving existing problems through VR in various domains, practices related to software engineering have not kept up with this growth. Recent studies indicate that one of the topics that is still little explored in this area is software testing, as VR applications can be built for practically any type of purpose, making it difficult to generalize knowledge to be applied. In this paper, we present an approach that combines metamorphic testing, agent‐based testing and machine learning to test VR applications, focusing on finding collision and camera‐related faults. Our approach proposes the use of metamorphic relations to detect faults in collision and camera components in VR applications, as well as the use of intelligent agents for the automatic generation of test data. To evaluate the proposed approach, we conducted an experimental study on four VR applications, and the results showed an of the solution ranging from 93% to 69%, depending on the complexity of the application tested. We also discussed the feasibility of extending the approach to identify other types of faults in VR applications. In conclusion, we discussed important trends and opportunities that can benefit both academics and practitioners.","PeriodicalId":49506,"journal":{"name":"Software Testing Verification & Reliability","volume":"136 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135014022","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Fabiano C. Ferrari, Vinicius H. S. Durelli, Sten F. Andler, Jeff Offutt, Mehrdad Saadatmand, Nils Müllner
Model-based test design is increasingly being applied in practice and studied in research. Model-based testing (MBT) exploits abstract models of the software behaviour to generate abstract tests, which are then transformed into concrete tests ready to run on the code. Given that abstract tests are designed to cover models but are run on code (after transformation), the effectiveness of MBT depends on whether model coverage also ensures coverage of key functional code. In this article, we investigate how MBT approaches generate tests from model specifications and how the coverage of tests designed strictly based on the model translates to code coverage. We used snowballing to conduct a systematic literature review. We started with three primary studies, which we refer to as the initial seeds. At the end of our search iterations, we analysed 30 studies that helped answer our research questions. More specifically, this article characterizes how test sets generated at the model level are mapped and applied to the source code level, discusses how tests are generated from the model specifications, analyses how the test coverage of models relates to the test coverage of the code when the same test set is executed and identifies the technologies and software development tasks that are the focus of the selected studies. Finally, we identify common characteristics and limitations that impact the research and practice of MBT: (i) some studies did not fully describe how tools transform abstract tests into concrete tests, (ii) some studies overlooked the computational cost of model-based approaches and (iii) some studies found evidence that bears out a robust correlation between decision coverage at the model level and branch coverage at the code level. We also noted that most primary studies omitted essential details about the experiments.
{"title":"On transforming model‐based tests into code: A systematic literature review","authors":"Fabiano C. Ferrari, Vinicius H. S. Durelli, Sten F. Andler, Jeff Offutt, Mehrdad Saadatmand, Nils Müllner","doi":"10.1002/stvr.1860","DOIUrl":"https://doi.org/10.1002/stvr.1860","url":null,"abstract":"Model‐based test design is increasingly being applied in practice and studied in research. Model‐based testing (MBT) exploits abstract models of the software behaviour to generate abstract tests, which are then transformed into concrete tests ready to run on the code. Given that abstract tests are designed to cover models but are run on code (after transformation), the effectiveness of MBT is dependent on whether model coverage also ensures coverage of key functional code. In this article, we investigate how MBT approaches generate tests from model specifications and how the coverage of tests designed strictly based on the model translates to code coverage. We used snowballing to conduct a systematic literature review. We started with three primary studies, which we refer to as the initial seeds. At the end of our search iterations, we analysed 30 studies that helped answer our research questions. More specifically, this article characterizes how test sets generated at the model level are mapped and applied to the source code level, discusses how tests are generated from the model specifications, analyses how the test coverage of models relates to the test coverage of the code when the same test set is executed and identifies the technologies and software development tasks that are on focus in the selected studies. Finally, we identify common characteristics and limitations that impact the research and practice of MBT: (i) some studies did not fully describe how tools transform abstract tests into concrete tests, (ii) some studies overlooked the computational cost of model‐based approaches and (iii) some studies found evidence that bears out a robust correlation between decision coverage at the model level and branch coverage at the code level. We also noted that most primary studies omitted essential details about the experiments.","PeriodicalId":49506,"journal":{"name":"Software Testing Verification & Reliability","volume":"54 1","pages":""},"PeriodicalIF":1.5,"publicationDate":"2023-09-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73010629","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Heuristic algorithms are widely used to solve multi-objective test case prioritization (MOTCP) problems. However, they perform differently in different test scenarios, which makes it difficult to apply a suitable algorithm to new test requests in industry. A concrete hyper-heuristic framework for MOTCP (HH-MOTCP) has been proposed to address this problem. It has two main parts: a low level that encapsulates various algorithms and a hyper level that includes an evaluation and selection mechanism for dynamically selecting low-level algorithms. This framework performs well but still struggles to stay among the best three. If the evaluation mechanism can analyse the current results more accurately, it will help the selection strategy find algorithms more conducive to evolution; likewise, if the selection strategy can find a more suitable algorithm for the next generation, the performance of the HH-MOTCP framework will improve. In this paper, we first propose new strategies for evaluating the current generation's results and then perform an extensive study of the selection strategies that decide the heuristic algorithm for the next generation. Experimental results show that the new evaluation and selection strategies proposed in this paper make the HH-MOTCP framework more effective and efficient: it ranks among the best two on all but one test object and is ahead on about 40% of all test objects.
{"title":"Research on hyper‐level of hyper‐heuristic framework for MOTCP","authors":"Junxia Guo, Rui Wang, Jinjin Han, Zheng Li","doi":"10.1002/stvr.1861","DOIUrl":"https://doi.org/10.1002/stvr.1861","url":null,"abstract":"Heuristic algorithms are widely used to solve multi‐objective test case prioritization (MOTCP) problems. However, they perform differently for different test scenarios, which conducts difficulty in applying a suitable algorithm for new test requests in the industry. A concrete hyper‐heuristic framework for MOTCP (HH‐MOTCP) is proposed for addressing this problem. It mainly has two parts: low‐level encapsulating various algorithms and hyper‐level including an evaluation and selection mechanism that dynamically selects low‐level algorithms. This framework performs good but still difficult to keep in the best three. If the evaluation mechanism can more accurately analyse the current results, it will help the selection strategy to find more conducive algorithms for evolution. Meanwhile, if the selection strategy can find a more suitable algorithm for the next generation, the performance of the HH‐MOTCP framework will be better. In this paper, we first propose new strategies for evaluating the current generation results, then perform an extensive study on the selection strategies which decide the heuristic algorithm for the next generation. Experimental results show that the new evaluation and selection strategies proposed in this paper can make the HH‐MOTCP framework more effective and efficient, which makes it almost the best two except for one test object and ahead in about 40% of all test objects.","PeriodicalId":49506,"journal":{"name":"Software Testing Verification & Reliability","volume":"15 1","pages":""},"PeriodicalIF":1.5,"publicationDate":"2023-09-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87769194","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A successful automated program proof is, in software verification, the ultimate triumph. In practice, however, the road to such success is paved with many failed proof attempts. Unlike a failed test, which provides concrete evidence of an actual bug in the program, a failed proof leaves the programmer in the dark. Can we instead learn something useful from it? The work reported here takes advantage of the rich information that some automatic provers internally collect about the program when attempting a proof. If the proof fails, the Proof2Test tool presented in this article uses the counterexample generated by the prover (specifically, the SMT solver underlying the Boogie tool used in the AutoProof system to perform correctness proofs of contract-equipped Eiffel programs) to produce a failed test, which provides the programmer with immediately exploitable information to correct the program. The discussion presents Proof2Test and the application of the ideas and tool to a collection of representative examples.
{"title":"A failed proof can yield a useful test","authors":"Li Huang, Bertrand Meyer","doi":"10.1002/stvr.1859","DOIUrl":"https://doi.org/10.1002/stvr.1859","url":null,"abstract":"Abstract A successful automated program proof is, in software verification, the ultimate triumph. In practice, however, the road to such success is paved with many failed proof attempts. Unlike a failed test, which provides concrete evidence of an actual bug in the program, a failed proof leaves the programmer in the dark. Can we instead learn something useful from it? The work reported here takes advantage of the rich information that some automatic provers internally collect about the program when attempting a proof. If the proof fails, the Proof2Test tool presented in this article uses the counterexample generated by the prover (specifically, the SMT solver underlying the Boogie tool used in the AutoProof system to perform correctness proofs of contract‐equipped Eiffel programs) to produce a failed test, which provides the programmer with immediately exploitable information to correct the program. The discussion presents Proof2Test and the application of the ideas and tool to a collection of representative examples.","PeriodicalId":49506,"journal":{"name":"Software Testing Verification & Reliability","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135783007","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deep neural network supervision and data flow testing","authors":"Y. Le Traon, Tao Xie","doi":"10.1002/stvr.1862","DOIUrl":"https://doi.org/10.1002/stvr.1862","url":null,"abstract":",","PeriodicalId":49506,"journal":{"name":"Software Testing Verification & Reliability","volume":"12 1","pages":""},"PeriodicalIF":1.5,"publicationDate":"2023-08-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84704537","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this issue, we are pleased to present two papers: one on risk assessment for an industrial Internet of Things and the other on testing speech recognition systems. The first paper, ‘HiRAM: A Hierarchical Risk Assessment Model and Its Implementation for an Industrial Internet of Things in the Cloud’ by Wen-Lin Sun, Ying-Han Tang and Yu-Lun Huang, proposes the Hierarchical Risk Assessment Model (HiRAM) for an IIoT cloud platform, enabling the platform to self-evaluate its security status by leveraging analytic hierarchy processes (AHPs). The authors also realise HiRAM-RAS, a modular and responsive Risk Assessment System based on HiRAM, and evaluate it in a real-world IIoT cloud platform. The evaluation results show the changes in integrity and availability scores evaluated by HiRAM. (Recommended by Xiaoyin Wang). The second paper, ‘Adversarial Example-based Test Case Generation for Black-box Speech Recognition Systems’ by Hanbo Cai, Pengcheng Zhang, Hai Dong, Lars Grunske, Shunhui Ji and Tianhao Yuan, proposes methods, based on the firefly algorithm, for generating targeted adversarial examples for speech recognition systems. These methods generate targeted adversarial samples by continuously adding interference noise to the original speech samples. The evaluation results show that the proposed methods achieve satisfactory results on three speech datasets (Google Command, Common Voice and LibriSpeech) and, compared with existing methods, can effectively improve the success rate of targeted adversarial example generation. (Recommended by Yves Le Traon). We hope that these papers will inspire further research in these directions of quality assurance.
{"title":"Quality assurance for Internet of Things and speech recognition systems","authors":"Yves Le Traon, Tao Xie","doi":"10.1002/stvr.1858","DOIUrl":"https://doi.org/10.1002/stvr.1858","url":null,"abstract":"In this issue, we are pleased to present two papers: one for risk assessment for an industrial Internet of Things and the other for testing speech recognition systems. The first paper, ‘ HiRAM: A Hierarchical Risk Assessment Model and Its Implementation for an Industrial Internet of Things in the Cloud ’ by Wen-Lin Sun, Ying-Han Tang and Yu-Lun Huang, proposes Hierarchical Risk Assessment Model (HiRAM) for an IIoT cloud platform to enable self-evaluate its security status by leveraging analytic hierarchy processes (AHPs). The authors also realise HiRAM-RAS, a modular and responsive Risk Assessment System based on HiRAM, and evaluate it in a real-world IIoT cloud platform. The evaluation results show the changes in integrity and availability scores evaluated by HiRAM. (Recommended by Xiaoyin Wang). The second paper, ‘ Adversarial Example-based Test Case Generation for Black-box Speech Recognition Systems ’ by Hanbo Cai, Pengcheng Zhang, Hai Dong, Lars Grunske, Shunhui Ji and Tianhao Yuan, proposes methods for generating targeted adversarial examples for speech recognition systems, based on the firefly algorithm. These methods generate the targeted adversarial samples by continuously adding interference noise to the original speech samples. The evaluation results show that the proposed methods achieve satisfactory results on three speech datasets (Google Command, Common Voice and LibriSpeech), and compared with existing methods, these methods can effectively improve the success rate of the targeted adversarial example generation. (Recommended by Yves Le Traon). We hope that these papers will inspire further research in these directions of quality assurance.","PeriodicalId":49506,"journal":{"name":"Software Testing Verification & Reliability","volume":"23 1","pages":""},"PeriodicalIF":1.5,"publicationDate":"2023-07-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84974210","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}