In this issue, we are pleased to present three papers on model-based testing, test case prioritization and testing of virtual reality applications. The first paper, ‘On transforming model-based tests into code: A systematic literature review’ by Fabiano C. Ferrari, Vinicius H. S. Durelli, Sten F. Andler, Jeff Offutt, Mehrdad Saadatmand and Nils Müllner, presents a systematic literature review, based on 30 selected primary studies, on computing source code coverage from test sets generated via model-based testing (MBT) approaches. The authors identify common characteristics and limitations that may impact MBT research and practice, and discuss their implications for future research. They find increasing adoption of MBT in industry, increasing application of model-to-code transformations and a corresponding growing need to understand how test cases designed for models achieve coverage on the code. (Recommended by Dan Hao). The second paper, ‘Research on hyper-level of hyper-heuristic framework for MOTCP’ by Junxia Guo, Rui Wang, Jinjin Han and Zheng Li, presents three evaluation strategies for the hyper-level of the hyper-heuristic framework for multi-objective test case prioritization (HH-MOTCP). The experimental results show that the selection method proposed by the authors performs best. In addition, the authors apply 18 selection strategies to dynamically select low-level heuristics during the evolution process of HH-MOTCP, and the results identify the best-performing strategy across all test objects. Moreover, using the new strategies at the hyper-level makes HH-MOTCP more effective. (Recommended by Hyunsook Do). The third paper, ‘Exploiting deep reinforcement learning and metamorphic testing to automatically test virtual reality applications’ by Stevao Alves de Andrade, Fatima L. S. Nunes and Marcio Eduardo Delamaro, presents an approach to testing virtual reality (VR) applications. The experimental results show that it is feasible to adopt an automated test generation approach that combines metamorphic testing and deep reinforcement learning for testing VR applications, and that it serves as an effective alternative for identifying crashes related to collision and camera objects in VR applications. (Recommended by Yves Le Traon). We hope that these papers will inspire further research in related directions.
{"title":"Model‐based testing, test case prioritization and testing of virtual reality applications","authors":"Yves Le Traon, Tao Xie","doi":"10.1002/stvr.1868","DOIUrl":"https://doi.org/10.1002/stvr.1868","url":null,"abstract":"In this issue, we are pleased to present three papers on model-based testing, test case prioritization and testing of virtual reality applications. The first paper, ‘On transforming model-based tests into code: A systematic literature review’ by Fabiano C. Ferrari, Vinicius H. S. Durelli, Sten F. Andler, Jeff Offutt, Mehrdad Saadatmand and Nils Müllner, presents a systematic literature review based on 30 selected primary studies for computing source code coverage from test sets generated via model-based testing (MBT) approaches. The authors identify some common characteristics and limitations that may impact on MBT research and practice. The authors also discuss implications for future research related to these limitations. The authors find increasing adoption of MBT in industry, increasing application of model-to-code transformations and a complementary increasing need to understand how test cases designed for models achieve coverage on the code. (Recommended by Dan Hao). The second paper, ‘Research on hyper-level of hyper-heuristic framework for MOTCP’ by Junxia Guo, Rui Wang, Jinjin Han and Zheng Li, presents three evaluation strategies for the hyper-level of the hyper-heuristic framework for multi-objective test case prioritization (HH-MOTCP). The experimental results show that the selection method proposed by the authors performs best. In addition, the authors apply 18 selection strategies to dynamically select low-level heuristics during the evolution process of the HH-MOTCP. The results identify the best performing strategy for all test objects. Moreover, using the new strategies at the hyper-level makes HH-MOTCP more effective. (Recommended by Hyunsook Do). The third paper, ‘Exploiting deep reinforcement learning and metamorphic testing to automatically test virtual reality applications’ by Stevao Alves de Andrade, Fatima L. S. Nunes and Marcio Eduardo Delamaro, presents an approach to testing virtual reality (VR) applications. The experimental results show that it is feasible to adopting an automated approach of test generation with metamorphic testing and deep reinforcement learning for testing VR applications, especially serving as an effective alternative to identifying crashes related to collision and camera objects in VR applications. (Recommended by Yves Le Traon). We hope that these papers will inspire further research in related directions.","PeriodicalId":49506,"journal":{"name":"Software Testing Verification & Reliability","volume":"125 41","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136351706","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this issue, we are pleased to present two papers, one on in vivo testing and the other on the integration of proving and testing. The first paper, ‘In vivo test and rollback of Java applications as they are’ by Antonia Bertolino, Guglielmo De Angelis, Breno Miranda and Paolo Tonella, presents the Groucho approach for in vivo testing, a specific kind of field software testing where testing activities are launched directly in the production environment during actual end-user sessions. The Groucho approach conducts in vivo testing of Java applications transparently, without necessarily requiring any source code modification or even source code availability. Being an unobtrusive field testing framework, Groucho adopts a fully automated ‘test and rollback’ strategy. The empirical evaluations of Groucho show that its performance overhead can be kept to a negligible level by activating in vivo testing with low probability; they also show the existence of faults that are unlikely to be exposed in-house but are easy to expose in the field, and quantify the coverage increase gained when in vivo testing is added to complement in-house testing. (Recommended by Xiaoyin Wang). The second paper, ‘A failed proof can yield a useful test’ by Li Huang and Bertrand Meyer, presents the Proof2Test tool, which takes advantage of the rich information that some automatic provers internally collect about the programme when attempting a proof. When the proof fails, Proof2Test uses the counterexample generated by the prover to produce a failed test, which provides the programmer with immediately exploitable information to correct the programme. The key assumption behind Proof2Test is that programme proofs (static) and programme tests (dynamic) are complementary rather than exclusive: proofs bring the absolute certainties that tests lack but are abstract and hard to get right; tests cannot guarantee correctness but, when they fail, bring the concreteness of counterexamples, immediately understandable to the programmer. (Recommended by Marcelo d'Amorim). We hope that these papers will inspire further research in related directions.
{"title":"In vivo testing and integration of proving and testing","authors":"Yves Le Traon, Tao Xie","doi":"10.1002/stvr.1866","DOIUrl":"https://doi.org/10.1002/stvr.1866","url":null,"abstract":"In this issue, we are pleased to present two papers, one for in vivo testing and the other for integration of proving and testing. The first paper, ‘In vivo test and rollback of Java applications as they are’ by Antonia Bertolino, Guglielmo De Angelis, Breno Miranda and Paolo Tonella, presents the Groucho approach for in vivo testing, a specific kind of field software testing where testing activities are launched directly in the production environment during actual end-user sessions. The Groucho approach conducts in vivo testing of Java applications transparently, not necessarily requiring any source code modification nor even source code availability. Being an unobtrusive field testing framework, Groucho adopts a fully automated ‘test and rollback’ strategy. The empirical evaluations of Groucho show that its performance overhead can be kept to a negligible level by activating in vivo testing with low probability, along with showing the existence of faults that are unlikely exposed in-house and become easy to expose in the field and showing the quantified coverage increase gained when in vivo testing is added to complement in house testing. (Recommended by Xiaoyin Wang). The second paper, ‘A failed proof can yield a useful test’ by Li Huang and Bertrand Meyer, presents the Proof2Test tool, which takes advantage of the rich information that some automatic provers internally collect about the programme when attempting a proof. When the proof fails, Proof2Test uses the counterexample generated by the prover to produce a failed test, which provides the programmer with immediately exploitable information to correct the programme. The key assumption behind Proof2Test is that programme proofs (static) and programme tests (dynamic) are complementary rather than exclusive: proofs bring the absolute certainties that tests lack but are abstract and hard to get right; tests cannot guarantee correctness but, when they fail, bring the concreteness of counterexamples, immediately understandable to the programmer. (Recommended by Marcelo d'Amorim). We hope that these papers will inspire further research in related directions.","PeriodicalId":49506,"journal":{"name":"Software Testing Verification & Reliability","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135889909","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sten Vercammen, Serge Demeyer, Markus Borg, Niklas Pettersson, Görel Hedin
Mutation testing is the state-of-the-art technique for assessing the fault detection capacity of a test suite. Unfortunately, a full mutation analysis is often prohibitively expensive. The CppCheck project, for instance, demands a build time of 5.8 min and a test execution time of 17 s on our desktop computer. An unoptimised mutation analysis of 55,000 generated mutants took 11.8 days in total, of which 4.3 days were spent on (re)compiling the project. In this paper, we present a feasibility study investigating how a number of optimisation strategies can be implemented based on the Clang front-end. These optimisation strategies eliminate the compilation and execution overhead in order to support efficient mutation testing for the C language family. We provide a proof-of-concept tool that achieves a speedup of between 2 and 30. We make a detailed analysis of the speedup induced by the optimisations, elaborate on the lessons learned and point out avenues for further improvements.
{"title":"Mutation testing optimisations using the Clang front‐end","authors":"Sten Vercammen, Serge Demeyer, Markus Borg, Niklas Pettersson, Görel Hedin","doi":"10.1002/stvr.1865","DOIUrl":"https://doi.org/10.1002/stvr.1865","url":null,"abstract":"Abstract Mutation testing is the state‐of‐the‐art technique for assessing the fault detection capacity of a test suite. Unfortunately, a full mutation analysis is often prohibitively expensive. The CppCheck project for instance, demands a build time of 5.8 min and a test execution time of 17 s on our desktop computer. An unoptimised mutation analysis, for 55,000 generated mutants took 11.8 days in total, of which 4.3 days is spent on (re)compiling the project. In this paper, we present a feasibility study, investigating how a number of optimisation strategies can be implemented based on the Clang front‐end. These optimisation strategies allow to eliminate the compilation and execution overhead in order to support efficient mutation testing for the C language family. We provide a proof‐of‐concept tool that achieves a speedup of between 2 and 30. We make a detailed analysis of the speedup induced by the optimisations, elaborate on the lessons learned and point out avenues for further improvements.","PeriodicalId":49506,"journal":{"name":"Software Testing Verification & Reliability","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135992678","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yingling Li, Ziao Wang, Junjie Wang, Jie Chen, Rui Mou, Guibing Li
Continuous integration (CI) is a widely applied development practice that allows frequent integration of software changes and early detection of faults. However, the extremely frequent builds in such a scenario consume large amounts of time and resources. Existing test case prioritization (TCP) techniques struggle to address this issue, either because of time-consuming information collection (e.g. test coverage) or because inaccurate modelling of code semantics leads to unsatisfactory prioritization. In this paper, we propose a semantic-aware two-phase TCP framework, named SatTCP, which combines coarse-grained filtering and fine-grained prioritization to perform precise TCP with low time costs for CI. It consists of three parts: (1) code representation, parsing the programme changes and test cases to obtain code change and test case representations; (2) coarse-grained filtering, conducting a preliminary ranking and filtering of test cases based on information retrieval; and (3) fine-grained prioritization, training a pretrained Siamese language model on the filtered test set to further sort the test cases by semantic similarity. We evaluate SatTCP on a large-scale, real-world dataset with cross-project validation, in terms of fault detection efficiency and time costs, and compare it with five baselines. The results show that SatTCP outperforms all baselines by 6.3%–45.6% in mean average percentage of faults detected per cost (APFDc), with an obvious upward trend as the project scale increases. Meanwhile, SatTCP can reduce real CI testing by 71.4%, outperforming the best baseline by 17.2% in time costs on average. Furthermore, we discuss the impact of different configurations, flaky tests and hybrid techniques on the performance of SatTCP.
{"title":"Semantic‐aware two‐phase test case prioritization for continuous integration","authors":"Yingling Li, Ziao Wang, Junjie Wang, Jie Chen, Rui Mou, Guibing Li","doi":"10.1002/stvr.1864","DOIUrl":"https://doi.org/10.1002/stvr.1864","url":null,"abstract":"Summary Continuous integration (CI) is a widely applied development practice to allow frequent integration of software changes, detecting early faults. However, extremely frequent builds consume amounts of time and resources in such a scenario. It is quite challenging for existing test case prioritization (TCP) to address this issue due to the time‐consuming information collection (e.g. test coverage) or inaccurately modelling code semantics to result in the unsatisfied prioritization. In this paper, we propose a semantic‐aware two‐phase TCP framework, named SatTCP, which combines the coarse‐grained filtering and fine‐grained prioritization to perform the precise TCP with low time costs for CI. It consists of three parts: (1) code representation, parsing the programme changes and test cases to obtain the code change and test case representations; (2) coarse‐grained filtering, conducting the preliminary ranking and filtering of test cases based on information retrieval; and (3) fine‐grained prioritization, training a pretrained Siamese language model based on the filtered test set to further sort the test cases via semantic similarity. We evaluate SatTCP on a large‐scale, real‐world dataset with cross‐project validation from fault detection efficiency and time costs and compare it with five baselines. The results show that SatTCP outperforms all baselines by 6.3%–45.6% for mean average percentage of fault detected per cost (APFDc), representing an obvious upward trend as the project scale increases. Meanwhile, SatTCP can reduce the real CI testing by 71.4%, outperforming the best baseline by 17.2% for time costs on average. Furthermore, we discuss the impact of different configurations, flaky tests and hybrid techniques on the performance of SatTCP, respectively.","PeriodicalId":49506,"journal":{"name":"Software Testing Verification & Reliability","volume":"119 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134885310","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Stevão Alves de Andrade, Fatima L. S. Nunes, Márcio Eduardo Delamaro
Despite the rapid growth and popularization of virtual reality (VR) applications, which have enabled new concepts for handling and solving existing problems through VR in various domains, software engineering practices have not kept up with this growth. Recent studies indicate that software testing is one of the topics still little explored in this area, as VR applications can be built for practically any purpose, making it difficult to generalize knowledge to be applied. In this paper, we present an approach that combines metamorphic testing, agent-based testing and machine learning to test VR applications, focusing on finding collision- and camera-related faults. Our approach uses metamorphic relations to detect faults in collision and camera components of VR applications, and intelligent agents for the automatic generation of test data. To evaluate the proposed approach, we conducted an experimental study on four VR applications; the results for the solution ranged from 93% to 69%, depending on the complexity of the application tested. We also discuss the feasibility of extending the approach to identify other types of faults in VR applications. In conclusion, we discuss important trends and opportunities that can benefit both academics and practitioners.
{"title":"Exploiting deep reinforcement learning and metamorphic testing to automatically test virtual reality applications","authors":"Stevão Alves de Andrade, Fatima L. S. Nunes, Márcio Eduardo Delamaro","doi":"10.1002/stvr.1863","DOIUrl":"https://doi.org/10.1002/stvr.1863","url":null,"abstract":"Summary Despite the rapid growth and popularization of virtual reality (VR) applications, which have enabled new concepts for handling and solving existing problems through VR in various domains, practices related to software engineering have not kept up with this growth. Recent studies indicate that one of the topics that is still little explored in this area is software testing, as VR applications can be built for practically any type of purpose, making it difficult to generalize knowledge to be applied. In this paper, we present an approach that combines metamorphic testing, agent‐based testing and machine learning to test VR applications, focusing on finding collision and camera‐related faults. Our approach proposes the use of metamorphic relations to detect faults in collision and camera components in VR applications, as well as the use of intelligent agents for the automatic generation of test data. To evaluate the proposed approach, we conducted an experimental study on four VR applications, and the results showed an of the solution ranging from 93% to 69%, depending on the complexity of the application tested. We also discussed the feasibility of extending the approach to identify other types of faults in VR applications. In conclusion, we discussed important trends and opportunities that can benefit both academics and practitioners.","PeriodicalId":49506,"journal":{"name":"Software Testing Verification & Reliability","volume":"136 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135014022","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Fabiano C. Ferrari, Vinicius H. S. Durelli, Sten F. Andler, Jeff Offutt, Mehrdad Saadatmand, Nils Müllner
Model-based test design is increasingly being applied in practice and studied in research. Model-based testing (MBT) exploits abstract models of the software behaviour to generate abstract tests, which are then transformed into concrete tests ready to run on the code. Given that abstract tests are designed to cover models but are run on code (after transformation), the effectiveness of MBT depends on whether model coverage also ensures coverage of key functional code. In this article, we investigate how MBT approaches generate tests from model specifications and how the coverage of tests designed strictly based on the model translates to code coverage. We used snowballing to conduct a systematic literature review. We started with three primary studies, which we refer to as the initial seeds. At the end of our search iterations, we analysed 30 studies that helped answer our research questions. More specifically, this article characterizes how test sets generated at the model level are mapped and applied to the source code level, discusses how tests are generated from the model specifications, analyses how the test coverage of models relates to the test coverage of the code when the same test set is executed and identifies the technologies and software development tasks that are the focus of the selected studies. Finally, we identify common characteristics and limitations that impact the research and practice of MBT: (i) some studies did not fully describe how tools transform abstract tests into concrete tests, (ii) some studies overlooked the computational cost of model-based approaches and (iii) some studies found evidence that bears out a robust correlation between decision coverage at the model level and branch coverage at the code level. We also noted that most primary studies omitted essential details about the experiments.
{"title":"On transforming model‐based tests into code: A systematic literature review","authors":"Fabiano C. Ferrari, Vinicius H. S. Durelli, Sten F. Andler, Jeff Offutt, Mehrdad Saadatmand, Nils Müllner","doi":"10.1002/stvr.1860","DOIUrl":"https://doi.org/10.1002/stvr.1860","url":null,"abstract":"Model‐based test design is increasingly being applied in practice and studied in research. Model‐based testing (MBT) exploits abstract models of the software behaviour to generate abstract tests, which are then transformed into concrete tests ready to run on the code. Given that abstract tests are designed to cover models but are run on code (after transformation), the effectiveness of MBT is dependent on whether model coverage also ensures coverage of key functional code. In this article, we investigate how MBT approaches generate tests from model specifications and how the coverage of tests designed strictly based on the model translates to code coverage. We used snowballing to conduct a systematic literature review. We started with three primary studies, which we refer to as the initial seeds. At the end of our search iterations, we analysed 30 studies that helped answer our research questions. More specifically, this article characterizes how test sets generated at the model level are mapped and applied to the source code level, discusses how tests are generated from the model specifications, analyses how the test coverage of models relates to the test coverage of the code when the same test set is executed and identifies the technologies and software development tasks that are on focus in the selected studies. Finally, we identify common characteristics and limitations that impact the research and practice of MBT: (i) some studies did not fully describe how tools transform abstract tests into concrete tests, (ii) some studies overlooked the computational cost of model‐based approaches and (iii) some studies found evidence that bears out a robust correlation between decision coverage at the model level and branch coverage at the code level. We also noted that most primary studies omitted essential details about the experiments.","PeriodicalId":49506,"journal":{"name":"Software Testing Verification & Reliability","volume":"54 1","pages":""},"PeriodicalIF":1.5,"publicationDate":"2023-09-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73010629","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Heuristic algorithms are widely used to solve multi-objective test case prioritization (MOTCP) problems. However, they perform differently in different test scenarios, which makes it difficult to apply a suitable algorithm to new test requests in industry. A concrete hyper-heuristic framework for MOTCP (HH-MOTCP) has been proposed to address this problem. It has two main parts: a low level that encapsulates various algorithms and a hyper level that includes an evaluation and selection mechanism for dynamically selecting low-level algorithms. This framework performs well but still struggles to stay among the best three. If the evaluation mechanism can analyse the current results more accurately, it will help the selection strategy find algorithms more conducive to evolution; likewise, if the selection strategy can find a more suitable algorithm for the next generation, the performance of the HH-MOTCP framework will improve. In this paper, we first propose new strategies for evaluating the current generation's results and then perform an extensive study of the selection strategies that decide the heuristic algorithm for the next generation. Experimental results show that the new evaluation and selection strategies proposed in this paper make the HH-MOTCP framework more effective and efficient: it ranks among the best two on all but one test object and is ahead on about 40% of all test objects.
{"title":"Research on hyper‐level of hyper‐heuristic framework for MOTCP","authors":"Junxia Guo, Rui Wang, Jinjin Han, Zheng Li","doi":"10.1002/stvr.1861","DOIUrl":"https://doi.org/10.1002/stvr.1861","url":null,"abstract":"Heuristic algorithms are widely used to solve multi‐objective test case prioritization (MOTCP) problems. However, they perform differently for different test scenarios, which conducts difficulty in applying a suitable algorithm for new test requests in the industry. A concrete hyper‐heuristic framework for MOTCP (HH‐MOTCP) is proposed for addressing this problem. It mainly has two parts: low‐level encapsulating various algorithms and hyper‐level including an evaluation and selection mechanism that dynamically selects low‐level algorithms. This framework performs good but still difficult to keep in the best three. If the evaluation mechanism can more accurately analyse the current results, it will help the selection strategy to find more conducive algorithms for evolution. Meanwhile, if the selection strategy can find a more suitable algorithm for the next generation, the performance of the HH‐MOTCP framework will be better. In this paper, we first propose new strategies for evaluating the current generation results, then perform an extensive study on the selection strategies which decide the heuristic algorithm for the next generation. Experimental results show that the new evaluation and selection strategies proposed in this paper can make the HH‐MOTCP framework more effective and efficient, which makes it almost the best two except for one test object and ahead in about 40% of all test objects.","PeriodicalId":49506,"journal":{"name":"Software Testing Verification & Reliability","volume":"15 1","pages":""},"PeriodicalIF":1.5,"publicationDate":"2023-09-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87769194","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A successful automated program proof is, in software verification, the ultimate triumph. In practice, however, the road to such success is paved with many failed proof attempts. Unlike a failed test, which provides concrete evidence of an actual bug in the program, a failed proof leaves the programmer in the dark. Can we instead learn something useful from it? The work reported here takes advantage of the rich information that some automatic provers internally collect about the program when attempting a proof. If the proof fails, the Proof2Test tool presented in this article uses the counterexample generated by the prover (specifically, the SMT solver underlying the Boogie tool used in the AutoProof system to perform correctness proofs of contract-equipped Eiffel programs) to produce a failed test, which provides the programmer with immediately exploitable information to correct the program. The discussion presents Proof2Test and the application of the ideas and tool to a collection of representative examples.
{"title":"A failed proof can yield a useful test","authors":"Li Huang, Bertrand Meyer","doi":"10.1002/stvr.1859","DOIUrl":"https://doi.org/10.1002/stvr.1859","url":null,"abstract":"Abstract A successful automated program proof is, in software verification, the ultimate triumph. In practice, however, the road to such success is paved with many failed proof attempts. Unlike a failed test, which provides concrete evidence of an actual bug in the program, a failed proof leaves the programmer in the dark. Can we instead learn something useful from it? The work reported here takes advantage of the rich information that some automatic provers internally collect about the program when attempting a proof. If the proof fails, the Proof2Test tool presented in this article uses the counterexample generated by the prover (specifically, the SMT solver underlying the Boogie tool used in the AutoProof system to perform correctness proofs of contract‐equipped Eiffel programs) to produce a failed test, which provides the programmer with immediately exploitable information to correct the program. The discussion presents Proof2Test and the application of the ideas and tool to a collection of representative examples.","PeriodicalId":49506,"journal":{"name":"Software Testing Verification & Reliability","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135783007","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deep neural network supervision and data flow testing","authors":"Y. Le Traon, Tao Xie","doi":"10.1002/stvr.1862","DOIUrl":"https://doi.org/10.1002/stvr.1862","url":null,"abstract":",","PeriodicalId":49506,"journal":{"name":"Software Testing Verification & Reliability","volume":"12 1","pages":""},"PeriodicalIF":1.5,"publicationDate":"2023-08-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84704537","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this issue, we are pleased to present two papers: one on risk assessment for an industrial Internet of Things and the other on testing speech recognition systems. The first paper, ‘HiRAM: A Hierarchical Risk Assessment Model and Its Implementation for an Industrial Internet of Things in the Cloud’ by Wen-Lin Sun, Ying-Han Tang and Yu-Lun Huang, proposes the Hierarchical Risk Assessment Model (HiRAM) for an IIoT cloud platform, enabling the platform to self-evaluate its security status by leveraging analytic hierarchy processes (AHPs). The authors also realise HiRAM-RAS, a modular and responsive Risk Assessment System based on HiRAM, and evaluate it in a real-world IIoT cloud platform. The evaluation results show the changes in integrity and availability scores evaluated by HiRAM. (Recommended by Xiaoyin Wang). The second paper, ‘Adversarial Example-based Test Case Generation for Black-box Speech Recognition Systems’ by Hanbo Cai, Pengcheng Zhang, Hai Dong, Lars Grunske, Shunhui Ji and Tianhao Yuan, proposes methods, based on the firefly algorithm, for generating targeted adversarial examples for speech recognition systems. These methods generate targeted adversarial samples by continuously adding interference noise to the original speech samples. The evaluation results show that the proposed methods achieve satisfactory results on three speech datasets (Google Command, Common Voice and LibriSpeech) and, compared with existing methods, can effectively improve the success rate of targeted adversarial example generation. (Recommended by Yves Le Traon). We hope that these papers will inspire further research in these directions of quality assurance.
{"title":"Quality assurance for Internet of Things and speech recognition systems","authors":"Yves Le Traon, Tao Xie","doi":"10.1002/stvr.1858","DOIUrl":"https://doi.org/10.1002/stvr.1858","url":null,"abstract":"In this issue, we are pleased to present two papers: one for risk assessment for an industrial Internet of Things and the other for testing speech recognition systems. The first paper, ‘ HiRAM: A Hierarchical Risk Assessment Model and Its Implementation for an Industrial Internet of Things in the Cloud ’ by Wen-Lin Sun, Ying-Han Tang and Yu-Lun Huang, proposes Hierarchical Risk Assessment Model (HiRAM) for an IIoT cloud platform to enable self-evaluate its security status by leveraging analytic hierarchy processes (AHPs). The authors also realise HiRAM-RAS, a modular and responsive Risk Assessment System based on HiRAM, and evaluate it in a real-world IIoT cloud platform. The evaluation results show the changes in integrity and availability scores evaluated by HiRAM. (Recommended by Xiaoyin Wang). The second paper, ‘ Adversarial Example-based Test Case Generation for Black-box Speech Recognition Systems ’ by Hanbo Cai, Pengcheng Zhang, Hai Dong, Lars Grunske, Shunhui Ji and Tianhao Yuan, proposes methods for generating targeted adversarial examples for speech recognition systems, based on the firefly algorithm. These methods generate the targeted adversarial samples by continuously adding interference noise to the original speech samples. The evaluation results show that the proposed methods achieve satisfactory results on three speech datasets (Google Command, Common Voice and LibriSpeech), and compared with existing methods, these methods can effectively improve the success rate of the targeted adversarial example generation. (Recommended by Yves Le Traon). We hope that these papers will inspire further research in these directions of quality assurance.","PeriodicalId":49506,"journal":{"name":"Software Testing Verification & Reliability","volume":"23 1","pages":""},"PeriodicalIF":1.5,"publicationDate":"2023-07-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84974210","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}