We present a technique and automated toolbox for randomized testing of C compilers. Unlike prior compiler‐testing approaches, we generate concurrent test cases in which threads communicate using fine‐grained atomic operations, and we study actual compiler implementations rather than abstract mappings. Our approach is (1) to generate test cases with precise oracles directly from an axiomatization of the C concurrency model; (2) to apply metamorphic fuzzing to each test case, aiming to amplify the coverage they are likely to achieve on compiler codebases; and (3) to execute each fuzzed test case extensively on a range of real machines. Our tool, C4, benefits compiler developers in two ways. First, test cases generated by C4 can achieve line coverage of parts of the LLVM C compiler that are reached by neither the LLVM test suite nor an existing (sequential) C fuzzer. This information can be used to guide further development of the LLVM test suite and can also shed light on where and how concurrency‐related compiler optimizations are implemented. Second, C4 can be used to gain confidence that a compiler implements concurrency correctly. As evidence of this, we show that C4 achieves high strong mutation coverage with respect to a set of concurrency‐related mutants derived from a recent version of LLVM and that it can find historic concurrency‐related bugs in GCC. As a by‐product of concurrency‐focused testing, C4 also revealed two previously unknown sequential compiler bugs in recent versions of GCC and the IBM XL compiler.
{"title":"High‐coverage metamorphic testing of concurrency support in C compilers","authors":"Matt Windsor, A. Donaldson, John Wickerson","doi":"10.1002/stvr.1812","DOIUrl":"https://doi.org/10.1002/stvr.1812","url":null,"abstract":"We present a technique and automated toolbox for randomized testing of C compilers. Unlike prior compiler‐testing approaches, we generate concurrent test cases in which threads communicate using fine‐grained atomic operations, and we study actual compiler implementations rather than abstract mappings. Our approach is (1) to generate test cases with precise oracles directly from an axiomatization of the C concurrency model; (2) to apply metamorphic fuzzing to each test case, aiming to amplify the coverage they are likely to achieve on compiler codebases; and (3) to execute each fuzzed test case extensively on a range of real machines. Our tool, C4, benefits compiler developers in two ways. First, test cases generated by C4 can achieve line coverage of parts of the LLVM C compiler that are reached by neither the LLVM test suite nor an existing (sequential) C fuzzer. This information can be used to guide further development of the LLVM test suite and can also shed light on where and how concurrency‐related compiler optimizations are implemented. Second, C4 can be used to gain confidence that a compiler implements concurrency correctly. As evidence of this, we show that C4 achieves high strong mutation coverage with respect to a set of concurrency‐related mutants derived from a recent version of LLVM and that it can find historic concurrency‐related bugs in GCC. As a by‐product of concurrency‐focused testing, C4 also revealed two previously unknown sequential compiler bugs in recent versions of GCC and the IBM XL compiler.","PeriodicalId":49506,"journal":{"name":"Software Testing Verification & Reliability","volume":"48 1","pages":""},"PeriodicalIF":1.5,"publicationDate":"2022-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79762517","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Farewell after an 11‐year journey as joint editor‐in‐chief","authors":"R. Hierons","doi":"10.1002/stvr.1816","DOIUrl":"https://doi.org/10.1002/stvr.1816","url":null,"abstract":"","PeriodicalId":49506,"journal":{"name":"Software Testing Verification & Reliability","volume":"139 1","pages":""},"PeriodicalIF":1.5,"publicationDate":"2022-05-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79940594","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The first paper, ‘Towards using coupling measures to guide black-box integration testing in component-based systems’, concerns integration testing in component-based systems. The authors investigate the correlation between component and interface coupling measures found in the literature and the number of observed failures at two architectural levels: the component level and the software interface level. The findings serve as a first step towards an approach for the systematic selection of test cases during integration testing of a distributed component-based software system with black-box components. For example, the number of coupled elements may be an indicator of failure-proneness and can be used to guide test case prioritisation during system integration testing; data-flow-based coupling measurements may not capture the nature of an automotive software system and are thus inapplicable; and having a grey-box model may improve system integration testing. Overall, prioritising testing of highly coupled components/interfaces can be a valid approach for systematic integration testing. The second paper, ‘High-coverage metamorphic testing of concurrency support in C compilers’, presents an approach and automated toolbox for randomised testing of C compilers, checking whether C compilers implement concurrency in accordance with the expected C11 semantics. The experimental results show that the generated test cases cover interesting concurrency-related compiler code and can detect fence-related defects.
{"title":"Integration testing and metamorphic testing","authors":"Yves Le Traon, Tao Xie","doi":"10.1002/stvr.1817","DOIUrl":"https://doi.org/10.1002/stvr.1817","url":null,"abstract":"The first paper, ‘ Towards using coupling measures to guide black-box integration testing in component-based systems ’ concerns integration testing in component-based systems. The authors investigate the correlation between component and interface coupling measures found in literature and the number of observed failures at two architectural levels: the component level and the software interface level. The finding serves as a first step towards an approach for systematic selection of test cases during integration testing of a distributed component-based software system with black-box components. For example, the number of coupled elements may be an indicator for failure-proneness and can be used to guide test case prioritisation during system integration testing; data-flow-based coupling measurements may not capture the nature of an automotive software system and thus are inapplicable; having a grey box model may improve system integration testing. Overall, prioritising testing of highly coupled components/interfaces can be a valid approach for systematic integration testing. ‘ High-coverage metamorphic testing of concurrency C compilers an approach and automated toolbox randomised testing of C compilers, checking whether C compilers concurrency in accordance the expected C11 semantics. ’ experimental results some interesting code relating concurrency, detects fence","PeriodicalId":49506,"journal":{"name":"Software Testing Verification & Reliability","volume":"14 1","pages":""},"PeriodicalIF":1.5,"publicationDate":"2022-05-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91274600","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Function block diagram (FBD) is a standard programming language for programmable logic controllers (PLCs). PLCs have been widely used to develop safety‐critical systems such as nuclear reactor protection systems. It is crucial to test FBD programs for such systems effectively. This paper presents an automated test sequence generation approach using mutation testing techniques for FBD programs and the developed tool, MuFBDTester. Given an FBD program, MuFBDTester analyses the program and generates mutated programs based on mutation operators. MuFBDTester translates the given program and mutants into the input language of a satisfiability modulo theories (SMT) solver to derive a set of test sequences. The primary objective is to find test data that can distinguish between the results of the given program and its mutants. We conducted experiments with several examples, including real industrial cases, to evaluate the effectiveness and efficiency of our approach. When controlling for test suite size, the results indicated that the mutation‐based test suites were statistically more effective at revealing artificial faults than structural coverage‐based test suites. Furthermore, the mutation‐based test suites detected more reproduced faults, found in industrial programs, than structural coverage‐based test suites. Compared to structural coverage‐based test generation time, the time required by MuFBDTester to generate one test sequence from industrial programs is approximately 1.3 times longer; however, we consider this a price worth paying for the higher effectiveness. Using MuFBDTester, the manual effort of creating test suites was significantly reduced from days to minutes due to automated test generation. MuFBDTester can provide highly effective test suites for FBD engineers.
{"title":"MuFBDTester: A mutation‐based test sequence generator for FBD programs implementing nuclear power plant software","authors":"Lingjun Liu, Eunkyoung Jee, Doo-Hwan Bae","doi":"10.1002/stvr.1815","DOIUrl":"https://doi.org/10.1002/stvr.1815","url":null,"abstract":"Function block diagram (FBD) is a standard programming language for programmable logic controllers (PLCs). PLCs have been widely used to develop safety‐critical systems such as nuclear reactor protection systems. It is crucial to test FBD programs for such systems effectively. This paper presents an automated test sequence generation approach using mutation testing techniques for FBD programs and the developed tool, MuFBDTester. Given an FBD program, MuFBDTester analyses the program and generates mutated programs based on mutation operators. MuFBDTester translates the given program and mutants into the input language of a satisfiability modulo theories (SMT) solver to derive a set of test sequences. The primary objective is to find the test data that can distinguish between the results of the given program and mutants. We conducted experiments with several examples including real industrial cases to evaluate the effectiveness and efficiency of our approach. With the control of test size, the results indicated that the mutation‐based test suites were statistically more effective at revealing artificial faults than structural coverage‐based test suites. Furthermore, the mutation‐based test suites detected more reproduced faults, found in industrial programs, than structural coverage‐based test suites. Compared to structural coverage‐based test generation time, the time required by MuFBDTester to generate one test sequence from industrial programs is approximately 1.3 times longer; however, it is considered to be worth paying the price for high effectiveness. Using MuFBDTester, the manual effort of creating test suites was significantly reduced from days to minutes due to automated test generation. MuFBDTester can provide highly effective test suites for FBD engineers.","PeriodicalId":49506,"journal":{"name":"Software Testing Verification & Reliability","volume":"89 1","pages":""},"PeriodicalIF":1.5,"publicationDate":"2022-05-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79473208","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This issue contains two papers. The first paper focuses on metamorphic testing and the second one focuses on test automation. The first paper, ‘Metamorphic relation prioritization for effective regression testing’ by Madhusudan Srinivasan and Upulee Kanewala, concerns metamorphic testing. Metamorphic testing (MT) is an approach devised to support the testing of software that is untestable in the sense that it is not feasible to determine, in advance, the expected output for a given test input. The basic idea behind MT is that it is sometimes possible to provide a property (metamorphic relation) over multiple test runs that use inputs that are related in some way. A classic example is that we may not know what the cosine of x should be for some arbitrary x, but we do know that cos(x) should be the same as cos(−x). Previous work has proposed the use of multiple metamorphic relations (MRs), but the authors explore how one might prioritize (order) such MRs. Prioritization is based on information regarding a previous version of the software under test. The authors propose two approaches: prioritize on coverage or on fault detection. Optimization is achieved using a greedy algorithm that is sometimes called Additional Greedy. (Recommended by Dan Hao). The second paper, ‘Improving test automation maturity: A multivocal literature review’ by Yuqing Wang, Mika V. Mäntylä, Zihao Liu, Jouni Markkula and Päivi Raulamo-Jurvanen, presents a multivocal literature review to survey and synthesize the guidelines given in the literature for improving test automation maturity. The authors select and review 81 primary studies (26 academic literature sources and 55 grey literature sources). From these primary studies, the authors extract 26 test automation best practices along with advice on how to conduct these best practices in the form of implementation/improvement approaches, actions, technical techniques, concepts and experience-based opinions. In particular, the literature review results contribute test automation best practices that suggest steps for improving test automation maturity, narrow the gap between practice and research in terms of the industry’s need to improve test automation maturity, provide a centralized knowledge base of existing guidelines for test automation maturity improvement and identify related research challenges and opportunities.
{"title":"Metamorphic testing and test automation","authors":"R. Hierons, Tao Xie","doi":"10.1002/stvr.1814","DOIUrl":"https://doi.org/10.1002/stvr.1814","url":null,"abstract":"This issue contains two papers. The first paper focuses on metamorphic testing and the second one focuses on test automation.Thefirst paper, ‘ Metamorphic relation prioritization for effective regression testing ’ by Madhusudan Srinivasan and Upulee Kanewala, concerns metamorphic testing. Metamorphic testing (MT) is an approach devised to support the testing of software that is untestable in the sense that it is not feasible to determine, in advance, the expected output for a given test input. The basic idea behind MT is that it is sometimes possible to provide a property (metamorphic relation) over multiple test runs that use inputs that are related in some way. A classic example is that we may not know what the cosine of x should be for some arbitrary x but we do know that cos( x ) should be the same as cos( (cid:1) x ). Previous work has proposed the use of multiple metamorphic relations (MRs), but the authors explore how one might prioritize (order) such MRs. Prioritization is based on information regarding a previous version of the software under test. The authors propose two approaches: prioritize on coverage or on fault detection. Optimization is achieved using a greedy algorithm that is sometimes called Additional Greedy. (Recommended by Dan Hao). The second paper, ‘ Improving test automation maturity: A multivocal literature review ’ by Yuqing Wang, Mika V. Mäntylä, Zihao Liu, Jouni Markkula and Päivi Raulamo-jurvanen, presents a multivocal literature review to survey and synthesize the guidelines given in the literature for improving test automation maturity. The authors select and review 81 primary studies (26 academic literature sources and 55 grey literature sources). From these primary studies, the authors extract 26 test automation best practices along with advice on how to conduct these best practices in forms of implementation/improvement approaches, actions, technical techniques, concepts and experience-based opinions. In particular, the literature review results contribute test automation best practices to suggest steps for improving test automation maturity, narrow the gap between practice and research in terms of the industry ’ s need to improve test automation maturity, provide a centralized knowledge base of existing guidelines for test automation maturity improvement and identify related research challenge and opportunities.","PeriodicalId":49506,"journal":{"name":"Software Testing Verification & Reliability","volume":"8 1","pages":""},"PeriodicalIF":1.5,"publicationDate":"2022-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79357656","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Runtime verification (RV) helps to find software bugs by monitoring formally specified properties during testing. A key problem in using RV during testing is how to reduce the manual inspection effort for checking whether property violations are true bugs. Prior to this work, there was no automated approach for determining the likelihood that property violations are true bugs and thereby reducing tedious and time‐consuming manual inspection. We present RVprio, the first automated approach for prioritizing RV violations in order of likelihood of being true bugs. RVprio uses machine learning classifiers to prioritize violations. For training, we used a labelled dataset of 1170 violations from 110 projects. On that dataset, (1) RVprio reached 90% of the effectiveness of a theoretically optimal prioritizer that ranks all true bugs at the top of the ranked list, and (2) 88.1% of true bugs were in the top 25% of RVprio‐ranked violations; 32.7% of true bugs were in the top 10%. RVprio was also effective when we applied it to new unlabelled violations, from which we found previously unknown bugs—54 bugs in 8 open‐source projects. Our dataset is publicly available online.
{"title":"RVprio: A tool for prioritizing runtime verification violations","authors":"Lucas Cabral, Breno Miranda, Igor Lima, Marcelo d’Amorim","doi":"10.1002/stvr.1813","DOIUrl":"https://doi.org/10.1002/stvr.1813","url":null,"abstract":"Runtime verification (RV) helps to find software bugs by monitoring formally specified properties during testing. A key problem in using RV during testing is how to reduce the manual inspection effort for checking whether property violations are true bugs. To date, there was no automated approach for determining the likelihood that property violations were true bugs to reduce tedious and time‐consuming manual inspection. We present RVprio, the first automated approach for prioritizing RV violations in order of likelihood of being true bugs. RVprio uses machine learning classifiers to prioritize violations. For training, we used a labelled dataset of 1170 violations from 110 projects. On that dataset, (1) RVprio reached 90% of the effectiveness of a theoretically optimal prioritizer that ranks all true bugs at the top of the ranked list, and (2) 88.1% of true bugs were in the top 25% of RVprio‐ranked violations; 32.7% of true bugs were in the top 10%. RVprio was also effective when we applied it to new unlabelled violations, from which we found previously unknown bugs—54 bugs in 8 open‐source projects. Our dataset is publicly available online.","PeriodicalId":49506,"journal":{"name":"Software Testing Verification & Reliability","volume":"70 1","pages":""},"PeriodicalIF":1.5,"publicationDate":"2022-03-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83410992","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In component‐based software development, integration testing is a crucial step in verifying the composite behaviour of a system. However, very few formally or empirically validated approaches are available for systematically testing whether components have been successfully integrated. In practice, integration testing of component‐based systems is usually performed in a time‐ and resource‐limited context, which further increases the demand for effective test selection strategies. In this work, we therefore analyse the relationship between different component and interface coupling measures found in the literature and the distribution of failures found during integration testing of an automotive system. By investigating the correlation for each measure at two architectural levels, we discuss its usefulness for guiding integration testing at the software component level as well as at the hardware component level, where coupling is measured among multiple electronic control units (ECUs) of a vehicle. Our results indicate that there is a positive correlation between coupling measures and failure‐proneness at both architectural levels for all tested measures. However, at the hardware component level, all measures achieved a significantly higher correlation when compared to the software‐level correlation. Consequently, we conclude that prioritizing testing of highly coupled components/interfaces is a valid approach for systematic integration testing, as coupling proved to be a valid indicator of failure‐proneness.
{"title":"Towards using coupling measures to guide black‐box integration testing in component‐based systems","authors":"Dominik Hellhake, J. Bogner, Tobias Schmid, S. Wagner","doi":"10.1002/stvr.1811","DOIUrl":"https://doi.org/10.1002/stvr.1811","url":null,"abstract":"In component‐based software development, integration testing is a crucial step in verifying the composite behaviour of a system. However, very few formally or empirically validated approaches are available for systematically testing if components have been successfully integrated. In practice, integration testing of component‐based systems is usually performed in a time‐ and resource‐limited context, which further increases the demand for effective test selection strategies. In this work, we therefore analyse the relationship between different component and interface coupling measures found in literature and the distribution of failures found during integration testing of an automotive system. By investigating the correlation for each measure at two architectural levels, we discuss its usefulness to guide integration testing at the software component level as well as for the hardware component level where coupling is measured among multiple electronic control units (ECUs) of a vehicle. Our results indicate that there is a positive correlation between coupling measures and failure‐proneness at both architectural level for all tested measures. However, at the hardware component level, all measures achieved a significantly higher correlation when compared to the software‐level correlation. Consequently, we conclude that prioritizing testing of highly coupled components and interfaces is a valid approach for systematic integration testing, as coupling proved to be a valid indicator for failure‐proneness.","PeriodicalId":49506,"journal":{"name":"Software Testing Verification & Reliability","volume":"57 1","pages":""},"PeriodicalIF":1.5,"publicationDate":"2022-03-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72686650","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mature test automation is key for achieving software quality at speed. In this paper, we present a multivocal literature review with the objective to survey and synthesize the guidelines given in the literature for improving test automation maturity. We selected and reviewed 81 primary studies, consisting of 26 academic literature sources and 55 grey literature sources. From the primary studies, we extracted 26 test automation best practices (e.g., Define an effective test automation strategy, Set up good test environments, and Develop high‐quality test scripts) and collected many pieces of advice (e.g., in the form of implementation/improvement approaches, technical techniques, concepts and experience‐based heuristics) on how to conduct these best practices. We made the following main observations: (1) there are only six best practices whose positive effect on maturity improvement has been evaluated by academic studies using formal empirical methods; (2) several technically related best practices in this MLR are not present in test maturity models; (3) some best practices can be linked to success factors and maturity impediments proposed by other scholars; (4) most pieces of advice on how to conduct the proposed best practices were identified from experience studies, and their effectiveness needs to be further evaluated with cross‐site empirical evidence using formal empirical methods; and (5) in the literature, some advice on how to conduct certain best practices is conflicting, and some of it still needs further qualitative analysis.
{"title":"Improving test automation maturity: A multivocal literature review","authors":"Yuqing Wang, M. Mäntylä, Zihao Liu, Jouni Markkula, Päivi Raulamo-Jurvanen","doi":"10.1002/stvr.1804","DOIUrl":"https://doi.org/10.1002/stvr.1804","url":null,"abstract":"Mature test automation is key for achieving software quality at speed. In this paper, we present a multivocal literature review with the objective to survey and synthesize the guidelines given in the literature for improving test automation maturity. We selected and reviewed 81 primary studies, consisting of 26 academic literature and 55 grey literature sources. From primary studies, we extracted 26 test automation best practices (e.g., Define an effective test automation strategy, Set up good test environments, and Develop high‐quality test scripts) and collected many pieces of advice (e.g., in forms of implementation/improvement approaches, technical techniques, concepts, and experience‐based heuristics) on how to conduct these best practices. We made main observations: (1) There are only six best practices whose positive effect on maturity improvement have been evaluated by academic studies using formal empirical methods; (2) several technical related best practices in this MLR were not presented in test maturity models; (3) some best practices can be linked to success factors and maturity impediments proposed by other scholars; (4) most pieces of advice on how to conduct proposed best practices were identified from experience studies and their effectiveness need to be further evaluated with cross‐site empirical evidence using formal empirical methods; (5) in the literature, some advice on how to conduct certain best practices are conflicting, and some advice on how to conduct certain best practices still need further qualitative analysis.","PeriodicalId":49506,"journal":{"name":"Software Testing Verification & Reliability","volume":"14 1","pages":""},"PeriodicalIF":1.5,"publicationDate":"2022-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86890026","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This issue contains two papers. The first paper focuses on combinatorial testing and the second one focuses on model-based testing. The first paper, ‘Combinatorial methods for testing Internet of Things smart home systems’ by Bernhard Garn, Dominik-Philip Schreiber, Dimitris E. Simos, Rick Kuhn, Jeff Voas, and Raghu Kacker, presents an approach for applying combinatorial testing (CT) to the internal configuration and functionality of Internet of Things (IoT) home automation hub systems. The authors first create an input parameter model of an IoT home automation hub system for use with test generation strategies of combinatorial testing and then propose an automated test execution framework and two test oracles for evaluation purposes. The proposed approach makes use of the appropriately formulated model of the hub and generates test sets derived from this model satisfying certain combinatorial coverage conditions. The authors conduct an evaluation of the proposed approach on a real-world IoT system. The evaluation results show that the proposed approach reveals multiple errors in the devices under test, and all approaches under comparison perform nearly equally well (recommended by W. K. Chan). The second paper, ‘Effective grey-box testing with partial FSM models’ by Robert Sachtleben and Jan Peleska, explores the problem of testing from a finite state machine (FSM) and considers the scenario in which an input can be enabled in some states and disabled in other states. There is already a body of work on testing from FSMs in which inputs are not always defined (partial FSMs), but such work typically allows the system under test (SUT) to be such that some inputs are defined in a state of the SUT but are not defined in the corresponding state of the specification FSM (the SUT can be ‘more’ defined). The paper introduces a conformance relation, called strong reduction, that requires that exactly the same inputs are defined in the specification and the SUT. A new test generation technique is given for strong reduction, and it returns test suites that are complete: a test suite is guaranteed to fail if the SUT is faulty, provided the SUT also satisfies certain conditions that place an upper bound on its number of states. The overall approach also requires that the tester can determine which inputs are enabled in the current state of the SUT, so testing is grey-box (recommended by Helene Waeselynck).
{"title":"Combinatorial testing and model‐based testing","authors":"R. Hierons, Tao Xie","doi":"10.1002/stvr.1810","DOIUrl":"https://doi.org/10.1002/stvr.1810","url":null,"abstract":"This issue contains two papers. The first paper focuses on combinatorial testing and the second one focuses on model-based testing. The first paper, ‘Combinatorial methods for testing Internet of Things smart home systems’ by Bernhard Garn, Dominik-Philip Schreiber, Dimitris E. Simos, Rick Kuhn, Jeff Voas, and Raghu Kacker, presents an approach for applying combinatorial testing (CT) to the internal configuration and functionality of Internet of Things (IoT) home automation hub systems. The authors first create an input parameter model of an IoT home automation hub system for use with test generation strategies of combinatorial testing and then propose an automated test execution framework and two test oracles for evaluation purposes. The proposed approach makes use of the appropriately formulated model of the hub and generates test sets derived from this model satisfying certain combinatorial coverage conditions. The authors conduct an evaluation of the proposed approach on a real-world IoT system. The evaluation results show that the proposed approach reveals multiple errors in the devices under test, and all approaches under comparison perform nearly equally well (recommended by W. K. Chan). The second paper, ‘Effective grey-box testing with partial FSM models’ by Robert Sachtleben and Jan Peleska, explores the problem of testing from a finite state machine (FSM) and considers the scenario in which an input can be enabled in some states and disabled in other states. There is already a body of work on testing from FSMs in which inputs are not always defined (partial FSMs), but such work typically allows the system under test (SUT) to be such that some inputs are defined in a state of the SUT but are not defined in the corresponding state of the specification FSM (the SUT can be ‘more’ defined). The paper introduces a conformance relation, called strong reduction, that requires that exactly the same inputs are defined in the specification and the SUT. A new test generation technique is given for strong reduction, with this returning test suites that are complete: a test suite is guaranteed to fail if the SUT is faulty and also satisfies certain conditions that place an upper bound on the number of states of the SUT. The overall approach also requires that the tester can determine which inputs are enabled in the current state of the SUT and so testing is grey-box (recommended by Helene Waeselynck).","PeriodicalId":49506,"journal":{"name":"Software Testing Verification & Reliability","volume":"6 1","pages":""},"PeriodicalIF":1.5,"publicationDate":"2022-02-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75275755","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
RESTful APIs (or REST APIs for short) represent a mainstream approach to design and develop web APIs using the REpresentational State Transfer architectural style. Black‐box testing, which assumes only access to the system under test through a specific interface, is the only viable option when white‐box testing is impracticable. This is the case for REST APIs: their source code is usually not (or just partially) available, or a white‐box analysis across many dynamically allocated distributed components (typical of a micro‐services architecture) is computationally challenging. This paper presents RestTestGen, a novel black‐box approach to automatically generate test cases for REST APIs, based on their interface definition (an OpenAPI specification). Input values and requests are generated for each operation of the API under test with the twofold objective of testing nominal execution scenarios and error scenarios. Two distinct oracles are deployed to detect when test cases reveal implementation defects. While this approach mainly targets the research community, it is also of interest to developers because, as a black‐box approach, it is universally applicable across different programming languages, or when external (compiled‐only) libraries are used in a REST API. The validation of our approach has been performed on more than 100 real‐world REST APIs, highlighting the effectiveness of the approach in revealing actual faults in already deployed services.
{"title":"Automated black‐box testing of nominal and error scenarios in RESTful APIs","authors":"Davide Corradini, Amedeo Zampieri, Michele Pasqua, Emanuele Viglianisi, Michael Dallago, M. Ceccato","doi":"10.1002/stvr.1808","DOIUrl":"https://doi.org/10.1002/stvr.1808","url":null,"abstract":"RESTful APIs (or REST APIs for short) represent a mainstream approach to design and develop web APIs using the REpresentational State Transfer architectural style. Black‐box testing, which assumes only the access to the system under test with a specific interface, is the only viable option when white‐box testing is impracticable. This is the case for REST APIs: their source code is usually not (or just partially) available, or a white‐box analysis across many dynamically allocated distributed components (typical of a micro‐services architecture) is computationally challenging. This paper presents RestTestGen, a novel black‐box approach to automatically generate test cases for REST APIs, based on their interface definition (an OpenAPI specification). Input values and requests are generated for each operation of the API under test with the twofold objective of testing nominal execution scenarios and error scenarios. Two distinct oracles are deployed to detect when test cases reveal implementation defects. While this approach is mainly targeting the research community, it is also of interest to developers because, as a black‐box approach, it is universally applicable across different programming languages, or in the case external (compiled only) libraries are used in a REST API. The validation of our approach has been performed on more than 100 of real‐world REST APIs, highlighting the effectiveness of the approach in revealing actual faults in already deployed services.","PeriodicalId":49506,"journal":{"name":"Software Testing Verification & Reliability","volume":"73 1","pages":""},"PeriodicalIF":1.5,"publicationDate":"2022-01-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73685752","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}