Efficient Incrementalized Runtime Checking of Linear Measures on Lists. A. Gyori, P. Garg, E. Pek, P. Madhusudan. doi:10.1109/ICST.2017.35

We present mechanisms to specify and efficiently check, at runtime, assertions that express structural properties and aggregate measures of dynamically manipulated linked-list data structures. Checking assertions involving the structure, disjointness, and aggregation measures of lists and list segments typically requires linear or quadratic time in the size of the heap. Our main contribution is an incrementalization instrumentation that tracks properties of data structures dynamically as the program executes and leads to orders-of-magnitude speedups in assertion checking in many scenarios. Our incrementalization incurs a constant overhead on updates to list structures but enables checking assertions in constant time, independent of the size of the heap. We define a general class of functions on lists, called linear measures, which are amenable to our incrementalization technique. We demonstrate the effectiveness of our technique by showing orders-of-magnitude speedups in two scenarios: one stemming from assertions at the level of APIs of list-manipulating libraries, and the other from dynamic detection of security attacks caused by malicious rootkits.
A Framework for Failure Diagnosis. Mojdeh Golagha. doi:10.1109/ICST.2017.75

Testing and debugging are among the most expensive and challenging phases of the software development life-cycle. One important cost factor in the debugging process is the time required to analyze failures and repair the underlying faults. Two types of methods that can help testers reduce this analysis time are Failure Clustering and Fault Localization. Although there is a plethora of these methods in the literature, gaps remain that prevent their operationalization in real-world contexts. In addition, the abundance of methods makes it difficult for practitioners to select the one suitable for their specific domain. To fill these gaps and bring the state of the art closer to practice, we develop a framework for failure diagnosis. To devise this framework, we evaluate existing methods to investigate the possibility of finding a method (or methods) that would be effective in different contexts. We then introduce a methodology for adapting these methods to different contexts with a priori parameter settings. This framework will empower practitioners to debug quickly and reliably.
SAGA Toolbox: Interactive Testing of Guarded Assertions. Daniel Flemström, T. Gustafsson, A. Kobetski. doi:10.1109/ICST.2017.59

This paper presents the SAGA toolbox. It centers on the development of tests, and the analysis of test results, in the Guarded Assertions (GA) format. Such a test defines when to test and what to expect in that state. The SAGA toolbox lets the user describe the test and at the same time get immediate feedback on the test result, based on a trace from the System Under Test (SUT). The feedback is visual, using plots of the trace. This enables the test engineer to explore the data and to use an agile development method, since the data is already there. Moreover, the SAGA toolbox lets the test engineer change test-stimuli plots to study the effect they have on a test; it can later generate computer programs that feed these test stimuli to the SUT. This enables an interactive feedback loop in which immediate feedback on changes to the test, or to the test stimuli, indicates whether the test is correct and whether it passed or failed.
Generic and Effective Specification of Structural Test Objectives. M. Marcozzi, Mickaël Delahaye, Sébastien Bardin, N. Kosmatov, V. Prevosto. doi:10.1109/ICST.2017.48

A large amount of research has been carried out to automate white-box testing. While a wide range of different and sometimes heterogeneous code-coverage criteria have been proposed, no generic formalism exists to describe them all, and available test-automation tools usually support only a small subset of them. We introduce a new specification language, called HTOL (Hyperlabel Test Objectives Language), which provides a powerful generic mechanism for defining a wide range of test objectives. HTOL comes with a formal semantics and can encode all standard criteria except full mutation. Besides specification, HTOL is appealing in the context of test automation, as it allows handling criteria in a unified way.
Uncertainty-Driven Black-Box Test Data Generation. Neil Walkinshaw, G. Fraser. doi:10.1109/ICST.2017.30

We can never be certain that a software system is correct simply by testing it, but with every additional successful test we become less uncertain about its correctness. In the absence of source code or elaborate specifications and models, tests are usually generated or chosen randomly. However, rather than choosing tests randomly, it would be preferable to choose those tests that decrease our uncertainty about correctness the most. To guide test generation, we apply what is referred to in machine learning as the "Query Strategy Framework": we infer a behavioural model of the system under test and select those tests about which the inferred model is "least certain". Running these tests on the system under test thus directly targets those parts about which the tests so far have failed to inform the model. We provide an implementation that uses a genetic programming engine for model inference to enable an uncertainty-sampling technique known as "query by committee", and we evaluate it on eight subject systems from the Apache Commons Math framework and JodaTime. The results indicate that test generation using uncertainty sampling outperforms conventional and Adaptive Random Testing.
Assessing and Improving the Mutation Testing Practice of PIT. Thomas Laurent, Anthony Ventresque, Mike Papadakis, Christopher Henard, Yves Le Traon. doi:10.1109/ICST.2017.47

Mutation testing is extensively used in software testing studies. However, popular mutation testing tools use a restrictive set of mutants that does not conform to community standards or the mutation testing literature. This can be problematic, since the effectiveness of mutation strongly depends on the mutants used. To investigate this issue, we form an extended set of mutants and implement it in a popular mutation testing tool named PIT. We then show that in real-world projects the original mutants of PIT are easier to kill and lead to tests that score statistically lower than those derived from the extended mutant set for 35% to 70% of the studied classes. These results raise serious concerns about the validity of mutation-based experiments that use PIT. To further demonstrate the strengths of the extended mutants, we also performed an analysis using a benchmark with mutation-adequate test cases and identified equivalent mutants. Our results confirm that the extended mutants are more effective than a) the original version of PIT and b) two other popular mutation testing tools (Major and muJava). In particular, the extended mutants are more effective by 23%, 12%, and 7% than the mutants of the original PIT, Major, and muJava, respectively. They are also at least as strong as the mutants of all three other tools combined. To support future research, we make the new version of PIT, equipped with the extended mutants, publicly available.
The Fitness Function for the Job: Search-Based Generation of Test Suites That Detect Real Faults. Gregory Gay. doi:10.1109/ICST.2017.38

Search-based test generation, if effective at fault detection, can lower the cost of testing. Such techniques rely on fitness functions to guide the search. Ultimately, these functions represent test goals that approximate, but do not ensure, fault detection. The need to rely on approximations raises two questions: can fitness functions produce effective tests and, if so, which should be used to generate them? To answer these questions, we assessed the fault-detection capabilities of the EvoSuite framework and eight of its fitness functions on 353 real faults from the Defects4J database. Our analysis found that the strongest indicator of effectiveness is a high level of code coverage; consequently, the branch coverage fitness function is the most effective. Our findings indicate that fitness functions that thoroughly explore system structure should be used as primary generation objectives, supported by secondary fitness functions that vary the scenarios explored.