"It Does Matter How You Normalise the Branch Distance in Search Based Software Testing" by Andrea Arcuri. DOI: 10.1109/ICST.2010.17

The use of search algorithms for test data generation has seen many successful results. For structural criteria such as branch coverage, heuristics have been designed to help the search. The most common heuristic is the approach level (usually represented with an integer), which rewards test cases whose executions get close (in the control flow graph) to the target branch. To solve the constraints of the predicates in the control flow graph, the branch distance is commonly employed. These two measures are linearly combined. Because the approach level is more important, the branch distance is normalised, often in the range [0,1]. In this paper, we analyse different types of normalising functions. We found that the one usually employed in the literature has several flaws. We hence propose a different normalising function that is very simple and does not suffer from these limitations. We carried out empirical and analytical analyses to compare these two functions. In particular, we studied their effect on two commonly used search algorithms, namely Simulated Annealing and Genetic Algorithms.
"GraphSeq: A Graph Matching Tool for the Extraction of Mobility Patterns" by Minh Duc Nguyen, H. Waeselynck, Nicolas Rivière. DOI: 10.1109/ICST.2010.53

Mobile computing systems provide new challenges for verification. One of them is the dynamicity of the system structure, with mobility-induced connections and disconnections and the dynamic creation and shutdown of nodes. Interaction scenarios then have to consider the spatial configuration of the nodes as a first-class concept. This paper presents GraphSeq, a graph matching tool for sequences of configurations, developed in the context of testing research. It aims to analyze test traces to identify occurrences of the successive spatial configurations described in an abstract scenario. We present the GraphSeq algorithm, as well as first experiments using randomly generated graphs, outputs from a mobility simulator, and test traces from a case study in ad hoc networks.
"Model Based Testing and Abstract Interpretation in the Railway Signaling Context" by Daniele Grasso, A. Fantechi, Alessio Ferrari, C. Becheri, Stefano Bacherini. DOI: 10.1109/ICST.2010.44

This article presents the experience of a railway signaling manufacturer in introducing model based testing and abstract interpretation into its development process. Preliminary results show that these techniques perform better than the previously employed structural-coverage-based testing.
"Repairing GUI Test Suites Using a Genetic Algorithm" by Si Huang, Myra B. Cohen, A. Memon. DOI: 10.1109/ICST.2010.39

Recent advances in automated functional testing of Graphical User Interfaces (GUIs) rely on deriving graph models that approximate all possible sequences of events that may be executed on the GUI, and then using the graphs to generate test cases (event sequences) that achieve a specified coverage goal. However, because these models are only approximations of the actual event flows, the generated test cases may suffer from problems of infeasibility: some events may not be available for execution, causing the test case to terminate prematurely. In this paper we develop a method to automatically repair GUI test suites, generating new test cases that are feasible. We use a genetic algorithm to evolve new test cases that increase the test suite's coverage while avoiding infeasible sequences. We experiment with this algorithm on a set of synthetic programs containing different types of constraints and for test sequences of varying lengths. Our results suggest that we can generate new test cases to achieve most of the feasible coverage, and that the genetic algorithm outperforms a random algorithm attempting the same goal in almost all cases.
"An Industrial Survey on Contemporary Aspects of Software Testing" by Adnan Causevic, Daniel Sundmark, S. Punnekkat. DOI: 10.1109/ICST.2010.52

Software testing is a major source of expense in software projects, and a proper testing process is a critical ingredient in the cost-efficient development of high-quality software. Contemporary aspects, such as the introduction of more lightweight processes, trends towards distributed development, and the rapid increase of software in embedded and safety-critical systems, challenge the testing process in unexpected ways. To our knowledge, there are very few studies focusing on these aspects in relation to testing as perceived by different contributors in the software development process. This paper qualitatively and quantitatively analyses data from an industrial questionnaire survey, with a focus on current practices and preferences regarding contemporary aspects of software testing. Specifically, the analysis focuses on perceptions of the software testing process in different categories of respondents. Categorization of respondents is based on safety-criticality, agility, distribution of development, and application domain. While confirming some commonly acknowledged facts, our findings also reveal notable discrepancies between preferred and actual testing practices. We believe continued research efforts are essential to provide guidelines for adapting the testing process to address these discrepancies, thus improving the quality and efficiency of software development.
"State Machine Inference in Testing Context with Long Counterexamples" by M. Irfan. DOI: 10.1109/ICST.2010.68

We are working on techniques that iteratively learn formal models from black-box implementations by testing. The novelty of the approach addressed here is our processing of long counterexamples. Counterexamples produced by a counterexample generator may include needless subsequences. We present techniques developed to avoid the impact of such unwanted sequences on the learning process. The gain of the proposed algorithm is confirmed by a comprehensive set of experiments on finite state machines.
"Google's Innovation Factory: Testing, Culture, and Infrastructure" by Patrick Copeland. DOI: 10.1109/ICST.2010.65

Google’s external mythology has been one of a brilliant and chaotic innovation machine that produces new products and features at an amazing rate. Behind the curtain of public perception is a company that takes quality seriously and is reinventing how software is created, tested, released, and maintained; a reality that’s even more interesting than the myth. At Google we’ve learned a lot in the last few years about accelerating very large scale software development; in this paper we’ll share what has worked and what hasn't worked for us.
"(Un-)Covering Equivalent Mutants" by David Schuler, A. Zeller. DOI: 10.1109/ICST.2010.30

Mutation testing measures the adequacy of a test suite by seeding artificial defects (mutations) into a program. If a test suite fails to detect a mutation, it may also fail to detect real defects, and hence should be improved. However, there are also mutations which keep the program semantics unchanged and thus cannot be detected by any test suite. Such equivalent mutants must be weeded out manually, which is a tedious task. In this paper, we examine whether changes in coverage can be used to detect non-equivalent mutants: if a mutant changes the coverage of a run, it is more likely to be non-equivalent. In a sample of 140 manually classified mutations of seven Java programs with 5,000 to 100,000 lines of code, we found that: (a) the problem is serious and widespread: about 45% of all undetected mutants turned out to be equivalent; (b) manual classification takes time: about 15 minutes per mutation; (c) coverage is a simple, efficient, and effective means to identify equivalent mutants, with a classification precision of 75% and a recall of 56%; and (d) coverage as an equivalence detector is superior to the state of the art, in particular to violations of dynamic invariants. Our detectors have been released as part of the open source Javalanche framework; the data set is publicly available for replication and extension of the experiments.
"We're Finding Most of the Bugs, but What are We Missing?" by E. Weyuker, Robert M. Bell, T. Ostrand. DOI: 10.1109/ICST.2010.18

We compare two types of models that have been used to predict software fault-proneness in the next release of a software system. Classification models make a binary prediction that a software entity such as a file or module is likely to be either faulty or not faulty in the next release. Ranking models order the entities according to their predicted number of faults; they are generally used to establish a priority for more intensive testing of the entities that occur early in the ranking. We investigate ways of assessing both classification models and ranking models, and the extent to which metrics appropriate for one type of model are also appropriate for the other. Previous work has shown that ranking models are capable of identifying relatively small sets of files that contain 75-95% of the faults detected in the next release of large legacy systems. In our studies of the rankings produced by these models, the faults not contained in the predicted most fault-prone files are nearly always distributed across many of the remaining files; i.e., a single file in the lower portion of the ranking virtually never contains a large number of faults.
"Timed Moore Automata: Test Data Generation and Model Checking" by Helge Löding, J. Peleska. DOI: 10.1109/ICST.2010.60

In this paper we introduce Timed Moore Automata, a specification formalism used in industrial train control applications for specifying the real-time behavior of cooperating reactive software components. We define an operational semantics for the sequential components (units) with an abstraction of time that is suitable for checking the timeout behavior of these units. A model checking algorithm for livelock detection is presented, and two alternative test case/test data generation techniques are introduced. The first is based on Kripke structures as used in explicit model checking, while the second does not require an explicit representation but relies on SAT solving techniques.