MuTMuT: Efficient Exploration for Mutation Testing of Multithreaded Code
Miloš Gligorić, V. Jagannath, D. Marinov. doi:10.1109/ICST.2010.33

Mutation testing is a method for measuring the quality of test suites. Given a system under test and a test suite, mutations are systematically inserted into the system, and the test suite is executed to determine which mutants it detects. A major cost of mutation testing is the time required to execute the test suite on all the mutants. This cost is even greater when the system under test is multithreaded: not only are test cases from the test suite executed on many mutants, but each test case is also executed for multiple possible thread schedules. We introduce a general framework that can reduce the time for mutation testing of multithreaded code. We present four techniques within the general framework and implement two of them in a tool called MuTMuT. We evaluate MuTMuT on eight multithreaded programs. The results show that MuTMuT reduces the time for mutation testing: substantially compared to straightforward mutant execution, and by up to 77% with the advanced technique relative to the basic one.
Text2Test: Automated Inspection of Natural Language Use Cases
A. Sinha, S. Sutton, A. Paradkar. doi:10.1109/ICST.2010.19

The modularity and customer-centric approach of use cases make them a preferred method for requirement elicitation, especially in iterative software development processes such as agile programming. Numerous guidelines exist for use case style and content, but enforcing compliance with such guidelines in industry currently requires specialized training and a strongly managed requirement elicitation process. However, often due to aggressive development schedules, organizations shy away from such extensive processes and end up capturing use cases in an ad hoc fashion with little guidance. This results in poor-quality use cases that are seldom fit for any downstream software activities. We have developed an approach for automated, “edit-time” inspection of use cases based on the construction and analysis of models of use cases. Our models contain linguistic properties of the use case text along with the functional properties of the system under discussion. In this paper, we present a suite of model analysis techniques that leverage such models to validate use cases simultaneously for their style and content. Such model analysis techniques can be combined with robust NLP techniques to develop integrated development environments for use case authoring, as we do in Text2Test. When used in an industrial setting, Text2Test resulted in better compliance of use cases and enhanced productivity.
State Machine Inference in Testing Context with Long Counterexamples
M. Irfan. doi:10.1109/ICST.2010.68

We are working on techniques that iteratively learn formal models from black-box implementations by testing. The novelty of the approach addressed here is our processing of long counterexamples. Counterexamples produced by a counterexample generator may include needless subsequences. We present techniques developed to avoid the impact of such unwanted sequences on the learning process. The gain of the proposed algorithm is confirmed by a comprehensive set of experiments on finite state machines.
Google's Innovation Factory: Testing, Culture, and Infrastructure
Patrick Copeland. doi:10.1109/ICST.2010.65

Google’s external mythology has been one of a brilliant and chaotic innovation machine that produces new products and features at an amazing rate. Behind the curtain of public perception is a company that takes quality seriously and is reinventing how software is created, tested, released, and maintained; a reality that’s even more interesting than the myth. At Google we’ve learned a lot in the last few years about accelerating very large-scale software development; in this paper we'll share what has worked and what hasn't worked for us.
An Industrial Survey on Contemporary Aspects of Software Testing
Adnan Causevic, Daniel Sundmark, S. Punnekkat. doi:10.1109/ICST.2010.52

Software testing is a major source of expense in software projects, and a proper testing process is a critical ingredient in the cost-efficient development of high-quality software. Contemporary aspects, such as the introduction of more lightweight processes, trends towards distributed development, and the rapid increase of software in embedded and safety-critical systems, challenge the testing process in unexpected ways. To our knowledge, there are very few studies focusing on these aspects in relation to testing as perceived by different contributors in the software development process. This paper qualitatively and quantitatively analyses data from an industrial questionnaire survey, with a focus on current practices and preferences on contemporary aspects of software testing. Specifically, the analysis focuses on perceptions of the software testing process in different categories of respondents. Categorization of respondents is based on safety-criticality, agility, distribution of development, and application domain. While confirming some of the commonly acknowledged facts, our findings also reveal notable discrepancies between preferred and actual testing practices. We believe continued research efforts are essential to provide guidelines for adapting the testing process to resolve these discrepancies, thus improving the quality and efficiency of software development.
Timed Moore Automata: Test Data Generation and Model Checking
Helge Löding, J. Peleska. doi:10.1109/ICST.2010.60

In this paper we introduce Timed Moore Automata, a specification formalism used in industrial train control applications to specify the real-time behavior of cooperating reactive software components. We define an operational semantics for the sequential components (units) with an abstraction of time that is suitable for checking the timeout behavior of these units. A model checking algorithm for livelock detection is presented, and two alternative test case/test data generation methods are introduced. The first is based on Kripke structures as used in explicit model checking, while the second does not require an explicit representation but relies on SAT solving techniques.
Online Testing Framework for Web Services
Tien-Dung Cao, Patrick Félix, R. Castanet, Ismail Berrada. doi:10.1109/ICST.2010.11

Testing conceptually consists of three activities: test case generation, test case execution, and verdict assignment. In online testing, test cases are generated and executed simultaneously (i.e., the complete test scenario is built during test execution). This paper presents a framework that automatically generates and executes tests online for conformance testing of composite Web services described in BPEL. The proposed framework targets unit testing and is based on a timed model of the BPEL specification, a distributed testing architecture, and an online testing algorithm that generates, executes, and assigns verdicts to every generated state in the test case.
We're Finding Most of the Bugs, but What are We Missing?
E. Weyuker, Robert M. Bell, T. Ostrand. doi:10.1109/ICST.2010.18

We compare two types of model that have been used to predict software fault-proneness in the next release of a software system. Classification models make a binary prediction that a software entity such as a file or module is likely to be either faulty or not faulty in the next release. Ranking models order the entities according to their predicted number of faults. They are generally used to establish a priority for more intensive testing of the entities that occur early in the ranking. We investigate ways of assessing both classification models and ranking models, and the extent to which metrics appropriate for one type of model are also appropriate for the other. Previous work has shown that ranking models are capable of identifying relatively small sets of files that contain 75-95% of the faults detected in the next release of large legacy systems. In our studies of the rankings produced by these models, the faults not contained in the predicted most fault-prone files are nearly always distributed across many of the remaining files; i.e., a single file that is in the lower portion of the ranking virtually never contains a large number of faults.
Automated Behavioral Regression Testing
Wei Jin, A. Orso, Tao Xie. doi:10.1109/ICST.2010.64

When a program is modified during software evolution, developers typically run the new version of the program against its existing test suite to validate that the changes made to the program did not introduce unintended side effects (i.e., regression faults). This kind of regression testing can be effective in identifying some regression faults, but it is limited by the quality of the existing test suite. Due to the cost of testing, developers build test suites by finding acceptable tradeoffs between cost and thoroughness of the tests. As a result, these test suites tend to exercise only a small subset of the program's functionality and may be inadequate for testing the changes in a program. To address this issue, we propose a novel approach called Behavioral Regression Testing (BERT). Given two versions of a program, BERT identifies behavioral differences between the two versions through dynamic analysis, in three steps. First, it generates a large number of test inputs that focus on the changed parts of the code. Second, it runs the generated test inputs on the old and new versions of the code and identifies differences in the tests' behavior. Third, it analyzes the identified differences and presents them to the developers. By focusing on a subset of the code and leveraging differential behavior, BERT can provide developers with more (and more detailed) information than traditional regression testing techniques. To evaluate BERT, we implemented it as a plug-in for Eclipse, a popular Integrated Development Environment, and used the plug-in to perform a preliminary study on two programs. The results of our study are promising, in that BERT was able to identify true regression faults in the programs.
(Un-)Covering Equivalent Mutants
David Schuler, A. Zeller. doi:10.1109/ICST.2010.30

Mutation testing measures the adequacy of a test suite by seeding artificial defects (mutations) into a program. If a test suite fails to detect a mutation, it may also fail to detect real defects, and hence should be improved. However, there are also mutations which keep the program semantics unchanged and thus cannot be detected by any test suite. Such equivalent mutants must be weeded out manually, which is a tedious task. In this paper, we examine whether changes in coverage can be used to detect non-equivalent mutants: if a mutant changes the coverage of a run, it is more likely to be non-equivalent. In a sample of 140 manually classified mutations of seven Java programs with 5,000 to 100,000 lines of code, we found that: (a) the problem is serious and widespread, with about 45% of all undetected mutants turning out to be equivalent; (b) manual classification takes time, about 15 minutes per mutation; (c) coverage is a simple, efficient, and effective means to identify equivalent mutants, with a classification precision of 75% and a recall of 56%; and (d) coverage as an equivalence detector is superior to the state of the art, in particular to violations of dynamic invariants. Our detectors have been released as part of the open source Javalanche framework; the data set is publicly available for replication and extension of experiments.