2010 Third International Conference on Software Testing, Verification and Validation: Latest Papers

MuTMuT: Efficient Exploration for Mutation Testing of Multithreaded Code
Miloš Gligorić, V. Jagannath, D. Marinov
Mutation testing is a method for measuring the quality of test suites. Given a system under test and a test suite, mutations are systematically inserted into the system, and the test suite is executed to determine which mutants it detects. A major cost of mutation testing is the time required to execute the test suite on all the mutants. This cost is even greater when the system under test is multithreaded: not only are test cases from the test suite executed on many mutants, but also each test case is executed for multiple possible thread schedules. We introduce a general framework that can reduce the time for mutation testing of multithreaded code. We present four techniques within the general framework and implement two of them in a tool called MuTMuT. We evaluate MuTMuT on eight multithreaded programs. The results show that MuTMuT reduces the time for mutation testing, substantially over a straightforward mutant execution and up to 77% with the advanced technique over the basic technique.
DOI: https://doi.org/10.1109/ICST.2010.33 | Published: 2010-04-06
Citations: 53
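The cost described in the MuTMuT abstract above (every mutant executed under every thread schedule) can be illustrated with a minimal sketch. This is a naive baseline under stated assumptions, not MuTMuT's technique: the shared-counter system, the three mutants, and the names `run_counter`, `killed`, and `mutation_score` are all hypothetical, and a "schedule" is simplified to an ordering of thread steps.

```python
from itertools import permutations


def run_counter(update, schedule):
    """Apply each thread's update to a shared counter in schedule order."""
    counter = 0
    for thread_id in schedule:
        counter = update(counter, thread_id)
    return counter


def original(counter, thread_id):
    """Hypothetical system under test: every thread increments the counter."""
    return counter + 1


# Hypothetical mutants: each replaces the update with a small variation.
MUTANTS = {
    "add_two": lambda counter, thread_id: counter + 2,
    "subtract": lambda counter, thread_id: counter - 1,
    "add_zero": lambda counter, thread_id: counter + 1 + 0,  # equivalent mutant
}


def killed(mutant, threads=(0, 1, 2)):
    """A mutant is killed if some schedule gives a result unlike the original."""
    for schedule in permutations(threads):
        if run_counter(mutant, schedule) != run_counter(original, schedule):
            return True
    return False


def mutation_score(mutants):
    """Return per-mutant verdicts and the fraction of mutants killed."""
    results = {name: killed(fn) for name, fn in mutants.items()}
    return results, sum(results.values()) / len(results)


results, score = mutation_score(MUTANTS)
print(results, score)
```

In the worst case this baseline performs (number of mutants) x (number of schedules) executions, which is the product the abstract identifies as the dominant cost.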
Text2Test: Automated Inspection of Natural Language Use Cases
A. Sinha, S. Sutton, A. Paradkar
The modularity and customer-centric approach of use cases make them the preferred method for requirement elicitation, especially in iterative software development processes such as agile programming. Numerous guidelines exist for use case style and content, but enforcing compliance with such guidelines in industry currently requires specialized training and a strongly managed requirement elicitation process. However, often due to aggressive development schedules, organizations shy away from such extensive processes and end up capturing use cases in an ad hoc fashion with little guidance. This results in poor-quality use cases that are seldom fit for any downstream software activities. We have developed an approach for automated, “edit-time” inspection of use cases based on the construction and analysis of models of use cases. Our models contain linguistic properties of the use case text along with the functional properties of the system under discussion. In this paper, we present a suite of model analysis techniques that leverage such models to validate use cases simultaneously for their style and content. Such model analysis techniques can be combined with robust NLP techniques to develop integrated development environments for use case authoring, as we do in Text2Test. When used in an industrial setting, Text2Test resulted in better compliance of use cases, in enhanced productivity
DOI: https://doi.org/10.1109/ICST.2010.19 | Published: 2010-04-06
Citations: 69
State Machine Inference in Testing Context with Long Counterexamples
M. Irfan
We are working on techniques that iteratively learn formal models from black-box implementations by testing. The novelty of the approach addressed here is our processing of long counterexamples. Counterexamples produced by a counterexample generator may include needless subsequences. We address techniques developed to avoid the impact of such unwanted sequences on the learning process. The gain of the proposed algorithm is confirmed by a comprehensive set of experiments on finite state machines.
DOI: https://doi.org/10.1109/ICST.2010.68 | Published: 2010-04-06
Citations: 9
Google's Innovation Factory: Testing, Culture, and Infrastructure
Patrick Copeland
Google’s external mythology has been one of a brilliant and chaotic innovation machine that produces new products and features at an amazing rate. Behind the curtain of public perception is a company that takes quality seriously and is reinventing how software is created, tested, released, and maintained; a reality that’s even more interesting than the myth. At Google we’ve learned a lot in the last few years about accelerating very large scale software development; in this paper we'll share what has worked and what hasn't worked for us.
DOI: https://doi.org/10.1109/ICST.2010.65 | Published: 2010-04-06
Citations: 8
An Industrial Survey on Contemporary Aspects of Software Testing
Adnan Causevic, Daniel Sundmark, S. Punnekkat
Software testing is a major source of expense in software projects, and a proper testing process is a critical ingredient in the cost-efficient development of high-quality software. Contemporary aspects, such as the introduction of a more lightweight process, trends towards distributed development, and the rapid increase of software in embedded and safety-critical systems, challenge the testing process in unexpected ways. To our knowledge, there are very few studies focusing on these aspects in relation to testing as perceived by different contributors in the software development process. This paper qualitatively and quantitatively analyses data from an industrial questionnaire survey, with a focus on current practices and preferences on contemporary aspects of software testing. Specifically, the analysis focuses on perceptions of the software testing process in different categories of respondents. Categorization of respondents is based on safety-criticality, agility, distribution of development, and application domain. While confirming some of the commonly acknowledged facts, our findings also reveal notable discrepancies between preferred and actual testing practices. We believe continued research efforts are essential to provide guidelines for adapting the testing process to address these discrepancies, thus improving the quality and efficiency of software development.
DOI: https://doi.org/10.1109/ICST.2010.52 | Published: 2010-04-06
Citations: 76
Timed Moore Automata: Test Data Generation and Model Checking
Helge Löding, J. Peleska
In this paper we introduce Timed Moore Automata, a specification formalism which is used in industrial train control applications for specifying the real-time behavior of cooperating reactive software components. We define an operational semantics for the sequential components (units) with an abstraction of time that is suitable for checking timeout behavior of these units. A model checking algorithm for livelock detection is presented, and two alternative techniques for test case/test data generation are introduced. The first one is based on Kripke structures as used in explicit model checking, while the second method does not require an explicit representation but relies on SAT solving techniques.
DOI: https://doi.org/10.1109/ICST.2010.60 | Published: 2010-04-06
Citations: 26
Online Testing Framework for Web Services
Tien-Dung Cao, Patrick Félix, R. Castanet, Ismail Berrada
Testing conceptually consists of three activities: test case generation, test case execution and verdict assignment. Using online testing, test cases are generated and simultaneously executed (i.e. the complete test scenario is built during test execution). This paper presents a framework that automatically generates and executes tests "online" for conformance testing of a composite of Web services described in BPEL. The proposed framework considers unit testing and it is based on a timed modeling of BPEL specification, a distributed testing architecture and an online testing algorithm that generates, executes and assigns verdicts to every generated state in the test case.
DOI: https://doi.org/10.1109/ICST.2010.11 | Published: 2010-04-06
Citations: 37
We're Finding Most of the Bugs, but What are We Missing?
E. Weyuker, Robert M. Bell, T. Ostrand
We compare two types of model that have been used to predict software fault-proneness in the next release of a software system. Classification models make a binary prediction that a software entity such as a file or module is likely to be either faulty or not faulty in the next release. Ranking models order the entities according to their predicted number of faults. They are generally used to establish a priority for more intensive testing of the entities that occur early in the ranking. We investigate ways of assessing both classification models and ranking models, and the extent to which metrics appropriate for one type of model are also appropriate for the other. Previous work has shown that ranking models are capable of identifying relatively small sets of files that contain 75-95% of the faults detected in the next release of large legacy systems. In our studies of the rankings produced by these models, the faults not contained in the predicted most fault prone files are nearly always distributed across many of the remaining files; i.e., a single file that is in the lower portion of the ranking virtually never contains a large number of faults.
DOI: https://doi.org/10.1109/ICST.2010.18 | Published: 2010-04-06
Citations: 16
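The ranking-model assessment mentioned in the abstract above (what share of the next release's faults falls in the files ranked highest) can be sketched as follows. This is an illustrative calculation only: the 20% cutoff, the file names, and all fault counts are made-up assumptions, and `faults_in_top_fraction` is a hypothetical helper, not the authors' metric implementation.

```python
def faults_in_top_fraction(predicted, actual, fraction=0.2):
    """Share of actual faults found in the top `fraction` of ranked files.

    predicted: file -> predicted fault count (used only for ranking)
    actual:    file -> faults actually observed in the next release
    """
    ranked = sorted(predicted, key=predicted.get, reverse=True)
    k = max(1, round(len(ranked) * fraction))
    top = ranked[:k]
    total = sum(actual.values())
    share = sum(actual[name] for name in top) / total if total else 0.0
    return top, share


# Hypothetical data for ten files.
PREDICTED = {"a.c": 5.1, "b.c": 3.2, "c.c": 0.4, "d.c": 0.3, "e.c": 0.2,
             "f.c": 0.1, "g.c": 0.1, "h.c": 0.05, "i.c": 0.05, "j.c": 0.0}
ACTUAL = {"a.c": 6, "b.c": 2, "c.c": 1, "d.c": 0, "e.c": 1,
          "f.c": 0, "g.c": 0, "h.c": 0, "i.c": 0, "j.c": 0}

top_files, share = faults_in_top_fraction(PREDICTED, ACTUAL)
print(top_files, share)
```

In this toy data the faults outside the top-ranked files are spread one per file, which mirrors the pattern the abstract reports for the lower portion of the ranking.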
Automated Behavioral Regression Testing
Wei Jin, A. Orso, Tao Xie
When a program is modified during software evolution, developers typically run the new version of the program against its existing test suite to validate that the changes made to the program did not introduce unintended side effects (i.e., regression faults). This kind of regression testing can be effective in identifying some regression faults, but it is limited by the quality of the existing test suite. Due to the cost of testing, developers build test suites by finding acceptable tradeoffs between cost and thoroughness of the tests. As a result, these test suites tend to exercise only a small subset of the program's functionality and may be inadequate for testing the changes in a program. To address this issue, we propose a novel approach called Behavioral Regression Testing (BERT). Given two versions of a program, BERT identifies behavioral differences between the two versions through dynamic analysis, in three steps. First, it generates a large number of test inputs that focus on the changed parts of the code. Second, it runs the generated test inputs on the old and new versions of the code and identifies differences in the tests' behavior. Third, it analyzes the identified differences and presents them to the developers. By focusing on a subset of the code and leveraging differential behavior, BERT can provide developers with more (and more detailed) information than traditional regression testing techniques. To evaluate BERT, we implemented it as a plug-in for Eclipse, a popular Integrated Development Environment, and used the plug-in to perform a preliminary study on two programs. The results of our study are promising, in that BERT was able to identify true regression faults in the programs.
DOI: https://doi.org/10.1109/ICST.2010.64 | Published: 2010-04-06
Citations: 92
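The three steps named in the BERT abstract above (generate inputs around the change, run both versions, report differences) can be sketched in a few lines. This is a toy illustration, not the BERT Eclipse plug-in: the pricing functions, the changed threshold, and the names `generate_inputs` and `behavioral_differences` are hypothetical, and input generation is simplified to a small grid near the changed condition.

```python
from itertools import product


def old_total(price, quantity):
    """Old version: 10% discount for orders of 10 or more items."""
    rate = 0.9 if quantity >= 10 else 1.0
    return price * quantity * rate


def new_total(price, quantity):
    """New version: the threshold comparison was changed to > 10."""
    rate = 0.9 if quantity > 10 else 1.0
    return price * quantity * rate


def generate_inputs():
    """Step 1: inputs concentrated around the changed condition."""
    return list(product(range(1, 4), range(8, 13)))


def behavioral_differences(old, new, inputs):
    """Step 2: run both versions and keep inputs whose outputs differ."""
    diffs = []
    for args in inputs:
        old_out, new_out = old(*args), new(*args)
        if old_out != new_out:
            diffs.append((args, old_out, new_out))
    return diffs


diffs = behavioral_differences(old_total, new_total, generate_inputs())

# Step 3: present the differences to the developer.
for args, old_out, new_out in diffs:
    print(f"input={args}: old={old_out}, new={new_out}")
```

Whether a reported difference is an intended change or a regression fault is left to the developer, which matches the abstract's description of the third step as presenting differences rather than judging them.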
(Un-)Covering Equivalent Mutants
David Schuler, A. Zeller
Mutation testing measures the adequacy of a test suite by seeding artificial defects (mutations) into a program. If a test suite fails to detect a mutation, it may also fail to detect real defects, and hence should be improved. However, there are also mutations which keep the program semantics unchanged and thus cannot be detected by any test suite. Such equivalent mutants must be weeded out manually, which is a tedious task. In this paper, we examine whether changes in coverage can be used to detect non-equivalent mutants: if a mutant changes the coverage of a run, it is more likely to be non-equivalent. In a sample of 140 manually classified mutations of seven Java programs with 5,000 to 100,000 lines of code, we found that: (a) the problem is serious and widespread: about 45% of all undetected mutants turned out to be equivalent; (b) manual classification takes time: about 15 minutes per mutation; (c) coverage is a simple, efficient, and effective means to identify equivalent mutants, with a classification precision of 75% and a recall of 56%; and (d) coverage as an equivalence detector is superior to the state of the art, in particular violations of dynamic invariants. Our detectors have been released as part of the open source Javalanche framework; the data set is publicly available for replication and extension of experiments.
突变测试通过在程序中植入人工缺陷(突变)来度量测试套件的充分性。如果测试套件未能检测到突变,那么它也可能无法检测到真正的缺陷——因此应该进行改进。然而,也有一些突变使程序语义保持不变,因此无法被任何测试套件检测到。必须手动清除这些等效的突变体,这是一项繁琐的任务。在本文中,我们研究了覆盖率的变化是否可以用于检测非等效突变:如果突变改变了运行的覆盖率,则更有可能是非等效的。在对7个Java程序的140个人工分类突变的样本中,我们发现(a)这个问题是严重和普遍的——在所有未被检测到的突变中,大约有45%是相同的;(b)人工分类需要时间——每个突变大约15分钟;(c)覆盖度是一种简单、高效、有效的识别等效突变体的方法——分类精度为75%,召回率为56%;(d)覆盖范围作为等效检测器优于现有技术,特别是违反动态不变量的情况。我们的检测器已经作为开源Javalanche框架的一部分发布;数据集是公开的,可用于复制和扩展实验。
(Un-)Covering Equivalent Mutants. David Schuler, A. Zeller. 2010 Third International Conference on Software Testing, Verification and Validation. Published 2010-04-06. DOI: https://doi.org/10.1109/ICST.2010.30
Citations: 127