Xavier Devroey, Alessio Gambi, Juan P. Galeotti, René Just, Fitsum Meshesha Kifetew, Annibale Panichella, Sebastiano Panichella
{"title":"JUGE:对Java单元测试生成器进行基准测试的基础设施","authors":"Xavier Devroey, Alessio Gambi, Juan P. Galeotti, René Just, Fitsum Meshesha Kifetew, Annibale Panichella, Sebastiano Panichella","doi":"10.1002/stvr.1838","DOIUrl":null,"url":null,"abstract":"Researchers and practitioners have designed and implemented various automated test case generators to support effective software testing. Such generators exist for various languages (e.g., Java, C#, or Python) and various platforms (e.g., desktop, web, or mobile applications). The generators exhibit varying effectiveness and efficiency, depending on the testing goals they aim to satisfy (e.g., unit‐testing of libraries versus system‐testing of entire applications) and the underlying techniques they implement. In this context, practitioners need to be able to compare different generators to identify the most suited one for their requirements, while researchers seek to identify future research directions. This can be achieved by systematically executing large‐scale evaluations of different generators. However, executing such empirical evaluations is not trivial and requires substantial effort to select appropriate benchmarks, setup the evaluation infrastructure, and collect and analyse the results. In this Software Note, we present our JUnit Generation Benchmarking Infrastructure (JUGE) supporting generators (search‐based, random‐based, symbolic execution, etc.) seeking to automate the production of unit tests for various purposes (validation, regression testing, fault localization, etc.). The primary goal is to reduce the overall benchmarking effort, ease the comparison of several generators, and enhance the knowledge transfer between academia and industry by standardizing the evaluation and comparison process. Since 2013, several editions of a unit testing tool competition, co‐located with the Search‐Based Software Testing Workshop, have taken place where JUGE was used and evolved. As a result, an increasing amount of tools (over 10) from academia and industry have been evaluated on JUGE, matured over the years, and allowed the identification of future research directions. Based on the experience gained from the competitions, we discuss the expected impact of JUGE in improving the knowledge transfer on tools and approaches for test generation between academia and industry. Indeed, the JUGE infrastructure demonstrated an implementation design that is flexible enough to enable the integration of additional unit test generation tools, which is practical for developers and allows researchers to experiment with new and advanced unit testing tools and approaches.","PeriodicalId":49506,"journal":{"name":"Software Testing Verification & Reliability","volume":"120 1","pages":""},"PeriodicalIF":1.5000,"publicationDate":"2021-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":"{\"title\":\"JUGE: An infrastructure for benchmarking Java unit test generators\",\"authors\":\"Xavier Devroey, Alessio Gambi, Juan P. Galeotti, René Just, Fitsum Meshesha Kifetew, Annibale Panichella, Sebastiano Panichella\",\"doi\":\"10.1002/stvr.1838\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Researchers and practitioners have designed and implemented various automated test case generators to support effective software testing. Such generators exist for various languages (e.g., Java, C#, or Python) and various platforms (e.g., desktop, web, or mobile applications). The generators exhibit varying effectiveness and efficiency, depending on the testing goals they aim to satisfy (e.g., unit‐testing of libraries versus system‐testing of entire applications) and the underlying techniques they implement. In this context, practitioners need to be able to compare different generators to identify the most suited one for their requirements, while researchers seek to identify future research directions. This can be achieved by systematically executing large‐scale evaluations of different generators. However, executing such empirical evaluations is not trivial and requires substantial effort to select appropriate benchmarks, setup the evaluation infrastructure, and collect and analyse the results. In this Software Note, we present our JUnit Generation Benchmarking Infrastructure (JUGE) supporting generators (search‐based, random‐based, symbolic execution, etc.) seeking to automate the production of unit tests for various purposes (validation, regression testing, fault localization, etc.). The primary goal is to reduce the overall benchmarking effort, ease the comparison of several generators, and enhance the knowledge transfer between academia and industry by standardizing the evaluation and comparison process. Since 2013, several editions of a unit testing tool competition, co‐located with the Search‐Based Software Testing Workshop, have taken place where JUGE was used and evolved. As a result, an increasing amount of tools (over 10) from academia and industry have been evaluated on JUGE, matured over the years, and allowed the identification of future research directions. Based on the experience gained from the competitions, we discuss the expected impact of JUGE in improving the knowledge transfer on tools and approaches for test generation between academia and industry. Indeed, the JUGE infrastructure demonstrated an implementation design that is flexible enough to enable the integration of additional unit test generation tools, which is practical for developers and allows researchers to experiment with new and advanced unit testing tools and approaches.\",\"PeriodicalId\":49506,\"journal\":{\"name\":\"Software Testing Verification & Reliability\",\"volume\":\"120 1\",\"pages\":\"\"},\"PeriodicalIF\":1.5000,\"publicationDate\":\"2021-06-14\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"4\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Software Testing Verification & Reliability\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1002/stvr.1838\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"COMPUTER SCIENCE, SOFTWARE ENGINEERING\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Software Testing Verification & Reliability","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1002/stvr.1838","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, SOFTWARE ENGINEERING","Score":null,"Total":0}
引用次数: 4
摘要
研究者和实践者已经设计并实现了各种自动化的测试用例生成器来支持有效的软件测试。这些生成器适用于各种语言(例如Java、c#或Python)和各种平台(例如桌面、web或移动应用程序)。生成器表现出不同的有效性和效率,这取决于它们旨在满足的测试目标(例如,库的单元测试与整个应用程序的系统测试)和它们实现的底层技术。在这种情况下,从业者需要能够比较不同的生成器,以确定最适合他们的需求,而研究人员则寻求确定未来的研究方向。这可以通过系统地执行不同生成器的大规模评估来实现。然而,执行这样的经验评估并不是微不足道的,并且需要大量的努力来选择适当的基准,设置评估基础结构,并收集和分析结果。在本软件说明中,我们介绍了我们的JUnit生成基准基础设施(JUGE),它支持生成器(基于搜索的、基于随机的、符号执行的等),旨在为各种目的(验证、回归测试、故障定位等)自动生成单元测试。主要目标是减少总体基准测试工作,简化几个生成器的比较,并通过标准化评估和比较过程加强学术界和工业界之间的知识转移。自2013年以来,已经举办了几个版本的单元测试工具竞赛,这些竞赛与基于搜索的软件测试研讨会(Search - Based Software testing Workshop)同地举行,在那里使用和发展了JUGE。因此,越来越多来自学术界和工业界的工具(超过10种)在JUGE上进行了评估,这些工具经过多年的发展已经成熟,并且可以确定未来的研究方向。基于从竞赛中获得的经验,我们讨论了JUGE在改善学术界和工业界之间测试生成工具和方法的知识转移方面的预期影响。实际上,JUGE基础架构展示了一种实现设计,它足够灵活,可以集成额外的单元测试生成工具,这对开发人员来说是实用的,并且允许研究人员试验新的和先进的单元测试工具和方法。
JUGE: An infrastructure for benchmarking Java unit test generators
Researchers and practitioners have designed and implemented various automated test case generators to support effective software testing. Such generators exist for various languages (e.g., Java, C#, or Python) and various platforms (e.g., desktop, web, or mobile applications). The generators exhibit varying effectiveness and efficiency, depending on the testing goals they aim to satisfy (e.g., unit‐testing of libraries versus system‐testing of entire applications) and the underlying techniques they implement. In this context, practitioners need to be able to compare different generators to identify the most suited one for their requirements, while researchers seek to identify future research directions. This can be achieved by systematically executing large‐scale evaluations of different generators. However, executing such empirical evaluations is not trivial and requires substantial effort to select appropriate benchmarks, setup the evaluation infrastructure, and collect and analyse the results. In this Software Note, we present our JUnit Generation Benchmarking Infrastructure (JUGE) supporting generators (search‐based, random‐based, symbolic execution, etc.) seeking to automate the production of unit tests for various purposes (validation, regression testing, fault localization, etc.). The primary goal is to reduce the overall benchmarking effort, ease the comparison of several generators, and enhance the knowledge transfer between academia and industry by standardizing the evaluation and comparison process. Since 2013, several editions of a unit testing tool competition, co‐located with the Search‐Based Software Testing Workshop, have taken place where JUGE was used and evolved. As a result, an increasing amount of tools (over 10) from academia and industry have been evaluated on JUGE, matured over the years, and allowed the identification of future research directions. Based on the experience gained from the competitions, we discuss the expected impact of JUGE in improving the knowledge transfer on tools and approaches for test generation between academia and industry. Indeed, the JUGE infrastructure demonstrated an implementation design that is flexible enough to enable the integration of additional unit test generation tools, which is practical for developers and allows researchers to experiment with new and advanced unit testing tools and approaches.
期刊介绍:
The journal is the premier outlet for research results on the subjects of testing, verification and reliability. Readers will find useful research on issues pertaining to building better software and evaluating it.
The journal is unique in its emphasis on theoretical foundations and applications to real-world software development. The balance of theory, empirical work, and practical applications provide readers with better techniques for testing, verifying and improving the reliability of software.
The journal targets researchers, practitioners, educators and students that have a vested interest in results generated by high-quality testing, verification and reliability modeling and evaluation of software. Topics of special interest include, but are not limited to:
-New criteria for software testing and verification
-Application of existing software testing and verification techniques to new types of software, including web applications, web services, embedded software, aspect-oriented software, and software architectures
-Model based testing
-Formal verification techniques such as model-checking
-Comparison of testing and verification techniques
-Measurement of and metrics for testing, verification and reliability
-Industrial experience with cutting edge techniques
-Descriptions and evaluations of commercial and open-source software testing tools
-Reliability modeling, measurement and application
-Testing and verification of software security
-Automated test data generation
-Process issues and methods
-Non-functional testing