Mutant Reduction Evaluation: What is There and What is Missing?

ACM Transactions on Software Engineering and Methodology (TOSEM), Pub Date: 2021-02-05, DOI: 10.1145/3522578

Peng Zhang, Yang Wang, Xutong Liu, Yanhui Li, Yibao Yang, Ziyuan Wang, Xiaoyu Zhou, Lin Chen, Yuming Zhou


Citations: 2

Abstract

Background. Mutation testing is a commonly used defect injection technique for evaluating the effectiveness of a test suite. However, it is usually computationally expensive. Therefore, many mutation reduction strategies, which aim to reduce the number of mutants, have been proposed.

Problem. It is important to measure how well a mutation reduction strategy maintains the evaluation of test suite effectiveness. However, existing evaluation indicators are unable to measure the “order-preserving ability”, i.e., the extent to which the mutation score order among test suites is maintained before and after mutation reduction. As a result, misleading conclusions can be reached when existing indicators are used to evaluate reduction effectiveness.

Objective. We aim to propose evaluation indicators that measure the “order-preserving ability” of a mutation reduction strategy, which is important but missing in our community.

Method. Given a test suite on a Software Under Test (SUT) with a set of original mutants, we leverage the test suite to generate a group of test suites that have a partial order relationship in defect-detecting ability. When evaluating a reduction strategy, we first construct two partial order relationships among the generated test suites in terms of mutation score, one with the original mutants and another with the reduced mutants. Then, we measure the extent to which the partial order under the original mutants remains unchanged in the partial order under the reduced mutants. The more of the partial order that remains unchanged, the stronger the Order Preservation (OP) of the mutation reduction strategy, and the more effective the reduction strategy is. Furthermore, we propose Effort-aware Relative Order Preservation (EROP) to measure how much gain a mutation reduction strategy can provide compared with a random reduction strategy.

Result. The experimental results show that OP and EROP are able to efficiently measure the “order-preserving ability” of a mutation reduction strategy. As a result, they distinguish between mutation reduction strategies better than the existing evaluation indicators do. In addition, we find that Subsuming Mutant Selection (SMS) and Clustering Mutant Selection (CMS) are more effective than the other strategies under OP and EROP.

Conclusion. We suggest that researchers use OP and EROP to measure the effectiveness of a mutant reduction strategy, and that practitioners give priority to SMS and CMS.
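The order-preservation idea described under Method can be illustrated with a small sketch. It is not the authors' implementation. The nested chain of test suites, the kill map, the pairwise agreement ratio, and every function name below are assumptions chosen for illustration. The random baseline at the end only gestures at the comparison behind EROP, whose exact definition the abstract does not give.

```python
import random
from itertools import combinations

def mutation_score(suite, kill_map, mutants):
    """Fraction of `mutants` killed by at least one test in `suite`."""
    if not mutants:
        return 0.0
    killed = sum(1 for m in mutants if kill_map[m] & suite)
    return killed / len(mutants)

def nested_suites(tests):
    """Assumed construction: a chain of growing suites, so that
    defect-detecting ability is ordered by construction."""
    return [frozenset(tests[:k]) for k in range(1, len(tests) + 1)]

def order_preservation(suites, kill_map, original, reduced):
    """Share of suite pairs strictly ordered under `original` whose
    order is the same under `reduced` (ties under `reduced` count as lost)."""
    kept = total = 0
    for a, b in combinations(suites, 2):
        diff_orig = (mutation_score(a, kill_map, original)
                     - mutation_score(b, kill_map, original))
        if diff_orig == 0:
            continue  # no order to preserve for this pair
        total += 1
        diff_red = (mutation_score(a, kill_map, reduced)
                    - mutation_score(b, kill_map, reduced))
        if diff_orig * diff_red > 0:
            kept += 1
    return kept / total if total else 1.0

def random_baseline(suites, kill_map, original, size, trials=200, seed=0):
    """Mean order preservation of random reductions of the same size."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += order_preservation(
            suites, kill_map, original, rng.sample(original, size))
    return total / trials

# Hypothetical toy data: mutant -> set of tests that kill it.
kill_map = {
    "m1": {"t1"}, "m2": {"t1"}, "m3": {"t2"},
    "m4": {"t3"}, "m5": {"t3"}, "m6": {"t4"},
}
original = list(kill_map)
suites = nested_suites(["t1", "t2", "t3", "t4"])

good = ["m1", "m3", "m4", "m6"]   # keeps one mutant per killing test
poor = ["m1", "m2"]               # every suite scores the same

op_good = order_preservation(suites, kill_map, original, good)  # 1.0
op_poor = order_preservation(suites, kill_map, original, poor)  # 0.0
gain_over_random = op_good - random_baseline(
    suites, kill_map, original, size=len(good))
```

In this toy setting the `good` reduction keeps every pairwise ordering of the four suites, while the `poor` reduction collapses all suites to the same score and so preserves none. Subtracting the random baseline from a strategy's score gives a rough sense of what the strategy adds at the same reduction size.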
