Mutant Reduction Evaluation: What is There and What is Missing?

Peng Zhang, Yang Wang, Xutong Liu, Yanhui Li, Yibao Yang, Ziyuan Wang, Xiaoyu Zhou, Lin Chen, Yuming Zhou
{"title":"Mutant Reduction Evaluation: What is There and What is Missing?","authors":"Peng Zhang, Yang Wang, Xutong Liu, Yanhui Li, Yibao Yang, Ziyuan Wang, Xiaoyu Zhou, Lin Chen, Yuming Zhou","doi":"10.1145/3522578","DOIUrl":null,"url":null,"abstract":"Background. Mutation testing is a commonly used defect injection technique for evaluating the effectiveness of a test suite. However, it is usually computationally expensive. Therefore, many mutation reduction strategies, which aim to reduce the number of mutants, have been proposed. Problem. It is important to measure the ability of a mutation reduction strategy to maintain test suite effectiveness evaluation. However, existing evaluation indicators are unable to measure the “order-preserving ability”, i.e., to what extent the mutation score order among test suites is maintained before and after mutation reduction. As a result, misleading conclusions can be achieved when using existing indicators to evaluate the reduction effectiveness. Objective. We aim to propose evaluation indicators to measure the “order-preserving ability” of a mutation reduction strategy, which is important but missing in our community. Method. Given a test suite on a Software Under Test (SUT) with a set of original mutants, we leverage the test suite to generate a group of test suites that have a partial order relationship in defect detecting ability. When evaluating a reduction strategy, we first construct two partial order relationships among the generated test suites in terms of mutation score, one with the original mutants and another with the reduced mutants. Then, we measure the extent to which the partial order under the original mutants remains unchanged in the partial order under the reduced mutants. The more partial order is unchanged, the stronger the Order Preservation (OP) of the mutation reduction strategy is, and the more effective the reduction strategy is. Furthermore, we propose Effort-aware Relative Order Preservation (EROP) to measure how much gain a mutation reduction strategy can provide compared with a random reduction strategy. Result. The experimental results show that OP and EROP are able to efficiently measure the “order-preserving ability” of a mutation reduction strategy. As a result, they have a better ability to distinguish various mutation reduction strategies compared with the existing evaluation indicators. In addition, we find that Subsuming Mutant Selection (SMS) and Clustering Mutant Selection (CMS) are more effective than the other strategies under OP and EROP. Conclusion. We suggest, for the researchers, that OP and EROP should be used to measure the effectiveness of a mutant reduction strategy, and for the practitioners, that SMS and CMS should be given priority in practice.","PeriodicalId":7398,"journal":{"name":"ACM Transactions on Software Engineering and Methodology (TOSEM)","volume":"23 1","pages":"1 - 46"},"PeriodicalIF":0.0000,"publicationDate":"2021-02-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Software Engineering and Methodology (TOSEM)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3522578","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2

Abstract

Background. Mutation testing is a commonly used defect injection technique for evaluating the effectiveness of a test suite. However, it is usually computationally expensive. Therefore, many mutation reduction strategies, which aim to reduce the number of mutants, have been proposed. Problem. It is important to measure the ability of a mutation reduction strategy to maintain test suite effectiveness evaluation. However, existing evaluation indicators are unable to measure the “order-preserving ability”, i.e., to what extent the mutation score order among test suites is maintained before and after mutation reduction. As a result, misleading conclusions can be achieved when using existing indicators to evaluate the reduction effectiveness. Objective. We aim to propose evaluation indicators to measure the “order-preserving ability” of a mutation reduction strategy, which is important but missing in our community. Method. Given a test suite on a Software Under Test (SUT) with a set of original mutants, we leverage the test suite to generate a group of test suites that have a partial order relationship in defect detecting ability. When evaluating a reduction strategy, we first construct two partial order relationships among the generated test suites in terms of mutation score, one with the original mutants and another with the reduced mutants. Then, we measure the extent to which the partial order under the original mutants remains unchanged in the partial order under the reduced mutants. The more partial order is unchanged, the stronger the Order Preservation (OP) of the mutation reduction strategy is, and the more effective the reduction strategy is. Furthermore, we propose Effort-aware Relative Order Preservation (EROP) to measure how much gain a mutation reduction strategy can provide compared with a random reduction strategy. Result. The experimental results show that OP and EROP are able to efficiently measure the “order-preserving ability” of a mutation reduction strategy. As a result, they have a better ability to distinguish various mutation reduction strategies compared with the existing evaluation indicators. In addition, we find that Subsuming Mutant Selection (SMS) and Clustering Mutant Selection (CMS) are more effective than the other strategies under OP and EROP. Conclusion. We suggest, for the researchers, that OP and EROP should be used to measure the effectiveness of a mutant reduction strategy, and for the practitioners, that SMS and CMS should be given priority in practice.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
突变减少评估:有什么和缺少什么?
背景。突变测试是一种常用的缺陷注入技术,用于评估测试套件的有效性。然而,它通常在计算上很昂贵。因此,人们提出了许多旨在减少突变体数量的突变减少策略。问题。度量突变减少策略的能力以维护测试套件的有效性评估是很重要的。然而,现有的评价指标无法衡量“保序能力”,即在突变还原前后测试套件之间的突变评分顺序保持到什么程度。因此,在使用现有指标评价减排效果时,可能会得出误导性结论。目标。我们的目标是提出评价指标来衡量突变减少策略的“保序能力”,这是我们社区中重要但缺失的。方法。给定一个带有一组原始突变的在测软件(SUT)上的测试套件,我们利用该测试套件来生成一组在缺陷检测能力中具有偏序关系的测试套件。在评估约简策略时,我们首先根据突变得分在生成的测试套件之间构建两个偏序关系,一个是原始突变体,另一个是减少的突变体。然后,我们测量了原始突变下的偏序在减少突变下的偏序保持不变的程度。偏序越不变,说明突变约简策略的序保持性(OP)越强,约简策略越有效。此外,我们提出了努力感知相对顺序保存(EROP)来衡量与随机约简策略相比,突变约简策略可以提供多少增益。结果。实验结果表明,OP和EROP能够有效地衡量突变约简策略的“保序能力”。因此,与现有的评价指标相比,它们具有更好的区分各种减少突变策略的能力。此外,我们发现在OP和EROP条件下,包含突变选择(Subsuming Mutant Selection, SMS)和聚类突变选择(Clustering Mutant Selection, CMS)比其他策略更有效。我们建议,对于研究人员来说,应该使用OP和EROP来衡量突变减少策略的有效性;对于从业者来说,在实践中应该优先考虑SMS和CMS。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Turnover of Companies in OpenStack: Prevalence and Rationale Super-optimization of Smart Contracts Verification of Programs Sensitive to Heap Layout Assessing and Improving an Evaluation Dataset for Detecting Semantic Code Clones via Deep Learning Guaranteeing Timed Opacity using Parametric Timed Model Checking
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1