Evaluating the Trade-offs of Text-based Diversity in Test Prioritisation

Ranim Khojah, Chi Hong Chao, F. D. O. Neto
DOI: 10.1109/AST58925.2023.00021
Published in: 2023 IEEE/ACM International Conference on Automation of Software Test (AST), May 2023

Abstract

Diversity-based techniques (DBT) have proven cost-effective by prioritising the most dissimilar test cases so that faults are detected at earlier stages of test execution. Diversity is measured on test specifications to convey how different test cases are from one another. However, there is little research on the trade-offs between diversity measures based on different types of text-based specification (lexical or semantic), particularly because the text content of test scripts varies widely from the unit level (e.g., code) to the system level (e.g., natural language). This paper compares and evaluates the cost-effectiveness, in terms of coverage and failure detection, of different text-based diversity measures at different levels of testing. We perform an experiment on the test suites of 7 open-source projects at the unit level and 2 industry projects at the integration and system levels. Our results show that test suites prioritised using semantic-based diversity measures yield a small improvement in requirements coverage, whereas lexical diversity achieved less coverage than random ordering for system-level artefacts. In contrast, using lexical measures such as Jaccard or Levenshtein to prioritise code artefacts yields better failure coverage across all levels of testing. We summarise our findings in a list of recommendations for using semantic or lexical diversity at different levels of testing.
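To illustrate the lexical side of the comparison, the following is a minimal sketch of diversity-based prioritisation using Jaccard distance on token sets with a greedy farthest-point heuristic. It is an assumption-laden illustration, not the paper's actual implementation: the tokenisation (whitespace split), the seed choice, and the example test names are all hypothetical.

```python
# Hypothetical sketch of diversity-based test prioritisation (DBT):
# greedily pick the test whose specification is most dissimilar
# (max-min Jaccard distance on token sets) from those already chosen.

def jaccard_distance(a: set, b: set) -> float:
    """1 - |A & B| / |A | B|; identical sets -> 0.0, disjoint -> 1.0."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)

def prioritise(specs: dict) -> list:
    """Order test names so the most mutually dissimilar come first."""
    tokens = {name: set(text.lower().split()) for name, text in specs.items()}
    remaining = list(specs)
    # Seed with the longest specification (an arbitrary but simple choice).
    ordered = [max(remaining, key=lambda n: len(tokens[n]))]
    remaining.remove(ordered[0])
    while remaining:
        # Farthest-point heuristic: maximise the minimum distance
        # to every test already in the prioritised prefix.
        nxt = max(remaining,
                  key=lambda n: min(jaccard_distance(tokens[n], tokens[s])
                                    for s in ordered))
        ordered.append(nxt)
        remaining.remove(nxt)
    return ordered

if __name__ == "__main__":
    suite = {
        "t_login":  "open login page enter valid credentials submit",
        "t_login2": "open login page enter invalid credentials submit",
        "t_report": "generate monthly sales report export as pdf",
    }
    print(prioritise(suite))
```

With this heuristic, the report test jumps ahead of the near-duplicate login test, which is the behaviour DBT relies on to surface dissimilar (and hence potentially fault-revealing) tests early in the execution order.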