Evaluating the Trade-offs of Text-based Diversity in Test Prioritisation

Ranim Khojah, Chi Hong Chao, F. D. O. Neto
DOI: 10.1109/AST58925.2023.00021
Published in: 2023 IEEE/ACM International Conference on Automation of Software Test (AST), May 2023

Abstract

Diversity-based techniques (DBT) have proven cost-effective by prioritising the most dissimilar test cases so that faults are detected at earlier stages of test execution. Diversity is measured on test specifications to convey how different test cases are from one another. However, there is little research on the trade-offs between diversity measures based on different types of text-based specification (lexical or semantic), particularly because the text content of test scripts varies widely from the unit level (e.g., code) to the system level (e.g., natural language). This paper compares and evaluates the cost-effectiveness, in terms of coverage and failure detection, of different text-based diversity measures at different levels of testing. We perform an experiment on the test suites of 7 open-source projects at the unit level and 2 industry projects at the integration and system levels. Our results show that test suites prioritised using semantic-based diversity measures yield a small improvement in requirements coverage, whereas lexical diversity achieved less coverage than random ordering for system-level artefacts. In contrast, using lexical measures such as Jaccard or Levenshtein to prioritise code artefacts yields better failure coverage across all levels of testing. We summarise our findings in a list of recommendations for using semantic or lexical diversity at different levels of testing.
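To illustrate the lexical side of the comparison, the following is a minimal sketch of diversity-based prioritisation using Jaccard distance on token sets with a greedy farthest-point heuristic. It is an assumption-laden illustration, not the paper's actual implementation: the tokenisation (whitespace split), the seed choice, and the example test names are all hypothetical.

```python
# Hypothetical sketch of diversity-based test prioritisation (DBT):
# greedily pick the test whose specification is most dissimilar
# (max-min Jaccard distance on token sets) from those already chosen.

def jaccard_distance(a: set, b: set) -> float:
    """1 - |A & B| / |A | B|; identical sets -> 0.0, disjoint -> 1.0."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)

def prioritise(specs: dict) -> list:
    """Order test names so the most mutually dissimilar come first."""
    tokens = {name: set(text.lower().split()) for name, text in specs.items()}
    remaining = list(specs)
    # Seed with the longest specification (an arbitrary but simple choice).
    ordered = [max(remaining, key=lambda n: len(tokens[n]))]
    remaining.remove(ordered[0])
    while remaining:
        # Farthest-point heuristic: maximise the minimum distance
        # to every test already in the prioritised prefix.
        nxt = max(remaining,
                  key=lambda n: min(jaccard_distance(tokens[n], tokens[s])
                                    for s in ordered))
        ordered.append(nxt)
        remaining.remove(nxt)
    return ordered

if __name__ == "__main__":
    suite = {
        "t_login":  "open login page enter valid credentials submit",
        "t_login2": "open login page enter invalid credentials submit",
        "t_report": "generate monthly sales report export as pdf",
    }
    print(prioritise(suite))
```

With this heuristic, the report test jumps ahead of the near-duplicate login test, which is the behaviour DBT relies on to surface dissimilar (and hence potentially fault-revealing) tests early in the execution order.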