How detrimental is coincidental correctness to coverage‐based fault detection and localization? An empirical study

Impact factor: 1.5 | Zone 4 (Computer Science) | JCR Q3 (Computer Science, Software Engineering) | Software Testing Verification & Reliability | Pub Date: 2021-01-09 | DOI: 10.1002/stvr.1762
R. A. Assi, Wes Masri, Chadi Trad
Citations: 5

Abstract

According to the reachability–infection–propagation (RIP) model, three conditions must be satisfied for program failure to occur: (1) the defect's location must be reached, (2) the program's state must become infected and (3) the infection must propagate to the output. Weak coincidental correctness (or weak CC) occurs when the program produces the correct output, while condition (1) is satisfied but conditions (2) and (3) are not satisfied. Strong coincidental correctness (or strong CC) occurs when the output is correct, while both conditions (1) and (2) are satisfied but not (3). The prevalence of CC was previously recognized. In addition, the potential for its negative effect on spectrum‐based fault localization (SBFL) was analytically demonstrated; however, this was not empirically validated. Using Defects4J, this paper empirically studies the impact of weak and strong CC on three well‐researched coverage‐based fault detection and localization techniques, namely, test suite reduction (TSR), test case prioritization (TCP) and SBFL. Our study, which involved 52 SBFL metrics, provides the following empirical evidence. (i) The negative impact of CC tests on TSR and TCP is very significant. In addition, cleansing the CC tests was observed to yield (a) a 100% TSR defect detection rate for all subject programs and (b) an improvement of TCP for over 92% of the subjects. (ii) The impact of CC tests on SBFL varies widely with respect to the metric used. The negative impact was strong for 11 metrics, mild for 37, non‐measurable for 1 and non‐existent for 3 metrics. Interestingly, the negative impact was mild for the 9 most popular and/or most effective SBFL metrics. In addition, cleansing the CC tests resulted in the deterioration of SBFL for a considerable number of subject programs. (iii) Increasing the proportion of CC tests has a limited impact on TSR, TCP and SBFL. Interestingly, for TSR and TCP and 11 SBFL metrics, small and large proportions of CC tests are strongly harmful. (iv) Lastly, weak and strong CC are equally detrimental in the context of TSR, TCP and SBFL.
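The RIP-based definitions in the abstract, and the reason CC tests can depress an SBFL score, can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the `Execution` record, the `classify` and `ochiai_of_faulty_statement` helpers, and the toy test suite are hypothetical, and Ochiai is used only because it is a well-known SBFL metric (the abstract does not name the 52 metrics studied). The sketch also assumes a single fault, so every failing test reaches the defect. It does not reproduce the paper's experimental setup or results.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass(frozen=True)
class Execution:
    """One test execution described by the three RIP conditions (hypothetical record)."""
    reached: bool     # (1) the defect's location was executed
    infected: bool    # (2) the program state became infected
    propagated: bool  # (3) the infection reached the output


def classify(run: Execution) -> str:
    """Label a run using the weak/strong CC definitions from the abstract."""
    if run.reached and run.infected and run.propagated:
        return "failing"
    if run.reached and run.infected:
        return "strong_cc"  # (1) and (2) hold, (3) does not
    if run.reached:
        return "weak_cc"    # only (1) holds
    return "passing"        # the defect was never reached


def ochiai_of_faulty_statement(runs: list[Execution]) -> float:
    """Ochiai score of the faulty statement: ef / sqrt(total_failed * (ef + ep)).

    Assumes a single fault, so every failing run covers the faulty statement.
    The passing runs that cover it are, by definition, the CC runs.
    """
    labels = [classify(r) for r in runs]
    total_failed = labels.count("failing")
    ef = total_failed
    ep = labels.count("weak_cc") + labels.count("strong_cc")
    denominator = sqrt(total_failed * (ef + ep))
    return ef / denominator if denominator else 0.0


# Toy suite (made-up numbers): 2 failing, 3 weak CC, 3 strong CC, 4 ordinary passing runs.
suite = (
    [Execution(True, True, True)] * 2
    + [Execution(True, False, False)] * 3
    + [Execution(True, True, False)] * 3
    + [Execution(False, False, False)] * 4
)

# "Cleansing" here simply drops every run labelled weak or strong CC.
cleansed = [r for r in suite if classify(r) not in ("weak_cc", "strong_cc")]

print(ochiai_of_faulty_statement(suite))     # 0.5
print(ochiai_of_faulty_statement(cleansed))  # 1.0
```

In this toy suite the six CC runs halve the score of the faulty statement, which matches the analytical argument mentioned in the abstract. The abstract's empirical finding is more nuanced: the effect was mild for most of the metrics studied, and cleansing CC tests made SBFL worse for a considerable number of subject programs.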
Source journal
Software Testing Verification & Reliability (Engineering & Technology, Computer Science: Software Engineering)
CiteScore: 3.70
Self-citation rate: 0.00%
Articles published: 34
Review time: >12 weeks
Journal description: The journal is the premier outlet for research results on the subjects of testing, verification and reliability. Readers will find useful research on issues pertaining to building better software and evaluating it. The journal is unique in its emphasis on theoretical foundations and applications to real-world software development. The balance of theory, empirical work, and practical applications provides readers with better techniques for testing, verifying and improving the reliability of software. The journal targets researchers, practitioners, educators and students that have a vested interest in results generated by high-quality testing, verification and reliability modeling and evaluation of software. Topics of special interest include, but are not limited to:
-New criteria for software testing and verification
-Application of existing software testing and verification techniques to new types of software, including web applications, web services, embedded software, aspect-oriented software, and software architectures
-Model based testing
-Formal verification techniques such as model-checking
-Comparison of testing and verification techniques
-Measurement of and metrics for testing, verification and reliability
-Industrial experience with cutting-edge techniques
-Descriptions and evaluations of commercial and open-source software testing tools
-Reliability modeling, measurement and application
-Testing and verification of software security
-Automated test data generation
-Process issues and methods
-Non-functional testing
Latest articles in this journal
Model‐based testing, test case prioritization and testing of virtual reality applications
In vivo testing and integration of proving and testing
Mutation testing optimisations using the Clang front‐end
Semantic‐aware two‐phase test case prioritization for continuous integration
Exploiting deep reinforcement learning and metamorphic testing to automatically test virtual reality applications