Evaluating and Improving Fault Localization

2017 IEEE/ACM 39th International Conference on Software Engineering (ICSE). Pub Date: 2017-05-01. DOI: 10.1109/ICSE.2017.62

Spencer Pearson, José Campos, René Just, G. Fraser, Rui Abreu, Michael D. Ernst, D. Pang, Benjamin Keller

Pages: 609-620

Citations: 321

Abstract

Most fault localization techniques take as input a faulty program, and produce as output a ranked list of suspicious code locations at which the program may be defective. When researchers propose a new fault localization technique, they typically evaluate it on programs with known faults. The technique is scored based on where in its output list the defective code appears. This enables the comparison of multiple fault localization techniques to determine which one is better. Previous research has evaluated fault localization techniques using artificial faults, generated either by mutation tools or manually. In other words, previous research has determined which fault localization techniques are best at finding artificial faults. However, it is not known which fault localization techniques are best at finding real faults. It is not obvious that the answer is the same, given previous work showing that artificial faults have both similarities to and differences from real faults. We performed a replication study to evaluate 10 claims in the literature that compared fault localization techniques (from the spectrum-based and mutation-based families). We used 2995 artificial faults in 6 real-world programs. Our results support 7 of the previous claims as statistically significant, but only 3 as having non-negligible effect sizes. Then, we evaluated the same 10 claims, using 310 real faults from the 6 programs. Every previous result was refuted or was statistically and practically insignificant. Our experiments show that artificial faults are not useful for predicting which fault localization techniques perform best on real faults.

In light of these results, we identified a design space that includes many previously studied fault localization techniques as well as hundreds of new techniques. We experimentally determined which factors in the design space are most important, using an overall set of 395 real faults. Then, we extended this design space with new techniques. Several of our novel techniques outperform all existing techniques, notably in terms of ranking defective code in the top-5 or top-10 reports.
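To make the abstract's terminology concrete, the following is a minimal sketch of how a spectrum-based technique could turn test coverage into a ranked list, and how that list might be scored by the position of the defective code. It is not the authors' implementation. The Ochiai formula is used only as one well-known spectrum-based example that the abstract does not name, and the coverage data, the statement names, and the helpers `rank_statements`, `rank_of`, and `in_top_n` are all hypothetical.

```python
import math

# Hypothetical coverage data: which statements each test executes.
COVERAGE = {
    "t1": {"s1", "s2"},
    "t2": {"s1", "s3"},
    "t3": {"s1", "s3", "s4"},
}
# Hypothetical test outcomes: True means the test passed.
OUTCOMES = {"t1": True, "t2": True, "t3": False}


def ochiai(ef, ep, total_failed):
    """Ochiai suspiciousness: ef / sqrt(total_failed * (ef + ep)).

    ef / ep = number of failing / passing tests that execute the statement.
    """
    denom = math.sqrt(total_failed * (ef + ep))
    return ef / denom if denom else 0.0


def rank_statements(coverage, outcomes):
    """Return (statements ordered most suspicious first, score per statement)."""
    total_failed = sum(1 for passed in outcomes.values() if not passed)
    statements = set().union(*coverage.values())
    scores = {}
    for stmt in statements:
        ef = sum(1 for t, cov in coverage.items() if stmt in cov and not outcomes[t])
        ep = sum(1 for t, cov in coverage.items() if stmt in cov and outcomes[t])
        scores[stmt] = ochiai(ef, ep, total_failed)
    # Ties are broken by statement name here; this is a simplification,
    # and real evaluations must choose a tie-handling policy explicitly.
    ranking = sorted(statements, key=lambda s: (-scores[s], s))
    return ranking, scores


def rank_of(ranking, faulty_stmt):
    """1-based position of the defective statement in the ranked list."""
    return ranking.index(faulty_stmt) + 1


def in_top_n(ranking, faulty_stmt, n):
    """True if the defective statement appears among the first n reports."""
    return rank_of(ranking, faulty_stmt) <= n


if __name__ == "__main__":
    ranking, scores = rank_statements(COVERAGE, OUTCOMES)
    print(ranking)
    print(rank_of(ranking, "s3"), in_top_n(ranking, "s3", 5))
```

Comparing two techniques under this kind of scoring amounts to swapping the suspiciousness formula and checking where the known-defective statement lands for each fault. The top-5 and top-10 criteria mentioned in the abstract correspond to calling `in_top_n` with `n=5` or `n=10`.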


