Context-based Cluster Fault Localization
Ju-Yeol Yu, Yan Lei, Huan Xie, Lingfeng Fu, Chunyan Liu
2022 IEEE/ACM 30th International Conference on Program Comprehension (ICPC), 2022-05-01
DOI: 10.1145/3524610.3527891 (https://doi.org/10.1145/3524610.3527891)
Citation count: 4
Abstract
Automated fault localization techniques collect runtime information as input data to identify suspicious statements potentially responsible for program failures. To discover the statistical coincidences between test results (i.e., failing or passing) and the executions of the different statements of a program (i.e., executed or not executed), researchers have developed suspiciousness evaluation methods (e.g., spectrum-based formulas and deep neural network models). However, coincidental correctness (CC), in which a faulty statement is executed but the program output is still correct, reduces the effectiveness of fault localization. Many researchers seek to identify CC tests using cluster analysis. However, high-dimensional data containing too much noise reduces the effectiveness of cluster analysis. To overcome this obstacle, we propose CBCFL, a context-based cluster fault localization approach that incorporates a failure context, which shows how a failure is produced, into cluster analysis. Specifically, CBCFL uses the failure context, which contains the statements whose execution affects the output of a failing test, as the input data for cluster analysis to improve the effectiveness of identifying CC tests. Since CC tests execute the faulty statement, we change the labels of CC tests to failing. We then take the context and the corresponding changed labels as the input data for fault localization techniques. To evaluate the effectiveness of CBCFL, we conduct large-scale experiments on six large-sized programs using five state-of-the-art fault localization approaches. The experimental results show that CBCFL is more effective than the baselines; e.g., our approach improves the MLP-FL method using cluster analysis by up to 200%, 250%, and 320% under the Top-1, Top-5, and Top-10 accuracies, respectively.
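The pipeline described in the abstract (restrict coverage to the failure context, flag likely CC tests, relabel them as failing, then rank statements) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the toy coverage matrix, the `context` indices, the `threshold` value, and the function names are hypothetical. A simple Jaccard-similarity check stands in for the cluster analysis, whose actual algorithm the abstract does not specify, and the well-known Ochiai formula stands in for the spectrum-based suspiciousness formula.

```python
import math

def restrict_to_context(coverage, context):
    """Keep only the columns (statements) that belong to the failure context."""
    return [[row[s] for s in context] for row in coverage]

def jaccard(a, b):
    """Jaccard similarity between two binary coverage vectors."""
    inter = sum(1 for x, y in zip(a, b) if x and y)
    union = sum(1 for x, y in zip(a, b) if x or y)
    return inter / union if union else 0.0

def relabel_cc(coverage, labels, threshold=0.8):
    """Relabel passing tests that closely resemble a failing test as failing.

    This similarity check is a stand-in for cluster analysis, not the
    method used in the paper. labels[t] is 1 for failing, 0 for passing.
    """
    failing = [row for row, y in zip(coverage, labels) if y]
    new_labels = []
    for row, y in zip(coverage, labels):
        is_cc = not y and any(jaccard(row, f) >= threshold for f in failing)
        new_labels.append(1 if is_cc else y)
    return new_labels

def ochiai(coverage, labels):
    """Ochiai suspiciousness: ef / sqrt(total_fail * (ef + ep)) per statement."""
    total_fail = sum(labels)
    scores = []
    for s in range(len(coverage[0])):
        ef = sum(1 for row, y in zip(coverage, labels) if row[s] and y)
        ep = sum(1 for row, y in zip(coverage, labels) if row[s] and not y)
        denom = math.sqrt(total_fail * (ef + ep))
        scores.append(ef / denom if denom else 0.0)
    return scores

# Hypothetical data: 4 tests x 4 statements; only test 0 fails.
coverage = [
    [1, 1, 1, 0],  # t0: failing
    [1, 1, 1, 1],  # t1: passing, but executes the same context statements
    [1, 0, 0, 1],  # t2: passing
    [1, 1, 0, 1],  # t3: passing
]
labels = [1, 0, 0, 0]
context = [1, 2]  # assumed failure context: statements 1 and 2

ctx_cov = restrict_to_context(coverage, context)
new_labels = relabel_cc(ctx_cov, labels)
scores = ochiai(ctx_cov, new_labels)

for stmt, score in zip(context, scores):
    print(f"statement {stmt}: suspiciousness {score:.3f}")
```

In this toy run, test t1 matches the failing test exactly on the context statements, so it is relabeled as failing, and statement 2 ends up ranked above statement 1. Restricting the vectors to the context before comparing them reflects the abstract's motivation: fewer, more relevant dimensions leave less noise for the similarity step.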