Huan Xie;Yan Lei;Meng Yan;Shanshan Li;Xiaoguang Mao;Yue Yu;David Lo
Title: Towards More Precise Coincidental Correctness Detection With Deep Semantic Learning
DOI: 10.1109/TSE.2024.3481893
Journal: IEEE Transactions on Software Engineering, vol. 50, no. 12, pp. 3265-3289
Published: 2024-10-16
URL: https://ieeexplore.ieee.org/document/10720528/
Citations: 0
Abstract
Coincidental correctness (CC) is a situation in which, during the execution of a test case, the buggy entity is executed but the program still behaves correctly, as expected. Many automated fault localization (FL) techniques use runtime information to discover the underlying connection between the executed buggy entity and the failing test result. The existence of CC weakens this connection, misleads FL algorithms into building inaccurate models, and consequently decreases localization accuracy. To alleviate the adverse effect of CC on FL, CC detection techniques have been proposed to identify possible CC tests via heuristic or machine learning algorithms. However, their precision is unsatisfactory because they overestimate the possible CC tests and are insufficient at learning deep semantic features. In this work, we propose a novel Triplet network-based Coincidental Correctness detection technique (i.e., TriCoCo) to overcome the limitations of prior works. TriCoCo narrows the set of possible CC tests by designing three features to identify genuine passing tests. Instead of using all tests as inputs, as existing techniques do, TriCoCo takes the identified genuine passing tests and the failing ones to train a triplet model that can evaluate their relative distance. Finally, TriCoCo uses the trained triplet model to infer, for each of the remaining passing tests, the probability that it is a CC test. We conduct large-scale experiments to evaluate TriCoCo on the widely used Defects4J benchmark. The results demonstrate that TriCoCo improves not only the precision of CC detection but also the effectiveness of FL techniques: e.g., the precision of TriCoCo is 80.33% on average, and TriCoCo boosts the efficacy of DStar by 18%–74% in terms of the MFR metric when compared to seven state-of-the-art CC detection baselines.
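The abstract describes two mechanisms: a triplet model trained on genuine passing tests and failing tests so that it learns a relative distance between them, and a scoring step that uses that distance to estimate how likely each remaining passing test is to be CC. The paper's actual model and features are not reproduced here; the sketch below is only an illustrative assumption of how such a triplet setup typically works, using the standard triplet margin loss on embedding vectors and a hypothetical distance-ratio score (the function names and the scoring formula are this sketch's inventions, not the authors').

```python
import math

def triplet_loss(anchor, positive, negative, margin=1.0):
    # Standard triplet margin loss on embedding vectors: pull the anchor
    # toward the positive (same class) and push it away from the negative
    # (other class), up to the given margin.
    d_pos = math.dist(anchor, positive)
    d_neg = math.dist(anchor, negative)
    return max(d_pos - d_neg + margin, 0.0)

def cc_probability(test_emb, passing_embs, failing_embs):
    # Hypothetical scoring step: rate a remaining passing test by its
    # relative distance to the embeddings of genuine passing tests vs.
    # failing tests. The closer it lies to the failing side, the higher
    # the returned probability that it is coincidentally correct.
    d_pass = min(math.dist(test_emb, p) for p in passing_embs)
    d_fail = min(math.dist(test_emb, f) for f in failing_embs)
    return d_pass / (d_pass + d_fail + 1e-12)
```

For example, a passing test whose embedding sits near the failing-test cluster would receive a score close to 1, flagging it as a likely CC test, while one near the genuine-passing cluster scores close to 0.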
Journal description:
IEEE Transactions on Software Engineering seeks contributions comprising well-defined theoretical results and empirical studies with potential impacts on software construction, analysis, or management. The scope of this Transactions extends from fundamental mechanisms to the development of principles and their application in specific environments. Specific topic areas include:
a) Development and maintenance methods and models: Techniques and principles for specifying, designing, and implementing software systems, encompassing notations and process models.
b) Assessment methods: Software tests, validation, reliability models, test and diagnosis procedures, software redundancy, design for error control, and measurements and evaluation of process and product aspects.
c) Software project management: Productivity factors, cost models, schedule and organizational issues, and standards.
d) Tools and environments: Specific tools, integrated tool environments, associated architectures, databases, and parallel and distributed processing issues.
e) System issues: Hardware-software trade-offs.
f) State-of-the-art surveys: Syntheses and comprehensive reviews of the historical development within specific areas of interest.