Huan Xie;Yan Lei;Meng Yan;Shanshan Li;Xiaoguang Mao;Yue Yu;David Lo
Title: Towards More Precise Coincidental Correctness Detection With Deep Semantic Learning
DOI: 10.1109/TSE.2024.3481893
Journal: IEEE Transactions on Software Engineering, vol. 50, no. 12, pp. 3265-3289
Published: 2024-10-16
URL: https://ieeexplore.ieee.org/document/10720528/
Citations: 0
Abstract
Coincidental correctness (CC) is a situation in which, during the execution of a test case, the buggy entity is executed but the program still behaves correctly, as expected. Many automated fault localization (FL) techniques use runtime information to discover the underlying connection between the executed buggy entity and the failing test result. The existence of CC weakens this connection, misleads FL algorithms into building inaccurate models, and consequently decreases localization accuracy. To alleviate the adverse effect of CC on FL, CC detection techniques have been proposed to identify possible CC tests via heuristic or machine learning algorithms. However, their precision is unsatisfactory because they overestimate the possible CC tests and are insufficient at learning deep semantic features. In this work, we propose a novel Triplet network-based Coincidental Correctness detection technique (i.e., TriCoCo) to overcome the limitations of prior works. TriCoCo narrows the set of possible CC tests by designing three features to identify genuine passing tests. Instead of using all tests as inputs, as existing techniques do, TriCoCo takes the identified genuine passing tests and the failing ones to train a triplet model that can evaluate their relative distance. Finally, TriCoCo uses the trained triplet model to infer, for each of the remaining passing tests, the probability that it is a CC test. We conduct large-scale experiments to evaluate TriCoCo on the widely used Defects4J benchmark. The results demonstrate that TriCoCo improves not only the precision of CC detection but also the effectiveness of FL techniques: e.g., the precision of TriCoCo is 80.33% on average, and TriCoCo boosts the efficacy of DStar by 18%–74% in terms of the MFR metric when compared to seven state-of-the-art CC detection baselines.
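The abstract describes two mechanisms: a triplet model trained on genuine passing tests and failing tests so that it learns a relative distance between them, and a scoring step that uses that distance to estimate how likely each remaining passing test is to be CC. The paper's actual model and features are not reproduced here; the sketch below is only an illustrative assumption of how such a triplet setup typically works, using the standard triplet margin loss on embedding vectors and a hypothetical distance-ratio score (the function names and the scoring formula are this sketch's inventions, not the authors').

```python
import math

def triplet_loss(anchor, positive, negative, margin=1.0):
    # Standard triplet margin loss on embedding vectors: pull the anchor
    # toward the positive (same class) and push it away from the negative
    # (other class), up to the given margin.
    d_pos = math.dist(anchor, positive)
    d_neg = math.dist(anchor, negative)
    return max(d_pos - d_neg + margin, 0.0)

def cc_probability(test_emb, passing_embs, failing_embs):
    # Hypothetical scoring step: rate a remaining passing test by its
    # relative distance to the embeddings of genuine passing tests vs.
    # failing tests. The closer it lies to the failing side, the higher
    # the returned probability that it is coincidentally correct.
    d_pass = min(math.dist(test_emb, p) for p in passing_embs)
    d_fail = min(math.dist(test_emb, f) for f in failing_embs)
    return d_pass / (d_pass + d_fail + 1e-12)
```

For example, a passing test whose embedding sits near the failing-test cluster would receive a score close to 1, flagging it as a likely CC test, while one near the genuine-passing cluster scores close to 0.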
Journal description:
IEEE Transactions on Software Engineering seeks contributions comprising well-defined theoretical results and empirical studies with potential impacts on software construction, analysis, or management. The scope of this Transactions extends from fundamental mechanisms to the development of principles and their application in specific environments. Specific topic areas include:
a) Development and maintenance methods and models: Techniques and principles for specifying, designing, and implementing software systems, encompassing notations and process models.
b) Assessment methods: Software tests, validation, reliability models, test and diagnosis procedures, software redundancy, design for error control, and measurements and evaluation of process and product aspects.
c) Software project management: Productivity factors, cost models, schedule and organizational issues, and standards.
d) Tools and environments: Specific tools, integrated tool environments, associated architectures, databases, and parallel and distributed processing issues.
e) System issues: Hardware-software trade-offs.
f) State-of-the-art surveys: Syntheses and comprehensive reviews of the historical development within specific areas of interest.