HetFL: Heterogeneous Graph-Based Software Fault Localization

IF 6.5 1区 计算机科学 Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING IEEE Transactions on Software Engineering Pub Date : 2024-09-05 DOI:10.1109/TSE.2024.3454605
Xin Chen;Tian Sun;Dongling Zhuang;Dongjin Yu;He Jiang;Zhide Zhou;Sicheng Li
{"title":"HetFL: Heterogeneous Graph-Based Software Fault Localization","authors":"Xin Chen;Tian Sun;Dongling Zhuang;Dongjin Yu;He Jiang;Zhide Zhou;Sicheng Li","doi":"10.1109/TSE.2024.3454605","DOIUrl":null,"url":null,"abstract":"Automated software fault localization has become one of the hot spots on which researchers have focused in recent years. Existing studies have shown that learning-based techniques can effectively localize faults leveraging various information. However, there exist two problems in these techniques. The first is that they simply represent various information without caring the contribution of different information. The second is that the data imbalance problem is not considered in these techniques. Thus, their effectiveness is limited in practice. In this paper, we propose HetFL, a novel heterogeneous graph-based software fault localization technique to aggregate different information into a heterogeneous graph in which program entities and test cases are regarded as nodes, and coverage, change histories, and call relationships are viewed as edges. HetFL first extracts textual and structure information from source code as attributes of nodes and integrates them to form an attribute vector. Then, for a given node, HetFL finds its neighbor nodes based on the types of edges and aggregates corresponding neighbor nodes to form type vectors. After that, the attribute vector and all the type vectors of each node are aggregated to generate the final vector representation by an attention mechanism. Finally, we leverage a convolution neural network (CNN) to obtain the suspicious score of each method. To validate the effectiveness of HetFL, experiments are conducted on the widely used dataset Defects4J (v1.2.0). The experimental results show that HetFL can localize 217 faults within Top-1 that is 25 higher than the state-of-the-art technique DeepFL, and achieve 6.37 and 5.58 in terms of MAR and MFR which improve DeepFL by 9.0% and 5.6%, respectively. In addition, we also perform experiments on the latest version of Defects4J (v2.0.0). The experimental results show that HetFL has better performance than the baseline methods.","PeriodicalId":13324,"journal":{"name":"IEEE Transactions on Software Engineering","volume":"50 11","pages":"2884-2905"},"PeriodicalIF":6.5000,"publicationDate":"2024-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Software Engineering","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10666908/","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, SOFTWARE ENGINEERING","Score":null,"Total":0}
引用次数: 0

Abstract

Automated software fault localization has become one of the hot spots on which researchers have focused in recent years. Existing studies have shown that learning-based techniques can effectively localize faults leveraging various information. However, there exist two problems in these techniques. The first is that they simply represent various information without caring the contribution of different information. The second is that the data imbalance problem is not considered in these techniques. Thus, their effectiveness is limited in practice. In this paper, we propose HetFL, a novel heterogeneous graph-based software fault localization technique to aggregate different information into a heterogeneous graph in which program entities and test cases are regarded as nodes, and coverage, change histories, and call relationships are viewed as edges. HetFL first extracts textual and structure information from source code as attributes of nodes and integrates them to form an attribute vector. Then, for a given node, HetFL finds its neighbor nodes based on the types of edges and aggregates corresponding neighbor nodes to form type vectors. After that, the attribute vector and all the type vectors of each node are aggregated to generate the final vector representation by an attention mechanism. Finally, we leverage a convolution neural network (CNN) to obtain the suspicious score of each method. To validate the effectiveness of HetFL, experiments are conducted on the widely used dataset Defects4J (v1.2.0). The experimental results show that HetFL can localize 217 faults within Top-1 that is 25 higher than the state-of-the-art technique DeepFL, and achieve 6.37 and 5.58 in terms of MAR and MFR which improve DeepFL by 9.0% and 5.6%, respectively. In addition, we also perform experiments on the latest version of Defects4J (v2.0.0). The experimental results show that HetFL has better performance than the baseline methods.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
HetFL:基于异构图的软件故障定位
软件故障自动定位已成为近年来研究人员关注的热点之一。现有研究表明,基于学习的技术可以利用各种信息有效定位故障。然而,这些技术存在两个问题。一是它们只是简单地表示各种信息,而没有考虑不同信息的贡献。第二,这些技术没有考虑数据不平衡问题。因此,它们在实践中的效果有限。在本文中,我们提出了一种基于异构图的新型软件故障定位技术 HetFL,它将不同的信息聚合到一个异构图中,其中程序实体和测试用例被视为节点,覆盖率、变更历史和调用关系被视为边。HetFL 首先从源代码中提取文本和结构信息作为节点的属性,并将其整合形成属性向量。然后,对于给定的节点,HetFL 会根据边的类型找到它的邻居节点,并将相应的邻居节点聚合起来形成类型向量。然后,每个节点的属性向量和所有类型向量通过注意力机制进行聚合,生成最终的向量表示。最后,我们利用卷积神经网络(CNN)来获得每种方法的可疑得分。为了验证 HetFL 的有效性,我们在广泛使用的数据集 Defects4J(v1.2.0)上进行了实验。实验结果表明,HetFL 可以在 Top-1 范围内定位 217 个故障,比最先进的 DeepFL 高出 25 个,在 MAR 和 MFR 方面分别达到 6.37 和 5.58,比 DeepFL 分别提高了 9.0% 和 5.6%。此外,我们还在最新版本的 Defects4J(v2.0.0)上进行了实验。实验结果表明,HetFL 比基线方法具有更好的性能。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
IEEE Transactions on Software Engineering
IEEE Transactions on Software Engineering 工程技术-工程:电子与电气
CiteScore
9.70
自引率
10.80%
发文量
724
审稿时长
6 months
期刊介绍: IEEE Transactions on Software Engineering seeks contributions comprising well-defined theoretical results and empirical studies with potential impacts on software construction, analysis, or management. The scope of this Transactions extends from fundamental mechanisms to the development of principles and their application in specific environments. Specific topic areas include: a) Development and maintenance methods and models: Techniques and principles for specifying, designing, and implementing software systems, encompassing notations and process models. b) Assessment methods: Software tests, validation, reliability models, test and diagnosis procedures, software redundancy, design for error control, and measurements and evaluation of process and product aspects. c) Software project management: Productivity factors, cost models, schedule and organizational issues, and standards. d) Tools and environments: Specific tools, integrated tool environments, associated architectures, databases, and parallel and distributed processing issues. e) System issues: Hardware-software trade-offs. f) State-of-the-art surveys: Syntheses and comprehensive reviews of the historical development within specific areas of interest.
期刊最新文献
Triple Peak Day: Work Rhythms of Software Developers in Hybrid Work GenProgJS: a Baseline System for Test-based Automated Repair of JavaScript Programs On Inter-dataset Code Duplication and Data Leakage in Large Language Models Line-Level Defect Prediction by Capturing Code Contexts with Graph Convolutional Networks Does Treatment Adherence Impact Experiment Results in TDD?
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1