Xin Chen;Tian Sun;Dongling Zhuang;Dongjin Yu;He Jiang;Zhide Zhou;Sicheng Li
HetFL: Heterogeneous Graph-Based Software Fault Localization
IEEE Transactions on Software Engineering, vol. 50, no. 11, pp. 2884-2905. Published 2024-09-05.
DOI: 10.1109/TSE.2024.3454605
https://ieeexplore.ieee.org/document/10666908/
Abstract
Automated software fault localization has attracted considerable research attention in recent years. Existing studies have shown that learning-based techniques can localize faults effectively by leveraging various kinds of information. However, these techniques have two problems. First, they simply represent the various kinds of information without considering how much each kind contributes. Second, they do not address the data imbalance problem. As a result, their effectiveness is limited in practice. In this paper, we propose HetFL, a novel heterogeneous graph-based software fault localization technique that aggregates different information into a heterogeneous graph in which program entities and test cases are nodes, and coverage, change histories, and call relationships are edges. HetFL first extracts textual and structural information from source code as node attributes and integrates them into an attribute vector. Then, for a given node, HetFL finds its neighbor nodes according to edge type and aggregates the neighbors of each type to form type vectors. Next, an attention mechanism aggregates the attribute vector and all the type vectors of each node to generate the final vector representation. Finally, we leverage a convolutional neural network (CNN) to obtain the suspiciousness score of each method. To validate the effectiveness of HetFL, we conduct experiments on the widely used dataset Defects4J (v1.2.0). The experimental results show that HetFL localizes 217 faults within Top-1, 25 more than the state-of-the-art technique DeepFL, and achieves 6.37 and 5.58 in terms of MAR and MFR, improving on DeepFL by 9.0% and 5.6%, respectively. We also perform experiments on the latest version of Defects4J (v2.0.0), where HetFL also outperforms the baseline methods.
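The aggregation steps described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the toy graph, the node names, the mean aggregation per edge type, the dot-product attention, and the fixed `query` vector are all assumptions chosen for illustration, and the final CNN scoring stage is omitted.

```python
import math
from collections import defaultdict

# Hypothetical toy graph: nodes are methods (m*) and test cases (t*).
# All names, vectors, and edge lists here are illustrative assumptions.
attributes = {
    "m1": [1.0, 0.0],
    "m2": [0.0, 1.0],
    "t1": [1.0, 1.0],
    "t2": [0.5, 0.5],
}
edges = [
    ("m1", "t1", "coverage"),
    ("m1", "t2", "coverage"),
    ("m1", "m2", "call"),
    ("m2", "t2", "coverage"),
]


def neighbors_by_type(edges):
    """Group each node's neighbors by edge type (edges treated as undirected)."""
    grouped = defaultdict(lambda: defaultdict(list))
    for src, dst, edge_type in edges:
        grouped[src][edge_type].append(dst)
        grouped[dst][edge_type].append(src)
    return grouped


def mean_vector(vectors):
    """Element-wise mean of equally sized vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(score - peak) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


def node_representation(node, attributes, grouped, query):
    """Attention-weighted sum of the attribute vector and the per-type vectors."""
    candidates = [attributes[node]]
    for edge_type in sorted(grouped[node]):
        neighbor_vectors = [attributes[n] for n in grouped[node][edge_type]]
        candidates.append(mean_vector(neighbor_vectors))
    # Assumed attention score: dot product with a fixed query vector.
    scores = [sum(q * x for q, x in zip(query, vec)) for vec in candidates]
    weights = softmax(scores)
    dim = len(attributes[node])
    return [sum(w * vec[i] for w, vec in zip(weights, candidates)) for i in range(dim)]


grouped = neighbors_by_type(edges)
query = [1.0, 1.0]  # stand-in for learned attention parameters
rep_m1 = node_representation("m1", attributes, grouped, query)
print(rep_m1)
```

In this sketch each node contributes one attribute vector plus one vector per edge type, so the attention weights indicate how much each information source (for example, coverage versus call relationships) shapes the final representation. In a trained model the query would be learned rather than fixed.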
About the journal:
IEEE Transactions on Software Engineering seeks contributions comprising well-defined theoretical results and empirical studies with potential impacts on software construction, analysis, or management. The scope of this Transactions extends from fundamental mechanisms to the development of principles and their application in specific environments. Specific topic areas include:
a) Development and maintenance methods and models: Techniques and principles for specifying, designing, and implementing software systems, encompassing notations and process models.
b) Assessment methods: Software tests, validation, reliability models, test and diagnosis procedures, software redundancy, design for error control, and measurements and evaluation of process and product aspects.
c) Software project management: Productivity factors, cost models, schedule and organizational issues, and standards.
d) Tools and environments: Specific tools, integrated tool environments, associated architectures, databases, and parallel and distributed processing issues.
e) System issues: Hardware-software trade-offs.
f) State-of-the-art surveys: Syntheses and comprehensive reviews of the historical development within specific areas of interest.