Pub Date: 2024-09-10, DOI: 10.1109/tse.2024.3453783
Delano Oliveira, Reydne Santos, Benedito de Oliveira, Martin Monperrus, Fernando Castor, Fernanda Madeiral
Title: Understanding Code Understandability Improvements in Code Reviews
Journal: IEEE Transactions on Software Engineering
Pub Date: 2024-09-05, DOI: 10.1109/TSE.2024.3454605
Xin Chen;Tian Sun;Dongling Zhuang;Dongjin Yu;He Jiang;Zhide Zhou;Sicheng Li
Automated software fault localization has become a research hot spot in recent years. Existing studies have shown that learning-based techniques can effectively localize faults by leveraging various kinds of information. However, these techniques have two problems. First, they simply represent the various kinds of information without considering how much each kind contributes. Second, they do not consider the data imbalance problem. Thus, their effectiveness is limited in practice. In this paper, we propose HetFL, a novel heterogeneous graph-based software fault localization technique that aggregates different information into a heterogeneous graph in which program entities and test cases are regarded as nodes, and coverage, change histories, and call relationships are viewed as edges. HetFL first extracts textual and structural information from source code as node attributes and integrates them to form an attribute vector. Then, for a given node, HetFL finds its neighbor nodes based on edge type and aggregates the neighbors of each type to form type vectors. After that, an attention mechanism aggregates the attribute vector and all the type vectors of each node to generate the final vector representation. Finally, we leverage a convolutional neural network (CNN) to obtain the suspiciousness score of each method. To validate the effectiveness of HetFL, experiments are conducted on the widely used dataset Defects4J (v1.2.0). The experimental results show that HetFL localizes 217 faults within Top-1, 25 more than the state-of-the-art technique DeepFL, and achieves 6.37 and 5.58 in terms of MAR and MFR, improving on DeepFL by 9.0% and 5.6%, respectively. In addition, we also perform experiments on the latest version of Defects4J (v2.0.0). The experimental results show that HetFL performs better than the baseline methods.
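The aggregation steps described in the abstract (one type vector per edge type, then an attention-weighted combination with the attribute vector) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the mean aggregation of neighbors, the dot-product attention against a fixed `query` vector, the toy graph, and all function and variable names are assumptions made for illustration. In HetFL the attention weights are learned, and the final CNN scoring step is omitted here.

```python
import math
from collections import defaultdict


def mean_vector(vectors, dim):
    """Element-wise mean; a zero vector when a node has no neighbors of that type."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def node_representation(node, attributes, edges, edge_types, query):
    """Combine a node's attribute vector with one type vector per edge type.

    `query` is a hypothetical stand-in for learned attention parameters.
    """
    dim = len(attributes[node])
    by_type = defaultdict(list)
    for src, dst, etype in edges:
        # Edges are treated as undirected in this illustration (an assumption).
        if src == node:
            by_type[etype].append(attributes[dst])
        elif dst == node:
            by_type[etype].append(attributes[src])
    # Candidate vectors: the node's own attributes, then one type vector per edge type.
    candidates = [attributes[node]]
    candidates += [mean_vector(by_type[t], dim) for t in edge_types]
    # Attention: softmax over dot-product scores, then a weighted sum of candidates.
    weights = softmax([dot(query, c) for c in candidates])
    final = [sum(w * c[i] for w, c in zip(weights, candidates)) for i in range(dim)]
    return final, weights


# Toy graph: two methods (m1, m2) and two test cases (t1, t2), all hypothetical.
attributes = {
    "m1": [1.0, 0.0],
    "m2": [0.0, 1.0],
    "t1": [1.0, 1.0],
    "t2": [0.5, 0.5],
}
edges = [
    ("t1", "m1", "coverage"),
    ("t2", "m1", "coverage"),
    ("m1", "m2", "call"),
]
edge_types = ["coverage", "change", "call"]

if __name__ == "__main__":
    vector, weights = node_representation("m1", attributes, edges, edge_types, [1.0, 0.5])
    print("attention weights:", weights)
    print("final representation:", vector)
```

Keeping one vector per edge type, rather than pooling all neighbors together, is what lets the attention step weigh coverage, change-history, and call information differently for each node.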
Title: HetFL: Heterogeneous Graph-Based Software Fault Localization
Journal: IEEE Transactions on Software Engineering, vol. 50, no. 11, pp. 2884-2905
Developers usually use third-party libraries (TPLs) to facilitate the development of their projects and avoid reinventing the wheel; however, vulnerable TPLs cause severe security threats. The majority of existing research only considers whether projects use vulnerable TPLs but neglects whether the vulnerable code of the TPLs is actually used by the projects, which inevitably results in false positives and requires additional patching effort and maintenance costs (e.g., dependency conflict issues after version upgrades). To mitigate this problem, we propose VAScanner