Enhancing structural knowledge in code smell identification: A fusion learning framework combining AST-based metrics with semantic embeddings
Quanxin Yang, Dongjin Yu, Sixuan Wang, Yihang Xu, Xin Chen, Jie Chen, Bin Hu
Expert Systems with Applications, Vol. 263, Article 125725 (published 2024-11-13)
DOI: 10.1016/j.eswa.2024.125725
URL: https://www.sciencedirect.com/science/article/pii/S0957417424025922
Journal Impact Factor: 7.5; JCR Q1 (Computer Science, Artificial Intelligence)
Citations: 0
Abstract
Identifying code smells is a crucial task in software engineering that aims to uncover potential problems and bad practices in source code. Existing learning-based approaches have achieved good results in identifying code smells by learning features such as software code metrics, syntax, and semantics. However, some gaps in existing research still need to be addressed: (1) Software code metrics are challenging to extract and vary across different levels of code granularity; (2) Highly abstract code semantics rarely convey the structural details of the code. To address these issues, we propose using Abstract Syntax Tree (AST)-based metrics to replace software code metrics for identifying code smells. The proposed AST-based metrics are easy to extract, treat code at all granularity levels uniformly, and precisely describe the structural details of the code. Additionally, we propose a fusion learning framework that combines AST-based metrics and semantic embeddings to identify code smells and their severity. Extensive experimental results reveal that our proposed AST-based metrics have the potential to replace software code metrics in identifying code smells, and the proposed fusion learning framework outperforms state-of-the-art approaches on the same dataset.
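The abstract does not define the individual AST-based metrics or the fusion architecture, so what follows is only a rough illustration under stated assumptions, not the authors' method. It assumes that an "AST-based metric" is the count of a selected AST node type, plus the maximum tree depth. It computes these with Python's built-in `ast` module on Python source, although the paper's dataset and parser may differ. It also assumes that fusion is plain concatenation of the metric vector with a precomputed semantic embedding. `NODE_TYPES`, `max_depth`, `ast_metrics`, and `fuse` are hypothetical names introduced for this sketch.

```python
import ast
import textwrap
from collections import Counter

# Hypothetical metric vocabulary: each entry is an AST node type whose count
# becomes one metric. The paper's actual metric set is not given in the abstract.
NODE_TYPES = ["ClassDef", "FunctionDef", "If", "For", "While", "Call", "Assign", "Return"]


def max_depth(node: ast.AST, level: int = 0) -> int:
    """Return the depth of the deepest node below `node` (the root counts as 0)."""
    children = list(ast.iter_child_nodes(node))
    if not children:
        return level
    return max(max_depth(child, level + 1) for child in children)


def ast_metrics(source: str) -> list[float]:
    """Compute a fixed-length metric vector for a code fragment of any granularity.

    The fragment is dedented first, so a method cut out of a class body parses
    just as a whole class or module does.
    """
    tree = ast.parse(textwrap.dedent(source))
    # Count every node type that occurs anywhere in the tree.
    counts = Counter(type(node).__name__ for node in ast.walk(tree))
    metrics = [float(counts.get(name, 0)) for name in NODE_TYPES]
    metrics.append(float(max_depth(tree)))  # structural nesting as one extra metric
    return metrics


def fuse(metrics: list[float], embedding: list[float]) -> list[float]:
    """Early fusion by concatenation (an assumption; the paper's scheme may differ).

    `embedding` stands in for a semantic vector produced by some pretrained
    code model, which is not implemented here.
    """
    return list(metrics) + list(embedding)


if __name__ == "__main__":
    snippet = """
    class Cart:
        def total(self, items):
            if items:
                return sum(i.price for i in items)
            return 0
    """
    vec = ast_metrics(snippet)
    print(dict(zip(NODE_TYPES + ["MaxDepth"], vec)))
    print(len(fuse(vec, [0.0] * 4)))
```

The same `ast_metrics` call accepts a module, a class, or a single method. The metric vector therefore has the same length at every granularity, which is the uniformity property the abstract highlights. In a full pipeline, the fused vector would feed a classifier that predicts the smell and its severity. That classifier is not shown here.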
About the journal:
Expert Systems with Applications is an international journal dedicated to the exchange of information on expert and intelligent systems used globally in industry, government, and universities. The journal emphasizes original papers covering the design, development, testing, implementation, and management of these systems, offering practical guidelines. It spans various sectors such as finance, engineering, marketing, law, project management, information management, medicine, and more. The journal also welcomes papers on multi-agent systems, knowledge management, neural networks, knowledge discovery, data mining, and other related areas, excluding applications to military/defense systems.