Hybrid Model with Multi-Level Code Representation for Multi-Label Code Smell Detection (077)

IF 0.6 4区 计算机科学 Q4 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE International Journal of Software Engineering and Knowledge Engineering Pub Date : 2022-12-07 DOI:10.1142/s0218194022500723
Yichen Li, An Liu, Lei Zhao, Xiaofang Zhang
{"title":"Hybrid Model with Multi-Level Code Representation for Multi-Label Code Smell Detection (077)","authors":"Yichen Li, An Liu, Lei Zhao, Xiaofang Zhang","doi":"10.1142/s0218194022500723","DOIUrl":null,"url":null,"abstract":"Code smell is an indicator of potential problems in a software design that have a negative impact on readability and maintainability. Hence, detecting code smells in a timely and effective manner can provide guides for developers in refactoring. Fortunately, many approaches like metric-based, heuristic-based, machine-learning-based and deep-learning-based have been proposed to detect code smells. However, existing methods, using the simple code representation to describe different code smells unilaterally, cannot efficiently extract enough rich information from source code. In addition, one code snippet often has several code smells at the same time and there is a lack of multi-label code smell detection based on deep learning. In this paper, we present a large-scale dataset for the multi-label code smell detection task since there is still no publicly sufficient dataset for this task. The release of this dataset would push forward the research in this field. Based on it, we propose a hybrid model with multi-level code representation to further optimize the code smell detection. First, we parse the code into the abstract syntax tree (AST) with control and data flow edges and the graph convolution network is applied to get the prediction at the syntactic and semantic level. Then we use the bidirectional long-short term memory network with attention mechanism to analyze the code tokens at the token-level in the meanwhile. Finally, we get the fusion prediction result of the models. Experimental results illustrate that our proposed model outperforms the state-of-the-art methods not only in single code smell detection but also in multi-label code smell detection.","PeriodicalId":50288,"journal":{"name":"International Journal of Software Engineering and Knowledge Engineering","volume":"84 1","pages":"1643-1666"},"PeriodicalIF":0.6000,"publicationDate":"2022-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Software Engineering and Knowledge Engineering","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1142/s0218194022500723","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0

Abstract

Code smell is an indicator of potential problems in a software design that have a negative impact on readability and maintainability. Hence, detecting code smells in a timely and effective manner can provide guides for developers in refactoring. Fortunately, many approaches like metric-based, heuristic-based, machine-learning-based and deep-learning-based have been proposed to detect code smells. However, existing methods, using the simple code representation to describe different code smells unilaterally, cannot efficiently extract enough rich information from source code. In addition, one code snippet often has several code smells at the same time and there is a lack of multi-label code smell detection based on deep learning. In this paper, we present a large-scale dataset for the multi-label code smell detection task since there is still no publicly sufficient dataset for this task. The release of this dataset would push forward the research in this field. Based on it, we propose a hybrid model with multi-level code representation to further optimize the code smell detection. First, we parse the code into the abstract syntax tree (AST) with control and data flow edges and the graph convolution network is applied to get the prediction at the syntactic and semantic level. Then we use the bidirectional long-short term memory network with attention mechanism to analyze the code tokens at the token-level in the meanwhile. Finally, we get the fusion prediction result of the models. Experimental results illustrate that our proposed model outperforms the state-of-the-art methods not only in single code smell detection but also in multi-label code smell detection.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
基于多级代码表示的多标签代码气味检测混合模型(077)
代码气味是软件设计中对可读性和可维护性有负面影响的潜在问题的指示器。因此,及时有效地检测代码气味可以为开发人员提供重构指南。幸运的是,已经提出了许多方法,如基于度量、基于启发式、基于机器学习和基于深度学习的方法来检测代码气味。然而,现有方法采用简单的代码表示来片面地描述不同的代码气味,无法有效地从源代码中提取足够丰富的信息。此外,一个代码片段通常同时具有多个代码气味,并且缺乏基于深度学习的多标签代码气味检测。在本文中,我们为多标签代码气味检测任务提供了一个大规模的数据集,因为仍然没有公开的足够的数据集来完成该任务。该数据集的发布将推动该领域的研究。在此基础上,提出了一种多级代码表示的混合模型,进一步优化代码气味检测。首先,我们将代码解析为具有控制边和数据流边的抽象语法树(AST),并应用图卷积网络在语法和语义层面进行预测。同时,我们利用具有注意机制的双向长短期记忆网络在符号层面对编码符号进行分析。最后,给出了模型的融合预测结果。实验结果表明,该模型不仅在单标签代码气味检测方面优于现有方法,而且在多标签代码气味检测方面也优于现有方法。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
CiteScore
1.90
自引率
11.10%
发文量
71
审稿时长
16 months
期刊介绍: The International Journal of Software Engineering and Knowledge Engineering is intended to serve as a forum for researchers, practitioners, and developers to exchange ideas and results for the advancement of software engineering and knowledge engineering. Three types of papers will be published: Research papers reporting original research results Technology trend surveys reviewing an area of research in software engineering and knowledge engineering Survey articles surveying a broad area in software engineering and knowledge engineering In addition, tool reviews (no more than three manuscript pages) and book reviews (no more than two manuscript pages) are also welcome. A central theme of this journal is the interplay between software engineering and knowledge engineering: how knowledge engineering methods can be applied to software engineering, and vice versa. The journal publishes papers in the areas of software engineering methods and practices, object-oriented systems, rapid prototyping, software reuse, cleanroom software engineering, stepwise refinement/enhancement, formal methods of specification, ambiguity in software development, impact of CASE on software development life cycle, knowledge engineering methods and practices, logic programming, expert systems, knowledge-based systems, distributed knowledge-based systems, deductive database systems, knowledge representations, knowledge-based systems in language translation & processing, software and knowledge-ware maintenance, reverse engineering in software design, and applications in various domains of interest.
期刊最新文献
Modeling and Control of Drinking Water Supply Infrastructures Through Multi-Agent Systems for Sustainability MOID: Many-to-One Patent Graph Embedding Base Infringement Detection Model EFSP: an enhanced full scrum process model Exploring the Impact of Vocabulary Techniques on Code Completion: A Comparative Approach Appling Scrum to knowledge transfer among software developers
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1