基于多级代码表示的多标签代码气味检测混合模型(077)

IF 0.6 4区计算机科学 Q4 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE International Journal of Software Engineering and Knowledge Engineering Pub Date : 2022-12-07 DOI:10.1142/s0218194022500723

Yichen Li, An Liu, Lei Zhao, Xiaofang Zhang

{"title":"基于多级代码表示的多标签代码气味检测混合模型(077)","authors":"Yichen Li, An Liu, Lei Zhao, Xiaofang Zhang","doi":"10.1142/s0218194022500723","DOIUrl":null,"url":null,"abstract":"Code smell is an indicator of potential problems in a software design that have a negative impact on readability and maintainability. Hence, detecting code smells in a timely and effective manner can provide guides for developers in refactoring. Fortunately, many approaches like metric-based, heuristic-based, machine-learning-based and deep-learning-based have been proposed to detect code smells. However, existing methods, using the simple code representation to describe different code smells unilaterally, cannot efficiently extract enough rich information from source code. In addition, one code snippet often has several code smells at the same time and there is a lack of multi-label code smell detection based on deep learning. In this paper, we present a large-scale dataset for the multi-label code smell detection task since there is still no publicly sufficient dataset for this task. The release of this dataset would push forward the research in this field. Based on it, we propose a hybrid model with multi-level code representation to further optimize the code smell detection. First, we parse the code into the abstract syntax tree (AST) with control and data flow edges and the graph convolution network is applied to get the prediction at the syntactic and semantic level. Then we use the bidirectional long-short term memory network with attention mechanism to analyze the code tokens at the token-level in the meanwhile. Finally, we get the fusion prediction result of the models. Experimental results illustrate that our proposed model outperforms the state-of-the-art methods not only in single code smell detection but also in multi-label code smell detection.","PeriodicalId":50288,"journal":{"name":"International Journal of Software Engineering and Knowledge Engineering","volume":"84 1","pages":"1643-1666"},"PeriodicalIF":0.6000,"publicationDate":"2022-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Hybrid Model with Multi-Level Code Representation for Multi-Label Code Smell Detection (077)\",\"authors\":\"Yichen Li, An Liu, Lei Zhao, Xiaofang Zhang\",\"doi\":\"10.1142/s0218194022500723\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Code smell is an indicator of potential problems in a software design that have a negative impact on readability and maintainability. Hence, detecting code smells in a timely and effective manner can provide guides for developers in refactoring. Fortunately, many approaches like metric-based, heuristic-based, machine-learning-based and deep-learning-based have been proposed to detect code smells. However, existing methods, using the simple code representation to describe different code smells unilaterally, cannot efficiently extract enough rich information from source code. In addition, one code snippet often has several code smells at the same time and there is a lack of multi-label code smell detection based on deep learning. In this paper, we present a large-scale dataset for the multi-label code smell detection task since there is still no publicly sufficient dataset for this task. The release of this dataset would push forward the research in this field. Based on it, we propose a hybrid model with multi-level code representation to further optimize the code smell detection. First, we parse the code into the abstract syntax tree (AST) with control and data flow edges and the graph convolution network is applied to get the prediction at the syntactic and semantic level. Then we use the bidirectional long-short term memory network with attention mechanism to analyze the code tokens at the token-level in the meanwhile. Finally, we get the fusion prediction result of the models. Experimental results illustrate that our proposed model outperforms the state-of-the-art methods not only in single code smell detection but also in multi-label code smell detection.\",\"PeriodicalId\":50288,\"journal\":{\"name\":\"International Journal of Software Engineering and Knowledge Engineering\",\"volume\":\"84 1\",\"pages\":\"1643-1666\"},\"PeriodicalIF\":0.6000,\"publicationDate\":\"2022-12-07\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"International Journal of Software Engineering and Knowledge Engineering\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1142/s0218194022500723\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q4\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Software Engineering and Knowledge Engineering","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1142/s0218194022500723","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 0

摘要

代码气味是软件设计中对可读性和可维护性有负面影响的潜在问题的指示器。因此，及时有效地检测代码气味可以为开发人员提供重构指南。幸运的是，已经提出了许多方法，如基于度量、基于启发式、基于机器学习和基于深度学习的方法来检测代码气味。然而，现有方法采用简单的代码表示来片面地描述不同的代码气味，无法有效地从源代码中提取足够丰富的信息。此外，一个代码片段通常同时具有多个代码气味，并且缺乏基于深度学习的多标签代码气味检测。在本文中，我们为多标签代码气味检测任务提供了一个大规模的数据集，因为仍然没有公开的足够的数据集来完成该任务。该数据集的发布将推动该领域的研究。在此基础上，提出了一种多级代码表示的混合模型，进一步优化代码气味检测。首先，我们将代码解析为具有控制边和数据流边的抽象语法树(AST)，并应用图卷积网络在语法和语义层面进行预测。同时，我们利用具有注意机制的双向长短期记忆网络在符号层面对编码符号进行分析。最后，给出了模型的融合预测结果。实验结果表明，该模型不仅在单标签代码气味检测方面优于现有方法，而且在多标签代码气味检测方面也优于现有方法。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Hybrid Model with Multi-Level Code Representation for Multi-Label Code Smell Detection (077)

Code smell is an indicator of potential problems in a software design that have a negative impact on readability and maintainability. Hence, detecting code smells in a timely and effective manner can provide guides for developers in refactoring. Fortunately, many approaches like metric-based, heuristic-based, machine-learning-based and deep-learning-based have been proposed to detect code smells. However, existing methods, using the simple code representation to describe different code smells unilaterally, cannot efficiently extract enough rich information from source code. In addition, one code snippet often has several code smells at the same time and there is a lack of multi-label code smell detection based on deep learning. In this paper, we present a large-scale dataset for the multi-label code smell detection task since there is still no publicly sufficient dataset for this task. The release of this dataset would push forward the research in this field. Based on it, we propose a hybrid model with multi-level code representation to further optimize the code smell detection. First, we parse the code into the abstract syntax tree (AST) with control and data flow edges and the graph convolution network is applied to get the prediction at the syntactic and semantic level. Then we use the bidirectional long-short term memory network with attention mechanism to analyze the code tokens at the token-level in the meanwhile. Finally, we get the fusion prediction result of the models. Experimental results illustrate that our proposed model outperforms the state-of-the-art methods not only in single code smell detection but also in multi-label code smell detection.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

International Journal of Software Engineering and Knowledge Engineering 工程技术-工程：电子与电气

CiteScore

1.90

自引率

11.10%

发文量

审稿时长

16 months

期刊介绍： The International Journal of Software Engineering and Knowledge Engineering is intended to serve as a forum for researchers, practitioners, and developers to exchange ideas and results for the advancement of software engineering and knowledge engineering. Three types of papers will be published: Research papers reporting original research results Technology trend surveys reviewing an area of research in software engineering and knowledge engineering Survey articles surveying a broad area in software engineering and knowledge engineering In addition, tool reviews (no more than three manuscript pages) and book reviews (no more than two manuscript pages) are also welcome. A central theme of this journal is the interplay between software engineering and knowledge engineering: how knowledge engineering methods can be applied to software engineering, and vice versa. The journal publishes papers in the areas of software engineering methods and practices, object-oriented systems, rapid prototyping, software reuse, cleanroom software engineering, stepwise refinement/enhancement, formal methods of specification, ambiguity in software development, impact of CASE on software development life cycle, knowledge engineering methods and practices, logic programming, expert systems, knowledge-based systems, distributed knowledge-based systems, deductive database systems, knowledge representations, knowledge-based systems in language translation & processing, software and knowledge-ware maintenance, reverse engineering in software design, and applications in various domains of interest.