Dynamic stacking ensemble for cross-language code smell detection

IF 2.5 | Zone 4, Computer Science | Q2, Computer Science, Artificial Intelligence | PeerJ Computer Science | Pub Date: 2024-08-15 | DOI: 10.7717/peerj-cs.2254

Hamoud Aljamaan


Citations: 0

Abstract

Code smells are poor design and implementation choices by software engineers that might affect overall software quality. Detecting code smells with machine learning has become a popular research area, with the aim of building effective models that can detect different code smells in multiple programming languages. However, the process of building such models has not reached a state of stability, and most existing research focuses on detecting Java code smells. The main objective of this article is to propose dynamic ensembles built with two strategies, namely greedy search and backward elimination, that can accurately detect code smells in two programming languages (i.e., Java and Python) and that are less complex than full stacking ensembles. The detection performance of the dynamic ensembles was investigated on four Java and two Python code smells. The greedy search and backward elimination strategies yielded different lists of base models from which to build the dynamic ensembles. Compared with full stacking ensembles, dynamic ensembles yielded less complex models for most of the investigated Java and Python code smells, with the backward elimination strategy resulting in the less complex models. Dynamic ensembles performed comparably to full stacking ensembles, with no significant loss in detection performance. This article concludes that dynamic stacking ensembles provided effective and stable detection of Java and Python code smells over all base models, with less complexity than full stacking ensembles.
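The abstract names the two selection strategies but does not spell out their steps. As a rough illustration only, and not the author's implementation, the sketch below shows one common way greedy (forward) search and backward elimination can pick a subset of base models for a stacking ensemble. Every name here (`greedy_search`, `backward_elimination`, `toy_score`, `CANDIDATES`, the `tol` parameter) is hypothetical. In practice `score` would be something like the cross-validated detection performance of a stacking ensemble built from the given subset; here it is a toy function in integer percentage points so the example runs on its own.

```python
from typing import Callable, List, Sequence

# A scoring function maps a list of base-model names to a performance value.
# Assumption: higher is better (e.g. cross-validated accuracy of the stack).
Score = Callable[[Sequence[str]], float]


def greedy_search(candidates: Sequence[str], score: Score, tol: float = 0.0) -> List[str]:
    """Forward selection: start empty and keep adding the single base model
    that improves the score the most; stop when no addition helps."""
    selected: List[str] = []
    remaining = list(candidates)
    best = float("-inf")
    while remaining:
        trials = [(score(selected + [m]), m) for m in remaining]
        top_score, top_model = max(trials, key=lambda t: t[0])
        if top_score <= best + tol:
            break  # no candidate improves the ensemble by more than tol
        selected.append(top_model)
        remaining.remove(top_model)
        best = top_score
    return selected


def backward_elimination(candidates: Sequence[str], score: Score, tol: float = 0.0) -> List[str]:
    """Start from the full ensemble and keep dropping the base model whose
    removal hurts least, as long as the score does not fall by more than tol."""
    selected = list(candidates)
    best = score(selected)
    while len(selected) > 1:
        trials = [(score([m for m in selected if m != drop]), drop) for drop in selected]
        top_score, drop = max(trials, key=lambda t: t[0])
        if top_score < best - tol:
            break  # every possible removal degrades the ensemble too much
        selected.remove(drop)
        best = max(best, top_score)
    return selected


# Toy stand-in for a real evaluation (hypothetical numbers, percentage points):
# only "rf" and "svm" contribute anything on top of an 80-point baseline.
GAINS = {"rf": 5, "svm": 3}
CANDIDATES = ["dt", "knn", "rf", "svm", "nb"]


def toy_score(subset: Sequence[str]) -> int:
    return 80 + sum(GAINS.get(m, 0) for m in set(subset))


print(greedy_search(CANDIDATES, toy_score))
print(backward_elimination(CANDIDATES, toy_score))
```

With this toy score both strategies end up keeping only the two models that contribute. On real data the two searches explore different subsets and can return different lists, which is consistent with the abstract's remark that the strategies yielded different base model lists.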


Source journal

PeerJ Computer Science (Computer Science: General Computer Science)

CiteScore: 6.10
Self-citation rate: 5.30%
Articles published: 332
Review time: 10 weeks

About the journal: PeerJ Computer Science is the new open access journal covering all subject areas in computer science, with the backing of a prestigious advisory board and more than 300 academic editors.