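Improving and comparing performance of machine learning classifiers optimized by swarm intelligent algorithms for code smell detection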

IF 1.5 | Zone 4 (Computer Science) | Q3 (COMPUTER SCIENCE, SOFTWARE ENGINEERING) | Science of Computer Programming | Pub Date: 2024-05-15 | DOI: 10.1016/j.scico.2024.103140

Shivani Jain, Anju Saha

Science of Computer Programming, Volume 237, Article 103140. Full text: https://www.sciencedirect.com/science/article/pii/S0167642324000637

Citations: 0

Abstract

In complex systems, code smells emerge during the maintenance phase because of constant shifts in requirements and designs, tight deadlines, and developers' relative inexperience. Although not conventionally classified as errors, code smells signal flawed design structures that lead to future bugs. They increase the software budget and eventually make the system hard to maintain or completely obsolete. To mitigate these problems, practitioners must detect and refactor code smells. However, interpreting smell definitions and setting appropriate threshold values remain difficult. Supervised machine learning is a strong strategy for addressing these problems and reducing the dependence on expert intervention. The learning of these algorithms can be refined through data pre-processing and hyperparameter tuning, but selecting the best hyperparameter values can be tedious and requires an expert. This study combines twelve swarm-based meta-heuristic algorithms with two machine learning classifiers, optimizing their hyperparameters, eliminating the need for an expert, and automating the entire code smell detection process. With this approach, the highest post-optimization accuracy, precision, recall, F-measure, and ROC-AUC values are 99.09%, 99.20%, 99.09%, 98.06%, and 100%, respectively. The largest improvements are 35.90% in accuracy, 53.79% in precision, 35.90% in recall, 44.73% in F-measure, and 36.28% in ROC-AUC. Artificial Bee Colony, Grey Wolf, and Salp Swarm Optimizer are the top-performing swarm-intelligent algorithms. God Class and Data Class are the smells most readily detected by the optimized classifiers. Statistical tests confirm the significant impact of using swarm-based algorithms to optimize machine learning classifiers. This integration improves classifier performance, automates code smell detection, and offers a robust solution to a persistent software engineering challenge.
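The idea of letting a swarm search for hyperparameters can be illustrated with a minimal sketch of a basic Grey Wolf Optimizer, one of the algorithms the abstract names. This is not the authors' implementation: the abstract does not say which classifiers, hyperparameters, or search ranges were used, so the function `validation_error`, its two parameters, and the bounds below are hypothetical stand-ins. In a real experiment the objective would presumably be something like one minus the cross-validated accuracy of a classifier trained with the candidate hyperparameters.

```python
import random


def validation_error(params):
    """Hypothetical stand-in for 1 - cross-validated accuracy.

    A smooth bowl with its minimum (0.05) at log_c = 1, depth = 8.
    Both parameter names are illustrative assumptions.
    """
    log_c, depth = params
    return 0.05 + 0.02 * (log_c - 1.0) ** 2 + 0.001 * (depth - 8.0) ** 2


def grey_wolf_optimize(objective, bounds, n_wolves=12, n_iters=60, seed=0):
    """Minimize `objective` inside box `bounds` with a basic Grey Wolf Optimizer.

    Requires n_wolves >= 3, because the three best wolves lead the pack.
    """
    rng = random.Random(seed)
    wolves = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_wolves)]
    best_pos, best_score = None, float("inf")

    for t in range(n_iters):
        # Rank the pack; the three best wolves (alpha, beta, delta) guide the rest.
        scored = sorted(((objective(w), w) for w in wolves), key=lambda p: p[0])
        if scored[0][0] < best_score:
            best_score, best_pos = scored[0][0], list(scored[0][1])
        leaders = [list(w) for _, w in scored[:3]]

        # Exploration coefficient shrinks linearly from 2 towards 0.
        a = 2.0 * (1.0 - t / n_iters)

        new_wolves = []
        for w in wolves:
            pos = []
            for d, (lo, hi) in enumerate(bounds):
                estimate = 0.0
                for leader in leaders:
                    big_a = 2.0 * a * rng.random() - a
                    big_c = 2.0 * rng.random()
                    dist = abs(big_c * leader[d] - w[d])
                    estimate += leader[d] - big_a * dist
                # Average the three leader-guided estimates and clip to the box.
                pos.append(min(hi, max(lo, estimate / 3.0)))
            new_wolves.append(pos)
        wolves = new_wolves

    # Score the final generation as well.
    for w in wolves:
        score = objective(w)
        if score < best_score:
            best_score, best_pos = score, list(w)
    return best_pos, best_score


if __name__ == "__main__":
    search_space = [(-3.0, 3.0), (1.0, 20.0)]  # assumed ranges for log_c, depth
    params, error = grey_wolf_optimize(validation_error, search_space)
    print("best hyperparameters:", params)
    print("estimated error:", error)
```

Swapping the toy objective for a real train-and-validate routine is the only change needed to turn the sketch into an actual tuning loop; each objective call then costs one model fit per fold, which is why population size and iteration count matter in practice.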

Source journal

Science of Computer Programming (Engineering and Technology: Computer Science, Software Engineering)

CiteScore

3.80

Self-citation rate

0.00%

Articles published

Review time

67 days

About the journal: Science of Computer Programming is dedicated to the distribution of research results in the areas of software systems development, use and maintenance, including the software aspects of hardware design. The journal has a wide scope ranging from the many facets of methodological foundations to the details of technical issues and the aspects of industrial practice. The subjects of interest to SCP cover the entire spectrum of methods for the entire life cycle of software systems, including:
• Requirements, specification, design, validation, verification, coding, testing, maintenance, metrics and renovation of software;
• Design, implementation and evaluation of programming languages;
• Programming environments, development tools, visualisation and animation;
• Management of the development process;
• Human factors in software, software for social interaction, software for social computing;
• Cyber physical systems, and software for the interaction between the physical and the machine;
• Software aspects of infrastructure services, system administration, and network management.