Machine learning with word embedding for detecting web-services anti-patterns

IF 1.7 3区 计算机科学 Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING Journal of Computer Languages Pub Date : 2023-06-01 DOI:10.1016/j.cola.2023.101207
Lov Kumar , Sahithi Tummalapalli , Sonika Chandrakant Rathi , Lalita Bhanu Murthy , Aneesh Krishna , Sanjay Misra
{"title":"Machine learning with word embedding for detecting web-services anti-patterns","authors":"Lov Kumar ,&nbsp;Sahithi Tummalapalli ,&nbsp;Sonika Chandrakant Rathi ,&nbsp;Lalita Bhanu Murthy ,&nbsp;Aneesh Krishna ,&nbsp;Sanjay Misra","doi":"10.1016/j.cola.2023.101207","DOIUrl":null,"url":null,"abstract":"<div><p>Software design Anti-pattern is the common feedback to a recurring problem that is ineffective and has a high risk of failure. Early prediction of these Anti-patterns helps reduce the design process’s efforts, resources, and costs. In earlier research, static code or Web Service Description Language (WSDL) metrics were used to develop anti-pattern prediction models. These source code metrics are calculated at either file-level or system-level. So, the values of these metrics are frequently dependent on assumptions that are not defined or standardized and might vary depending on the tools available. This study aims to develop a machine learning-based Anti-patterns prediction model using natural language processing techniques for representing the WSDL file as an input. In this research, the four-word embedding methods have been used to process the WSDL file. The processed outputs are used as input to the models trained using thirty-three classifier techniques. This study also uses eight feature selection techniques to remove ineffective features and five data sampling techniques to handle the class imbalance nature of the datasets. The results indicate that the developed models using text metrics perform better than the static code or WSDL metrics. Additionally, the results suggest that selecting features using feature selection and balancing data using sampling techniques helps improve the models’ performance.</p></div>","PeriodicalId":48552,"journal":{"name":"Journal of Computer Languages","volume":"75 ","pages":"Article 101207"},"PeriodicalIF":1.7000,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Computer Languages","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2590118423000175","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, SOFTWARE ENGINEERING","Score":null,"Total":0}
引用次数: 0

Abstract

Software design Anti-pattern is the common feedback to a recurring problem that is ineffective and has a high risk of failure. Early prediction of these Anti-patterns helps reduce the design process’s efforts, resources, and costs. In earlier research, static code or Web Service Description Language (WSDL) metrics were used to develop anti-pattern prediction models. These source code metrics are calculated at either file-level or system-level. So, the values of these metrics are frequently dependent on assumptions that are not defined or standardized and might vary depending on the tools available. This study aims to develop a machine learning-based Anti-patterns prediction model using natural language processing techniques for representing the WSDL file as an input. In this research, the four-word embedding methods have been used to process the WSDL file. The processed outputs are used as input to the models trained using thirty-three classifier techniques. This study also uses eight feature selection techniques to remove ineffective features and five data sampling techniques to handle the class imbalance nature of the datasets. The results indicate that the developed models using text metrics perform better than the static code or WSDL metrics. Additionally, the results suggest that selecting features using feature selection and balancing data using sampling techniques helps improve the models’ performance.

查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
基于单词嵌入的机器学习检测web服务反模式
软件设计反模式是对重复出现的问题的常见反馈,该问题无效且有很高的失败风险。这些反模式的早期预测有助于减少设计过程的工作量、资源和成本。在早期的研究中,静态代码或Web服务描述语言(WSDL)度量被用于开发反模式预测模型。这些源代码度量是在文件级别或系统级别计算的。因此,这些指标的值通常取决于未定义或标准化的假设,并且可能因可用工具而异。本研究旨在开发一个基于机器学习的反模式预测模型,该模型使用自然语言处理技术将WSDL文件表示为输入。在本研究中,使用了四个单词的嵌入方法来处理WSDL文件。处理后的输出被用作使用三十三种分类器技术训练的模型的输入。本研究还使用了八种特征选择技术来去除无效特征,并使用了五种数据采样技术来处理数据集的类不平衡性质。结果表明,使用文本度量的开发模型的性能优于静态代码或WSDL度量。此外,研究结果表明,使用特征选择和采样技术平衡数据来选择特征有助于提高模型的性能。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
Journal of Computer Languages
Journal of Computer Languages Computer Science-Computer Networks and Communications
CiteScore
5.00
自引率
13.60%
发文量
36
期刊最新文献
Debugging in the Domain-Specific Modeling Languages for multi-agent systems GPotion: Embedding GPU programming in Elixir Near-Pruned single assignment transformation of programs MLAPW: A framework to assess the impact of feature selection and sampling techniques on anti-pattern prediction using WSDL metrics Editorial Board
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1