MLAPW: A framework to assess the impact of feature selection and sampling techniques on anti-pattern prediction using WSDL metrics

IF 1.7 | Zone 3: Computer Science | Q3: Computer Science, Software Engineering | Journal of Computer Languages | Pub Date: 2025-02-01 | DOI: 10.1016/j.cola.2025.101322
Lov Kumar, Vikram Singh, Lalita Bhanu Murthy, Aneesh Krishna, Sanjay Misra
Journal of Computer Languages, Volume 83, Article 101322. https://www.sciencedirect.com/science/article/pii/S2590118425000085

Abstract

Context:

Frequent changes can degrade the quality and design of Service-Based Systems, introducing design flaws known as anti-patterns. The presence of these anti-patterns strongly affects the overall maintainability of Service-Based Systems. Hence, early detection of anti-patterns, alongside co-located modifications, becomes mandatory. However, finding these anti-patterns manually is not easy.

Objective:

The objective of this work is to explore the role of WSDL (Web Services Description Language) metrics in anti-pattern prediction using a Machine Learning (ML) based framework, MLAPW. The framework encompasses different variants of feature selection techniques, data sampling techniques, and a wide range of ML algorithms. This work empirically investigates the predictive ability of anti-pattern prediction models developed using different sets of WSDL metrics. Our major focus is to investigate how accurately these metrics predict the different types of anti-patterns present in a WSDL file.

Methods:

To achieve this objective, different sets of WSDL metrics (Structural Quality Metrics, Procedural Quality Metrics, Data Quality Metrics, Quality Metrics, and Complexity Metrics) are used as input to the anti-pattern prediction models. Since these models take WSDL metrics as input, we also use feature selection methods to find the best subsets of WSDL metrics. The models are trained using various machine-learning techniques. This study also reports the performance of models trained on data balanced with data sampling techniques. Finally, these techniques are compared empirically using accuracy and the area under the receiver operating characteristic (ROC) curve (AUC), together with hypothesis testing.
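Two of the steps named above, balancing the data by up-sampling and scoring a model by AUC, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `upsample` and `auc`, the toy data, and the choice of plain random duplication of minority rows are all assumptions made for illustration. AUC is computed here from its pairwise definition, the fraction of (anti-pattern, clean) pairs in which the anti-pattern instance receives the higher score.

```python
import random
from typing import List, Sequence, Tuple

Row = Sequence[float]


def upsample(rows: List[Row], labels: List[int], seed: int = 0) -> Tuple[List[Row], List[int]]:
    """Randomly duplicate minority-class rows until both classes are equal in size.

    Hypothetical helper: labels are 1 (anti-pattern) or 0 (clean).
    """
    rng = random.Random(seed)
    pos = [r for r, y in zip(rows, labels) if y == 1]
    neg = [r for r, y in zip(rows, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    if len(pos) < len(neg):
        minority, minority_label, deficit = pos, 1, len(neg) - len(pos)
    else:
        minority, minority_label, deficit = neg, 0, len(pos) - len(neg)
    # Draw with replacement from the minority class to fill the gap.
    extra = [rng.choice(minority) for _ in range(deficit)]
    return list(rows) + extra, list(labels) + [minority_label] * deficit


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (made up): one metric value per WSDL file, one positive among five files.
toy_rows = [[1.0], [2.0], [3.0], [4.0], [9.0]]
toy_labels = [0, 0, 0, 0, 1]
balanced_rows, balanced_labels = upsample(toy_rows, toy_labels)
print(len(balanced_rows), balanced_labels.count(1), balanced_labels.count(0))
print(auc([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0]))
```

In practice, up-sampling would normally be applied to the training split only, so that duplicated rows do not leak into the data used to compute AUC.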

Results:

The empirical study is based on 226 WSDL files from various domains such as finance, tourism, health, and education. The models trained using all WSDL metrics have a mean AUC of 0.79 and a median AUC of 0.90. The models trained on the features selected with classifier feature subset selection (CFS) perform better, with a mean AUC of 0.80 and a median AUC of 0.97. The experimental results also confirm that the models trained on up-sampled data (UPSAM) have a better mean AUC of 0.79 and median AUC of 0.91, with a low Friedman rank of 2.40. Finally, the models trained using the least square support vector machine (LSSVM) achieved a median AUC of 1, a mean AUC of 0.99, and a low Friedman rank of 1.30.
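The summary statistics reported above (mean AUC, median AUC, and Friedman rank) can be computed along the following lines. This is a sketch under stated assumptions, not the paper's procedure: the helper `friedman_mean_ranks` and the technique names and AUC values in `toy_auc` are hypothetical. It assumes rank 1 goes to the highest AUC on each dataset, which matches the abstract's reading that a lower Friedman rank is better, and that tied techniques share the average of the ranks they span.

```python
from statistics import mean, median
from typing import Dict, List


def friedman_mean_ranks(auc_table: Dict[str, List[float]]) -> Dict[str, float]:
    """Mean rank of each technique across datasets (rank 1 = highest AUC).

    auc_table maps a technique name to its AUC on each dataset; all lists
    must have the same length. Ties receive the average of the tied ranks.
    """
    names = list(auc_table)
    n_datasets = len(auc_table[names[0]])
    if any(len(v) != n_datasets for v in auc_table.values()):
        raise ValueError("every technique needs one AUC per dataset")
    totals = {name: 0.0 for name in names}
    for i in range(n_datasets):
        ordered = sorted(names, key=lambda k: -auc_table[k][i])
        j = 0
        while j < len(ordered):
            k = j
            # Extend k over the run of techniques tied with position j.
            while k + 1 < len(ordered) and auc_table[ordered[k + 1]][i] == auc_table[ordered[j]][i]:
                k += 1
            shared_rank = (j + k) / 2 + 1  # average of 1-based ranks j+1 .. k+1
            for name in ordered[j:k + 1]:
                totals[name] += shared_rank
            j = k + 1
    return {name: totals[name] / n_datasets for name in names}


# Made-up AUC values for three hypothetical techniques on three datasets.
toy_auc = {
    "A": [0.9, 0.8, 0.7],
    "B": [0.8, 0.8, 0.9],
    "C": [0.5, 0.6, 0.6],
}
for name, values in toy_auc.items():
    print(name, round(mean(values), 2), median(values))
print(friedman_mean_ranks(toy_auc))
```

A median well above the mean, as with the 0.90 median and 0.79 mean reported for the all-metrics models, generally indicates that a minority of low-AUC runs pulls the average down.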

Conclusion:

The experimental results show that models trained using Data and Procedural Quality Metrics achieve higher AUC values than models trained on the other sets of metrics. Moreover, prediction performance improved significantly after employing feature selection techniques. The results also show that models trained using advanced classifiers and ensemble learning achieve higher AUC values than those trained with other techniques. Based on this research, it is reasonable to claim that data sampling techniques help to improve the models' prediction capability: the models trained on data sampled with up-sampling (UPSAM) achieved a median AUC of 0.91 and a mean AUC of 0.79.