Study of Data Mining Algorithms Using a Dataset from the Size-Effect on Open Source Software Defects

Muthana Yaseen Nawaf, M. M. Rashid
DOI: 10.32894/kujss.2020.15.2.3
URL: https://doi.org/10.32894/kujss.2020.15.2.3
Journal: Kirkuk University Journal-Scientific Studies (KUJSS)
Published: 2020-06-01 (journal article)
Citations: 0

Abstract

This article evaluates data mining algorithms in terms of accuracy and time consumption. To identify the best of the classification and clustering algorithms, we test them in WEKA on a real dataset produced by a study of the size effect on defect proneness in open source software, with the Mozilla product as the example. That study shows a significant relationship between software size and defect proneness, and its output dataset serves as the input to WEKA for comparing the data mining algorithms. We use Naive Bayes, Decision Tree J48, and Expectation-Maximization for classification, and K-Star and Simple KMeans for clustering. The findings show how the algorithms differ in accuracy and in the time each takes to reach a result. They also confirm that software size has a significant effect on defect proneness. Finally, the paper identifies the best algorithm in terms of accuracy and time consumption, so that it can be relied on in classification and clustering tests.
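The comparison criteria in the abstract (accuracy and time per algorithm) can be illustrated with a minimal Python sketch. This is not the authors' WEKA setup or their Mozilla dataset: the synthetic data, the assumed size effect in `make_data`, and the two toy models `MajorityClass` and `SizeThreshold` are all hypothetical stand-ins chosen only to show how such a comparison might be measured.

```python
import random
import time
from collections import Counter


def make_data(n, seed=0):
    """Synthetic (size, label) rows; an assumption, not the Mozilla dataset."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        size = rng.uniform(10, 1000)        # hypothetical module size
        p_defect = min(0.9, size / 1000)    # assumed size effect on defects
        rows.append((size, 1 if rng.random() < p_defect else 0))
    return rows


class MajorityClass:
    """Baseline: always predict the most common training label."""

    def fit(self, rows):
        self.label = Counter(y for _, y in rows).most_common(1)[0][0]

    def predict(self, x):
        return self.label


class SizeThreshold:
    """One-feature stump: predict defective when size exceeds a learned cut."""

    def fit(self, rows):
        best_hits, self.cut = -1, 0.0
        for cut, _ in rows:  # try every observed size as a candidate cut
            hits = sum((x > cut) == bool(y) for x, y in rows)
            if hits > best_hits:
                best_hits, self.cut = hits, cut

    def predict(self, x):
        return 1 if x > self.cut else 0


def evaluate(model, train, test):
    """Return (accuracy on test, seconds spent training and predicting)."""
    start = time.perf_counter()
    model.fit(train)
    hits = sum(model.predict(x) == y for x, y in test)
    elapsed = time.perf_counter() - start
    return hits / len(test), elapsed


rows = make_data(600)
train, test = rows[:400], rows[400:]
results = {}
for model in (MajorityClass(), SizeThreshold()):
    results[type(model).__name__] = evaluate(model, train, test)

for name, (acc, secs) in results.items():
    print(f"{name}: accuracy={acc:.3f}, time={secs:.4f}s")
```

Reporting both numbers side by side makes the trade-off visible: a model may score higher on accuracy while costing more time, which is the kind of difference the study sets out to measure.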