Using machine learning to design a flexible LOC counter

Miroslaw Ochodek, M. Staron, Dominik Bargowski, Wilhelm Meding, R. Hebig
Published in: 2017 IEEE Workshop on Machine Learning Techniques for Software Quality Evaluation (MaLTeSQuE)
Publication date: 2017-02-21
DOI: 10.1109/MALTESQUE.2017.7882011 (https://doi.org/10.1109/MALTESQUE.2017.7882011)
Citations: 12

Abstract

The result of counting program size in Lines-of-Code (LOC) depends on the counting rules, i.e. the definition of which lines should be counted. In most measurement tools, these rules are statically coded into the tool, and users do not know which lines were counted and which were not. The goal of our research is to investigate how machine learning can be used to teach a measurement tool which lines should be counted and which should not. We are interested in identifying which parameters of the learning algorithm can be used to classify lines to be counted. Our research follows the design science research methodology: we construct a measurement tool based on machine learning and evaluate it on open source programs. For the training set, industry professionals classify which lines should be counted. The results show that classifying lines as to be counted or not achieves an average accuracy between 0.90 and 0.99 measured as the Matthews correlation coefficient (MCC) and between 95% and nearly 100% measured as the percentage of correctly classified lines. Based on these results, we conclude that using machine learning algorithms as the core of modern measurement instruments has large potential and should be explored further.
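The abstract reports results with two metrics: the percentage of correctly classified lines and the Matthews correlation coefficient. As a minimal sketch of how the two relate for a binary "count this line or not" decision, the Python snippet below computes both from a confusion matrix. The labels are made-up illustrative data, not the paper's, and the function names are hypothetical.

```python
import math


def confusion_counts(actual, predicted):
    """Return (TP, TN, FP, FN) for binary labels; True means 'count this line'."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)
    tn = sum(1 for a, p in pairs if not a and not p)
    fp = sum(1 for a, p in pairs if not a and p)
    fn = sum(1 for a, p in pairs if a and not p)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Fraction of lines whose predicted label matches the reference label."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; defined as 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Hypothetical reference labels (e.g. from a human annotator) and classifier output.
actual = [True, True, True, False, False, True, False, True]
predicted = [True, True, False, False, False, True, True, True]

counts = confusion_counts(actual, predicted)
acc_value = accuracy(*counts)
mcc_value = mcc(*counts)
print(counts, acc_value, round(mcc_value, 4))
```

Unlike plain accuracy, MCC uses all four cells of the confusion matrix, so it remains informative when counted and uncounted lines are unevenly represented.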