Using machine learning to design a flexible LOC counter

2017 IEEE Workshop on Machine Learning Techniques for Software Quality Evaluation (MaLTeSQuE)
Pub Date: 2017-02-21
DOI: 10.1109/MALTESQUE.2017.7882011

Miroslaw Ochodek, M. Staron, Dominik Bargowski, Wilhelm Meding, R. Hebig


Citations: 12

Abstract

The results of counting the size of programs in terms of Lines-of-Code (LOC) depend on the rules used for counting (i.e., the definition of which lines should be counted). In most measurement tools, these rules are hard-coded, and the users of the tools do not know which lines were counted and which were not. The goal of our research is to investigate how machine learning can be used to teach a measurement tool which lines should be counted and which should not. We are interested in identifying which parameters of the learning algorithm can be used to classify the lines to be counted. Our research follows the design science research methodology: we construct a measurement tool based on machine learning and evaluate it on open source programs. To build the training set, industry professionals classified which lines should be counted. The results show that classifying lines as to be counted or not achieves an average accuracy of 0.90 to 0.99 measured as the Matthews Correlation Coefficient (MCC), and of 95% to nearly 100% measured as the percentage of correctly classified lines. Based on these results, we conclude that using machine learning algorithms as the core of modern measurement instruments has large potential and should be explored further.
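The abstract does not describe the features or the learning algorithm the authors used, so the following is only a minimal sketch of the general idea under our own assumptions. Each line is mapped to token features by a hypothetical `line_features` function; a multinomial naive Bayes classifier (a stand-in chosen for brevity, not necessarily the paper's algorithm) learns from lines labeled 1 (count) or 0 (do not count); and the predictions are scored with the two measures the abstract reports: the Matthews Correlation Coefficient, MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)), and the share of correctly classified lines. The training labels below are invented for illustration (blank lines, comment-only lines, and lone braces are not counted); they are not the professionals' labels from the study.

```python
import math
import re
from collections import Counter


def line_features(line: str) -> list[str]:
    """Map one source line to coarse token features (hypothetical scheme)."""
    stripped = line.strip()
    if not stripped:
        return ["<EMPTY>"]
    # Identifiers/keywords, integer literals, or any single non-space character.
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|\S", stripped)
    # The first token is often decisive (e.g. "/", "}", "return"), so mark it separately.
    return ["first=" + tokens[0]] + ["tok=" + t for t in tokens]


class NaiveBayesLineClassifier:
    """Multinomial naive Bayes with add-one smoothing; label 1 = count the line."""

    def fit(self, lines: list[str], labels: list[int]) -> "NaiveBayesLineClassifier":
        self.priors = Counter(labels)
        if self.priors[0] == 0 or self.priors[1] == 0:
            raise ValueError("training data must contain both labels 0 and 1")
        self.counts = {0: Counter(), 1: Counter()}
        for line, label in zip(lines, labels):
            self.counts[label].update(line_features(line))
        self.vocab_size = len(set(self.counts[0]) | set(self.counts[1]))
        return self

    def _log_score(self, feats: list[str], label: int) -> float:
        # log P(label) + sum of log P(feature | label), with add-one smoothing.
        total = sum(self.counts[label].values())
        score = math.log(self.priors[label] / sum(self.priors.values()))
        for f in feats:
            score += math.log((self.counts[label][f] + 1) / (total + self.vocab_size))
        return score

    def predict(self, line: str) -> int:
        feats = line_features(line)
        return max((0, 1), key=lambda label: self._log_score(feats, label))


def mcc(y_true: list[int], y_pred: list[int]) -> float:
    """Matthews correlation coefficient for binary labels (0.0 when undefined)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def accuracy(y_true: list[int], y_pred: list[int]) -> float:
    """Share of correctly classified lines."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy labels invented for illustration only (not data from the study):
# blank lines, comment-only lines and lone braces are not counted.
TRAIN_LINES = [
    "int x = 0;",
    "return x + 1;",
    "for (int i = 0; i < n; i++) {",
    "// increment the counter",
    "/* block comment */",
    "",
    "}",
    "{",
    "x = compute(y);",
    "// TODO: refactor",
]
TRAIN_LABELS = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]

if __name__ == "__main__":
    clf = NaiveBayesLineClassifier().fit(TRAIN_LINES, TRAIN_LABELS)
    test_lines = ["y = x * 2;", "// another comment", "}", "", "z = f(x);"]
    test_labels = [1, 0, 0, 0, 1]
    predicted = [clf.predict(line) for line in test_lines]
    print("predicted:", predicted)
    print("MCC:", mcc(test_labels, predicted))
    print("share correct:", accuracy(test_labels, predicted))
```

In this sketch the counting rules live in the labels rather than in code: relabeling the training lines (for example, deciding that lone braces should count) changes the counter's behavior without editing any rule. Swapping in a different learner or richer features would leave the MCC and share-correct evaluation unchanged.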



