[Research Paper] On the Use of Machine Learning Techniques Towards the Design of Cloud Based Automatic Code Clone Validation Tools

2018 IEEE 18th International Working Conference on Source Code Analysis and Manipulation (SCAM). Pub Date: 2018-09-01. DOI: 10.1109/SCAM.2018.00025

Golam Mostaeen, Jeffrey Svajlenko, B. Roy, C. Roy, Kevin A. Schneider


Citations: 11

Abstract

A code clone is a pair of similar code fragments within or between software systems. Since code clones often negatively impact the maintainability of a software system, a great number of code clone detection techniques and tools have been proposed and studied over the last decade. To detect all possible similar source code patterns in general, clone detection tools work at the syntax level (text, tokens, ASTs, and so on) and do not account for user-specific preferences. This often means the reported clones must be manually validated prior to any analysis in order to identify the true positive clones under task- or user-specific considerations. This manual clone validation effort is very time-consuming and often error-prone, in particular for large-scale clone detection. In this paper, we propose a machine learning based approach for automating the validation process. In an experiment with clones detected by several clone detectors in several different software systems, we found our approach has an accuracy of up to 87.4% when compared against the manual validation by multiple expert judges. The proposed method shows promising results in several comparative studies with existing related approaches for automatic code clone validation. We also present our experimental results in terms of different code clone detection tools, machine learning algorithms and open source software systems.
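The abstract does not say which features or learning algorithms the authors use, so the following is only an illustrative sketch of the general idea: learn from manually validated clone pairs, then measure agreement with the manual labels. It assumes a single token-set Jaccard similarity feature and a one-level decision stump; the function names (`tokens`, `jaccard`, `fit_threshold`, `accuracy`) and the toy data are hypothetical and are not taken from the paper.

```python
import re


def tokens(code):
    """Split a code fragment into identifier, number, and symbol tokens."""
    return re.findall(r"[A-Za-z_]\w*|\d+|[^\s\w]", code)


def jaccard(a, b):
    """Token-set Jaccard similarity of two fragments, in [0, 1]."""
    ta, tb = set(tokens(a)), set(tokens(b))
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def accuracy(predicted, manual):
    """Fraction of predictions that agree with the manual validation labels."""
    return sum(p == m for p, m in zip(predicted, manual)) / len(manual)


def fit_threshold(scores, labels):
    """Decision stump: pick the similarity cut-off that best matches the labels."""
    best_t, best_acc = 0.0, -1.0
    for t in sorted(set(scores)):
        acc = accuracy([s >= t for s in scores], labels)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t


# Hypothetical reported clone pairs with manual labels (True = true positive).
reported = [
    ("int sum = a + b;", "int total = a + b;", True),
    ("for (i = 0; i < n; i++) x += i;", "for (j = 0; j < n; j++) x += j;", True),
    ("return null;", "log.info(msg);", False),
    ("int sum = a + b;", "while (true) { run(); }", False),
]

scores = [jaccard(a, b) for a, b, _ in reported]
labels = [label for _, _, label in reported]
threshold = fit_threshold(scores, labels)
predicted = [s >= threshold for s in scores]
print("learned threshold:", threshold)
print("agreement with manual labels:", accuracy(predicted, labels))
```

A real validator would use richer features and a held-out test set; the accuracy printed here is measured on the same toy pairs used for fitting and says nothing about the 87.4% figure reported above.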


Latest papers from this venue

[Research Paper] Untangling Composite Commits Using Program Slicing

[Engineering Paper] Built-in Clone Detection in Meta Languages

[Research Paper] Static JavaScript Call Graphs: A Comparative Study

[Engineering Paper] Challenges of Implementing Cross Translation Unit Analysis in Clang Static Analyzer

[Engineering Paper] Graal: The Quest for Source Code Knowledge