Assessing and Improving an Evaluation Dataset for Detecting Semantic Code Clones via Deep Learning

ACM Transactions on Software Engineering and Methodology (TOSEM). Pub Date: 2022-06-25. DOI: 10.1145/3502852

Hao Yu, Xing Hu, Ge Li, Ying Li, Qianxiang Wang, Tao Xie


Citations: 4

Abstract

In recent years, applying deep learning to detect semantic code clones has received substantial attention from the research community. Accordingly, various evaluation datasets, the most popular being BigCloneBench, have been constructed and adopted as benchmarks to assess and compare deep learning models for detecting semantic clones. However, no study has investigated whether a benchmark dataset such as BigCloneBench is properly used to evaluate such models. In this article, we present an experimental study showing that semantic clone pairs in BigCloneBench typically share identifier names, whereas non-semantic-clone pairs do not. We then propose an undesirable-by-design Linear-Model that considers only which identifiers appear in a code fragment; when evaluated on BigCloneBench, this model detects semantic clones with high effectiveness, comparable even to state-of-the-art deep learning models recently proposed for the task. To alleviate these issues, we abstract a subset of the identifier names (including type, variable, and method names) in BigCloneBench to produce AbsBigCloneBench, and we use AbsBigCloneBench to better assess the effectiveness of deep learning models at detecting semantic clones.
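
The identifier-name signal described in the abstract can be illustrated with a small sketch. This is not the paper's Linear-Model or the procedure used to build AbsBigCloneBench; it is a simplified illustration under assumptions: identifiers are extracted with a regular expression rather than a Java parser, the keyword list is deliberately incomplete, every non-keyword identifier is abstracted (the paper abstracts only a subset), and the helper names `identifier_set`, `jaccard`, and `abstract_identifiers` are hypothetical.

```python
import re

# Illustrative, incomplete set of Java keywords and literals (assumption:
# enough for the toy fragments below, not a full lexer).
JAVA_KEYWORDS = {
    "boolean", "break", "case", "catch", "char", "class", "continue",
    "double", "else", "false", "final", "float", "for", "if", "int",
    "long", "new", "null", "private", "public", "return", "static",
    "this", "throw", "true", "try", "void", "while",
}

# Word-boundary anchor avoids matching the tail of numeric literals.
IDENT_RE = re.compile(r"\b[A-Za-z_][A-Za-z0-9_]*")


def identifier_set(code):
    """Return the set of non-keyword identifiers appearing in a fragment."""
    return {tok for tok in IDENT_RE.findall(code) if tok not in JAVA_KEYWORDS}


def jaccard(a, b):
    """Jaccard overlap of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def abstract_identifiers(code):
    """Rename each non-keyword identifier to a placeholder id0, id1, ...

    Placeholders are assigned in order of first appearance, so the original
    names no longer carry any information across fragments.
    """
    mapping = {}

    def rename(match):
        tok = match.group(0)
        if tok in JAVA_KEYWORDS:
            return tok
        if tok not in mapping:
            mapping[tok] = f"id{len(mapping)}"
        return mapping[tok]

    return IDENT_RE.sub(rename, code)


# Toy fragments (hypothetical, not taken from BigCloneBench).
frag_a = "void copy(File src, File dst) { InputStream in = new FileInputStream(src); }"
frag_b = "void copyFile(File src, File dst) { FileInputStream in = new FileInputStream(src); }"
frag_c = "int sum(int[] values) { int total = 0; for (int v : values) total += v; return total; }"

clone_overlap = jaccard(identifier_set(frag_a), identifier_set(frag_b))
non_clone_overlap = jaccard(identifier_set(frag_a), identifier_set(frag_c))
print(clone_overlap, non_clone_overlap)
print(abstract_identifiers(frag_c))
```

In this toy setting, the two file-copy fragments share most of their identifiers while the unrelated fragment shares none, so identifier names alone separate the pairs. After `abstract_identifiers` is applied, all fragments draw from the same placeholder vocabulary, and a model can no longer rely on name overlap.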
