SemanticCloneBench: A Semantic Code Clone Benchmark using Crowd-Source Knowledge

Farouq Al-Omari, C. Roy, Tonghao Chen
Published in: 2020 IEEE 14th International Workshop on Software Clones (IWSC)
Publication date: 2020-02-01
DOI: 10.1109/IWSC50091.2020.9047643
URL: https://doi.org/10.1109/IWSC50091.2020.9047643
Citations: 7

Abstract

Both newly proposed code clone detection techniques and existing techniques and tools need to be evaluated and compared. This evaluation can be done by manually assessing the reported clones or by using benchmarks. The main limitations of available benchmarks include the following: they are restricted to one programming language; they contain a limited number of clone pairs, confined to the selected system(s); they require manual validation; and they do not support all types of code clones. To overcome these limitations, we propose a methodology for generating a wide range of semantic clone benchmarks for different programming languages with minimal human validation. Our technique is based on the knowledge provided by developers who participate in the crowd-sourced information website Stack Overflow. We apply automatic filtering, selection, and validation to the source code in Stack Overflow answers. Finally, we build a semantic code clone benchmark of 4,000 clone pairs for the languages Java, C, C#, and Python.
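
The filtering and selection steps mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not say how snippets are filtered or paired, so the sketch assumes that code snippets answering the same Stack Overflow question are candidate semantic clones. The `Answer` type, its fields, and the `min_lines` and `min_score` thresholds are all hypothetical, and any real pipeline would still need the validation step the abstract describes.

```python
from itertools import combinations
from typing import NamedTuple


class Answer(NamedTuple):
    # All fields are hypothetical; they are not taken from the paper.
    question_id: int  # question this answer belongs to
    code: str         # code snippet extracted from the answer
    score: int        # community vote count


def candidate_clone_pairs(answers, min_lines=5, min_score=1):
    """Pair snippets that answer the same question (assumed heuristic)."""
    by_question = {}
    for a in answers:
        # Filtering: count non-blank lines and drop short or low-scored snippets.
        n_lines = sum(1 for line in a.code.splitlines() if line.strip())
        if n_lines >= min_lines and a.score >= min_score:
            by_question.setdefault(a.question_id, []).append(a)

    # Selection: every pair of surviving answers to one question is a candidate.
    pairs = []
    for group in by_question.values():
        pairs.extend(combinations(group, 2))
    return pairs
```

The candidates returned here are only inputs to validation; grouping by question is a convenient proxy for "same functionality" and says nothing about whether two snippets truly behave alike.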