Learn To Align: A Code Alignment Network For Code Clone Detection

Aiping Zhang, Kui Liu, Liming Fang, Qianjun Liu, Xinyu Yun, S. Ji
Published in: 2021 28th Asia-Pacific Software Engineering Conference (APSEC)
Publication date: 2021-12-01
DOI: 10.1109/APSEC53868.2021.00008 (https://doi.org/10.1109/APSEC53868.2021.00008)
Citations: 1

Abstract

Deep learning techniques have achieved promising results in code clone detection over the past decade. However, existing techniques focus mainly on extracting more discriminative features from source code, while some issues, such as structural differences between functionally similar code fragments, are not explicitly addressed. Such differences are common when programmers copy a code segment and add or remove several statements, or use a more flexible syntactic structure to implement the same function. In this paper, we unify these problems as the problem of code misalignment and propose a novel code alignment network to tackle it. We design a bi-directional causal convolutional neural network to extract feature representations of code fragments with rich structural and semantic information. After feature extraction, our method learns to align the two code fragments in a data-driven fashion. We present two independent strategies for code alignment, namely attention-based alignment and sparse reconstruction-based alignment. Both strategies strive to learn an alignment matrix that represents the correspondences between two code fragments. Our method outperforms state-of-the-art methods in terms of F1 score by 0.5% and 3.1% on BigCloneBench and OJClone, respectively. Our code is available at https://github.com/ArcticHare105/Code-Alignment.
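The idea of an alignment matrix between two code fragments can be illustrated with a small sketch. The following is not the authors' implementation: it assumes each fragment has already been turned into a sequence of feature vectors (here, hand-written toy values rather than learned features), and it uses plain scaled dot-product attention as a stand-in for the attention-based strategy. The names `softmax`, `alignment_matrix`, and `align` are hypothetical and chosen for illustration only.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def alignment_matrix(feats_a: Matrix, feats_b: Matrix) -> Matrix:
    """Soft alignment from fragment A to fragment B.

    Row i holds non-negative weights (summing to 1) that say how strongly
    position i of A corresponds to each position of B. Scores are scaled
    dot products, an assumption made for this sketch.
    """
    scale = math.sqrt(len(feats_a[0]))
    return [
        softmax([sum(x * y for x, y in zip(a, b)) / scale for b in feats_b])
        for a in feats_a
    ]


def align(feats_a: Matrix, feats_b: Matrix) -> Matrix:
    """Re-express B at A's positions: aligned[i] = sum_j W[i][j] * feats_b[j]."""
    weights = alignment_matrix(feats_a, feats_b)
    dim = len(feats_b[0])
    return [
        [sum(w * b[k] for w, b in zip(row, feats_b)) for k in range(dim)]
        for row in weights
    ]


if __name__ == "__main__":
    # Toy features: fragment B has one extra "statement" and a swapped order.
    fragment_a = [[1.0, 0.0], [0.0, 1.0]]
    fragment_b = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]]
    for row in alignment_matrix(fragment_a, fragment_b):
        print([round(w, 3) for w in row])
```

In this toy setting, each row of the matrix puts its largest weight on the position in the second fragment whose features best match, even though the two fragments differ in length and order. That is the kind of correspondence the abstract describes; how the paper actually scores and trains the alignment is not specified here.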