Capturing source code semantics via tree-based convolution over API-enhanced AST

Proceedings of the 16th ACM International Conference on Computing Frontiers. Pub Date: 2019-04-30. DOI: 10.1145/3310273.3321560

Long Chen, Wei Ye, Shikun Zhang

Citations: 20

Abstract

When deep learning meets big code, a key question is how to efficiently learn a distributed representation of source code that effectively captures its semantics. We propose to use tree-based convolution over an API-enhanced AST. To demonstrate the effectiveness of our approach, we apply it to the detection of semantic clones, that is, code fragments with similar semantics but dissimilar syntax. Experimental results show that our approach outperforms an existing state-of-the-art approach that uses tree-based LSTM, with increases of 0.39 and 0.12 in F1-score on OJClone and BigCloneBench, respectively. We further propose architectures that incorporate our approach for code search and code summarization.
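As a rough illustration of the general idea, and not the architecture described in the paper, the sketch below encodes a code fragment by sliding a one-level window (a node plus its direct children) over an AST and max-pooling the results into a fixed-size vector. Everything in it is an assumption made for the example: it uses Python's own `ast` module rather than the languages of OJClone or BigCloneBench, hash-derived pseudo-embeddings in place of learned ones, identity weights in place of trained convolution kernels, and a callee name appended to call nodes as a crude stand-in for API enhancement. The names `embed`, `label_of`, `conv_features`, `encode`, and `cosine` are hypothetical.

```python
import ast
import hashlib
import math

DIM = 8  # embedding width, chosen arbitrarily for the sketch


def embed(label):
    """Deterministic pseudo-embedding for a node label (stand-in for learned vectors)."""
    digest = hashlib.sha256(label.encode()).digest()
    return [b / 255.0 - 0.5 for b in digest[:DIM]]


def label_of(node):
    """Node type, plus the callee name for calls (assumed stand-in for API enhancement)."""
    if isinstance(node, ast.Call):
        func = node.func
        name = getattr(func, "attr", None) or getattr(func, "id", "")
        return "Call:" + name
    return type(node).__name__


def conv_features(tree):
    """One window per node: tanh(parent vector + mean of child vectors).

    A trained model would apply learned weight matrices here; this sketch
    uses identity weights so that it stays dependency-free.
    """
    features = []
    for node in ast.walk(tree):
        parent = embed(label_of(node))
        children = [embed(label_of(c)) for c in ast.iter_child_nodes(node)]
        if children:
            mean = [sum(col) / len(children) for col in zip(*children)]
        else:
            mean = [0.0] * DIM
        features.append([math.tanh(p + m) for p, m in zip(parent, mean)])
    return features


def encode(source):
    """Max-pool all window features into one fixed-size vector for the fragment."""
    feats = conv_features(ast.parse(source))
    return [max(col) for col in zip(*feats)]


def cosine(u, v):
    """Cosine similarity between two fragment vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    loop_sum = "def f(xs):\n    t = 0\n    for x in xs:\n        t += x\n    return t"
    builtin_sum = "def g(xs):\n    return sum(xs)"
    print(cosine(encode(loop_sum), encode(builtin_sum)))
```

Because the weights are untrained, the similarity printed for the two fragments carries no semantic meaning; the example only shows the data flow (AST, per-node windows, pooling, vector comparison) that a clone detector of this kind would train end to end.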


