{"title":"CCGraph","authors":"Y. Zou, B. Ban, Yinxing Xue, Yun Xu","doi":"10.1145/3324884.3416541","DOIUrl":null,"url":null,"abstract":"Software clone detection is an active research area, which is very important for software maintenance, bug detection, etc. The two pieces of cloned code reflect some similarities or equivalents in the syntax or structure of the code representations. There are many representations of code like AST, token, PDG, etc. The PDG (Program Dependency Graph) of source code can contain both syntactic and structural information. However, most existing PDG-based tools are quite time-consuming and miss many clones because they detect code clones with exact graph matching by using subgraph isomorphism. In this paper, we propose a novel PDG-based code clone detector, CCGraph, that uses graph kernels. Firstly, we normalize the structure of PDGs and design a two-stage filtering strategy by measuring the characteristic vectors of codes. Then we detect the code clones by using an approximate graph matching algorithm based on the reforming WL (Weisfeiler-Lehman) graph kernel. Experiment results show that CCGraph retains a high accuracy, has both better recall and F1-score values, and detects more semantic clones than other two related state-of-the-art tools. Besides, CCGraph is much more efficient than the existing PDG-based tools.","PeriodicalId":267160,"journal":{"name":"Proceedings of the 35th IEEE/ACM International Conference on Automated Software Engineering","volume":"122 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-12-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"30","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 35th IEEE/ACM International Conference on Automated Software Engineering","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3324884.3416541","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 30
Abstract
Software clone detection is an active research area, which is very important for software maintenance, bug detection, etc. The two pieces of cloned code reflect some similarities or equivalents in the syntax or structure of the code representations. There are many representations of code like AST, token, PDG, etc. The PDG (Program Dependency Graph) of source code can contain both syntactic and structural information. However, most existing PDG-based tools are quite time-consuming and miss many clones because they detect code clones with exact graph matching by using subgraph isomorphism. In this paper, we propose a novel PDG-based code clone detector, CCGraph, that uses graph kernels. Firstly, we normalize the structure of PDGs and design a two-stage filtering strategy by measuring the characteristic vectors of codes. Then we detect the code clones by using an approximate graph matching algorithm based on the reforming WL (Weisfeiler-Lehman) graph kernel. Experiment results show that CCGraph retains a high accuracy, has both better recall and F1-score values, and detects more semantic clones than other two related state-of-the-art tools. Besides, CCGraph is much more efficient than the existing PDG-based tools.