Semantic code clone detection based on BERT pre-trained model
Authors: Zekai Cheng, Jiahao Hu, Yongkang Guo, Xiaoke Li
Published in: International Conference on Algorithms, Microchips and Network Applications, vol. 44, pp. 131711K - 131711K-7, 2024-06-08
DOI: https://doi.org/10.1117/12.3031928
Clone detection in source code is one of the most fundamental software engineering techniques. Although it has been studied intensively over the past few years, most of that work addresses syntactic code clones, and a number of problems remain in detecting semantic code clones. In this paper, we propose an approach that fine-tunes the pre-trained BERT model on C/C++ code so that it better captures the syntactic and semantic features of C/C++ code, thus enabling better source code similarity evaluation. We evaluated our approach on a large C/C++ code clone dataset, and the results show that it detects semantic code clones effectively.
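To make the distinction between syntactic and semantic clones concrete, the following is a minimal illustrative sketch of a semantic clone pair in C++. The function names `sum_loop` and `sum_accumulate` are hypothetical and are not taken from the paper or its dataset; they only show the kind of pair a purely token-based or syntactic comparison would likely miss.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Variant A (hypothetical example): sums the elements with an explicit
// index-based loop.
int sum_loop(const std::vector<int>& values) {
    int total = 0;
    for (std::size_t i = 0; i < values.size(); ++i) {
        total += values[i];
    }
    return total;
}

// Variant B (hypothetical example): computes the same result through the
// standard library. The token sequence and control structure differ from
// Variant A, but the observable behavior is identical.
int sum_accumulate(const std::vector<int>& values) {
    return std::accumulate(values.begin(), values.end(), 0);
}
```

The two functions share almost no surface syntax yet return the same value for every input, which is what makes them a semantic rather than a syntactic clone pair. A similarity model of the kind described in the abstract would be expected to score such a pair as similar.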