Development of an algorithm for code clone detection in source code based on abstract syntax tree

Yevhenii Kubiuk, Gennadiy Kyselov
Journal: Technology audit and production reserves
Published: 2023-08-29
DOI: https://doi.org/10.15587/2706-5448.2023.286472
Citations: 0

Abstract

This work studies an algorithm that searches for duplicates in program code using the Abstract Syntax Tree (AST). The study addresses two tasks: detecting duplicate code and finding vulnerabilities in program code. The results show that the proposed algorithm is robust to type 1 and type 2 clones, meaning that it effectively detects similar code fragments with identical or variant text. For type 3 and type 4 clones the algorithm may be less effective, because these clone types change the structure of the AST. Experiments also showed that the algorithm can report matches between unrelated files, because typical AST chains occur in many programs; this leads to a certain level of false positives in duplicate detection. Testing the algorithm on the vulnerability-finding task showed the following. The «SQL injection» vulnerability is recognized best, but it also has the highest number of false positives. Memory leak and null pointer dereference vulnerabilities are detected with the same effectiveness and the same level of false positives. «Buffer overflow» has the lowest recognition rate but fewer false positives than «SQL injection». The study shows that using the AST allows effective detection of duplicate code and vulnerabilities in software code. The developed tool can help software developers reduce maintenance effort, improve code quality, and ensure software product security.
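
The behaviour described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not define an "AST chain", so the sketch assumes a chain is a downward path of three consecutive node type names, and it assumes a multiset Jaccard score as the similarity measure. The function names `ast_chains` and `similarity` and all sample snippets are hypothetical. Python's standard `ast` module is used for parsing.

```python
import ast
from collections import Counter


def ast_chains(source: str, depth: int = 3) -> Counter:
    """Count every downward chain of `depth` consecutive node type names.

    Assumption: this chain definition is illustrative only; identifiers and
    literal values are ignored because only node type names are recorded.
    """
    chains: Counter = Counter()

    def walk(node: ast.AST, prefix: tuple) -> None:
        # Keep only the last `depth` type names on the path from the root.
        prefix = (prefix + (type(node).__name__,))[-depth:]
        if len(prefix) == depth:
            chains[prefix] += 1
        for child in ast.iter_child_nodes(node):
            walk(child, prefix)

    walk(ast.parse(source), ())
    return chains


def similarity(a: str, b: str, depth: int = 3) -> float:
    """Multiset Jaccard similarity of two chain collections (assumed metric)."""
    ca, cb = ast_chains(a, depth), ast_chains(b, depth)
    union = sum((ca | cb).values())
    return sum((ca & cb).values()) / union if union else 1.0


ORIGINAL = """
def total(xs):
    s = 0
    for x in xs:
        s += x
    return s
"""

# Type 1: only comments and blank lines differ.
TYPE1 = """
def total(xs):
    # running sum

    s = 0
    for x in xs:
        s += x
    return s
"""

# Type 2: identifiers and a literal are changed.
TYPE2 = """
def acc(values):
    result = 10
    for v in values:
        result += v
    return result
"""

# Type 3: a statement is added, so the tree structure changes.
TYPE3 = """
def total(xs):
    s = 0
    for x in xs:
        if x > 0:
            s += x
    return s
"""

# Unrelated code that still shares common chains such as function headers.
UNRELATED = """
def greet(name):
    print(name)
    return name
"""

if __name__ == "__main__":
    for label, code in [("type 1", TYPE1), ("type 2", TYPE2),
                        ("type 3", TYPE3), ("unrelated", UNRELATED)]:
        print(label, round(similarity(ORIGINAL, code), 3))
```

Under these assumptions the type 1 and type 2 variants score exactly 1.0, because comments, layout, names, and literal values never reach the chain set. The type 3 variant scores below 1.0 because the added `if` introduces new chains, and the unrelated snippet still scores above zero because chains such as a function definition followed by its argument list appear in almost any program, which is the source of false positives noted in the abstract.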