基于深度学习的源代码漏洞分析调查

IF 4.8 2区计算机科学 Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS Computers & Security Pub Date : 2024-09-03 DOI:10.1016/j.cose.2024.104098

Chen Liang, Qiang Wei, Jiang Du, Yisen Wang, Zirui Jiang

{"title":"基于深度学习的源代码漏洞分析调查","authors":"Chen Liang, Qiang Wei, Jiang Du, Yisen Wang, Zirui Jiang","doi":"10.1016/j.cose.2024.104098","DOIUrl":null,"url":null,"abstract":"<div><p>Amidst the rapid development of the software industry and the burgeoning open-source culture, vulnerability detection within the software security domain has emerged as an ever-expanding area of focus. In recent years, the rapid advancement of artificial intelligence, particularly the notable progress in deep learning for pattern recognition and natural language processing, has catalyzed a surge in research endeavors exploring the integration of deep learning for the enhancement of vulnerability detection techniques. In this paper, we investigate contemporary deep learning-based source code analysis methods, with a concentrated emphasis on those pertaining to static code vulnerability detection. We categorize these methods based on various representations of source code employed during the preprocessing stage, including token-based and graph-based representations of source code, and further subdivided based on the types of deep learning algorithms or graph representations employed. We summarize the basic processes of model training and vulnerability detection under these different representation formats. Furthermore, we explore the limitations inherent in current approaches and provide insights into future trends and challenges for research in this field.</p></div>","PeriodicalId":51004,"journal":{"name":"Computers & Security","volume":"148 ","pages":"Article 104098"},"PeriodicalIF":4.8000,"publicationDate":"2024-09-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Survey of source code vulnerability analysis based on deep learning\",\"authors\":\"Chen Liang, Qiang Wei, Jiang Du, Yisen Wang, Zirui Jiang\",\"doi\":\"10.1016/j.cose.2024.104098\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>Amidst the rapid development of the software industry and the burgeoning open-source culture, vulnerability detection within the software security domain has emerged as an ever-expanding area of focus. In recent years, the rapid advancement of artificial intelligence, particularly the notable progress in deep learning for pattern recognition and natural language processing, has catalyzed a surge in research endeavors exploring the integration of deep learning for the enhancement of vulnerability detection techniques. In this paper, we investigate contemporary deep learning-based source code analysis methods, with a concentrated emphasis on those pertaining to static code vulnerability detection. We categorize these methods based on various representations of source code employed during the preprocessing stage, including token-based and graph-based representations of source code, and further subdivided based on the types of deep learning algorithms or graph representations employed. We summarize the basic processes of model training and vulnerability detection under these different representation formats. Furthermore, we explore the limitations inherent in current approaches and provide insights into future trends and challenges for research in this field.</p></div>\",\"PeriodicalId\":51004,\"journal\":{\"name\":\"Computers & Security\",\"volume\":\"148 \",\"pages\":\"Article 104098\"},\"PeriodicalIF\":4.8000,\"publicationDate\":\"2024-09-03\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Computers & Security\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0167404824004036\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computers & Security","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0167404824004036","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}

引用次数: 0

摘要

随着软件产业的快速发展和开源文化的蓬勃兴起，软件安全领域的漏洞检测已成为一个不断扩大的重点领域。近年来，人工智能的飞速发展，尤其是深度学习在模式识别和自然语言处理方面的显著进步，推动了探索深度学习与漏洞检测技术相结合的研究热潮。在本文中，我们研究了当代基于深度学习的源代码分析方法，重点是与静态代码漏洞检测相关的方法。我们根据预处理阶段采用的各种源代码表示法对这些方法进行分类，包括基于标记的源代码表示法和基于图的源代码表示法，并根据采用的深度学习算法或图表示法的类型进一步细分。我们总结了这些不同表示格式下模型训练和漏洞检测的基本流程。此外，我们还探讨了当前方法固有的局限性，并对该领域研究的未来趋势和挑战提出了见解。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Survey of source code vulnerability analysis based on deep learning

Amidst the rapid development of the software industry and the burgeoning open-source culture, vulnerability detection within the software security domain has emerged as an ever-expanding area of focus. In recent years, the rapid advancement of artificial intelligence, particularly the notable progress in deep learning for pattern recognition and natural language processing, has catalyzed a surge in research endeavors exploring the integration of deep learning for the enhancement of vulnerability detection techniques. In this paper, we investigate contemporary deep learning-based source code analysis methods, with a concentrated emphasis on those pertaining to static code vulnerability detection. We categorize these methods based on various representations of source code employed during the preprocessing stage, including token-based and graph-based representations of source code, and further subdivided based on the types of deep learning algorithms or graph representations employed. We summarize the basic processes of model training and vulnerability detection under these different representation formats. Furthermore, we explore the limitations inherent in current approaches and provide insights into future trends and challenges for research in this field.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Computers & Security 工程技术-计算机：信息系统

CiteScore

12.40

自引率

7.10%

发文量

365

审稿时长

10.7 months

期刊介绍： Computers & Security is the most respected technical journal in the IT security field. With its high-profile editorial board and informative regular features and columns, the journal is essential reading for IT security professionals around the world. Computers & Security provides you with a unique blend of leading edge research and sound practical management advice. It is aimed at the professional involved with computer security, audit, control and data integrity in all sectors - industry, commerce and academia. Recognized worldwide as THE primary source of reference for applied research and technical expertise it is your first step to fully secure systems.