An Effective Software Vulnerability Detection Method Based On Devised Deep-Learning Model To Fix The Vague Separation

Proceedings of the 2022 3rd International Symposium on Big Data and Artificial Intelligence
Pub Date: 2022-12-09
DOI: 10.1145/3598438.3598452

Yuankun Liu, Yu Wang


Citations: 0

Abstract


Software vulnerability detection (SVD) methods based on automated deep learning are critical to software safety; they are designable and promising. Several function-level deep-learning SVD methods achieve an accuracy of up to 0.97 on open-source C/C++ datasets. However, because vulnerable samples make up a low proportion of existing open-source datasets, these methods suffer from a high false negative rate, and they fail to identify cross-domain software vulnerabilities because they neglect the imbalance and vague separation of existing datasets. This paper proposes a novel framework based on SeqGAN and TextCNN to fix the vague separation of 7 aggregated open-source C/C++ datasets, thereby improving SVD performance. As a result, SeqGAN&TextCNN achieves an F1 score of 0.9385. Compared with adopting TextCNN alone, the method achieves an increase of 119% in recall and 31.31% in precision, and the separations plotted by t-SNE show that SeqGAN effectively improves the separation of the original datasets. SeqGAN&TextCNN detects more vulnerable samples with a low false negative rate, and its F1 score is 79.58% higher than that of VulDeePecker on the 7 open-source C/C++ datasets.
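The abstract's point that high accuracy can coexist with a high false negative rate on imbalanced data can be illustrated with a small calculation. The sketch below is not taken from the paper: the confusion-matrix counts are hypothetical, chosen only so that accuracy comes out at 0.97, and the helper names `accuracy` and `precision_recall_f1` are illustrative assumptions.

```python
def accuracy(tp, fp, fn, tn):
    """Fraction of all samples classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 for the positive (vulnerable) class."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical imbalanced test set (assumption, not data from the paper):
# 1000 functions, of which only 50 are vulnerable.
tp, fn = 20, 30    # the detector finds 20 of the 50 vulnerable functions
fp, tn = 0, 950    # and raises no false alarms on the 950 safe ones

acc = accuracy(tp, fp, fn, tn)
precision, recall, f1 = precision_recall_f1(tp, fp, fn)
false_negative_rate = fn / (tp + fn)

print(f"accuracy={acc:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} f1={f1:.4f} fnr={false_negative_rate:.2f}")
```

With these made-up counts, accuracy is 0.97 even though 60% of the vulnerable functions are missed, which is why the abstract reports recall, precision, and F1 rather than accuracy alone.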
