HardVD: High-capacity cross-modal adversarial reprogramming for data-efficient vulnerability detection

Information Sciences (IF 8.1; Zone 1, Computer Science; JCR: N/A; category: COMPUTER SCIENCE, INFORMATION SYSTEMS). Pub Date: 2024-08-22. DOI: 10.1016/j.ins.2024.121370



Abstract

The substantial proliferation of software vulnerabilities poses a persistent threat to system security, driving increased interest in applying deep learning (DL) for vulnerability detection. However, DL-based detectors often operate with a fixed number of input tokens, leading to semantic loss over large code snippets. Additionally, developing these detectors demands substantial labeled data and training time. To address these limitations, this paper proposes HardVD, which explores High-capacity cross-modal adversarial reprogramming for data-efficient Vulnerability Detection. HardVD devises a high-capacity semantic extractor to capture salient features per line of code, which are then arranged as patches to form an image representing the target function. These images are processed using convolutional filters as universal perturbations and non-parametric label remapping to adapt a pretrained Vision Transformer (ViT) for vulnerability detection, updating only the limited parameters of the perturbation filters during training. Extensive experiments demonstrate that HardVD outperforms DL-based baselines in terms of detection effectiveness, data-limited performance, and computational overhead. The ablation study also confirms the essential role of the high-capacity semantic extractor: without it, accuracy and F1 score show average relative decreases of 5.87% and 7.98%, respectively.
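The pipeline outlined in the abstract (per-line features arranged as image patches, a convolutional filter acting as the perturbation, and a parameter-free label remapping) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. The function names, the square patch layout, the zero padding, and the group-averaging remap are all assumptions made for illustration, and the pretrained ViT that would sit between the filter and the remapping is omitted.

```python
def lines_to_image(line_feats, grid, patch):
    """Tile one feature vector per line of code into a square image.

    Assumption: each vector has patch*patch values (output of a hypothetical
    semantic extractor). Lines beyond grid*grid are dropped; empty tiles stay 0.
    """
    size = grid * patch
    image = [[0.0] * size for _ in range(size)]
    for idx, feat in enumerate(line_feats[: grid * grid]):
        row0, col0 = (idx // grid) * patch, (idx % grid) * patch
        for k, value in enumerate(feat):
            image[row0 + k // patch][col0 + k % patch] = value
    return image


def apply_filter(image, kernel):
    """Slide a small square filter over the image with zero padding.

    In a reprogramming setup the kernel weights would be the only trained
    parameters; here they are passed in as fixed numbers.
    """
    n, k = len(image), len(kernel)
    pad = k // 2
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            acc = 0.0
            for di in range(k):
                for dj in range(k):
                    r, c = i + di - pad, j + dj - pad
                    if 0 <= r < n and 0 <= c < n:
                        acc += kernel[di][dj] * image[r][c]
            out[i][j] = acc
    return out


def remap_labels(source_scores, groups):
    """Parameter-free remap: average fixed groups of source-class scores.

    Assumption: each target class (e.g. vulnerable / non-vulnerable) is tied
    to a fixed set of source-class indices; nothing here is learned.
    """
    return [sum(source_scores[i] for i in g) / len(g) for g in groups]


# Two lines of code, each summarized by a 2x2 patch, placed on a 2x2 grid.
feats = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]]
image = lines_to_image(feats, grid=2, patch=2)

# An identity kernel leaves the image unchanged; training would move it away.
identity = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
perturbed = apply_filter(image, identity)

# Stand-in scores for four source classes, mapped onto two target classes.
target_scores = remap_labels([1.0, 3.0, 2.0, 6.0], [[0, 1], [2, 3]])
```

In a full setup, `perturbed` would be fed to the frozen ViT and `remap_labels` would be applied to its output scores, so that gradient updates touch only the filter weights.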




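Source journal: Information Sciences (Engineering & Technology, Computer Science: Information Systems)
CiteScore: 14.00
Self-citation rate: 17.30%
Articles published: 1322
Review time: 10.4 months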

Journal description: Informatics and Computer Science Intelligent Systems Applications is an esteemed international journal that focuses on publishing original and creative research findings in the field of information sciences. We also feature a limited number of timely tutorial and surveying contributions. Our journal aims to cater to a diverse audience, including researchers, developers, managers, strategic planners, graduate students, and anyone interested in staying up-to-date with cutting-edge research in information science, knowledge engineering, and intelligent systems. While readers are expected to share a common interest in information science, they come from varying backgrounds such as engineering, mathematics, statistics, physics, computer science, cell biology, molecular biology, management science, cognitive science, neurobiology, behavioral sciences, and biochemistry.
