Title: HardVD: High-capacity cross-modal adversarial reprogramming for data-efficient vulnerability detection
DOI: 10.1016/j.ins.2024.121370
Journal: Information Sciences (Region 1, Computer Science; Impact Factor: 8.1; category: COMPUTER SCIENCE, INFORMATION SYSTEMS)
Publication date: 2024-08-22
Publication type: Journal Article
URL: https://www.sciencedirect.com/science/article/pii/S0020025524012842
HardVD: High-capacity cross-modal adversarial reprogramming for data-efficient vulnerability detection
The substantial proliferation of software vulnerabilities poses a persistent threat to system security, driving increased interest in applying deep learning (DL) to vulnerability detection. However, DL-based detectors often operate with a fixed number of input tokens, leading to semantic loss over large code snippets. Additionally, developing these detectors demands substantial labeled data and training time. To address these limitations, this paper proposes HardVD, which explores High-capacity cross-modal adversarial reprogramming for data-efficient Vulnerability Detection. HardVD devises a high-capacity semantic extractor to capture salient features per line of code, which are then arranged as patches to form an image representing the target function. These images are processed using convolutional filters as universal perturbations and non-parametric label remapping to adapt a pretrained Vision Transformer (ViT) for vulnerability detection, updating only the limited parameters of the perturbation filters during training. Extensive experiments demonstrate that HardVD outperforms DL-based baselines in terms of detection effectiveness, data-limited performance, and computational overhead. The ablation study also confirms the essential role of our high-capacity semantic extractor, without which average relative decreases of 5.87% in accuracy and 7.98% in F1 score are observed.
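The pipeline the abstract describes can be sketched in a few steps: extract a feature vector per line of code, tile the vectors as patches into an image, add a trainable universal perturbation, and map a frozen classifier's source-label logits onto the two target classes without any trained head. The following is a minimal toy sketch of that flow; the `line_features` byte-histogram extractor, the depthwise 3x3 perturbation, and the averaged-group label remapping are illustrative stand-ins invented here, not the paper's actual high-capacity extractor, filters, or ViT.

```python
import numpy as np

def line_features(line: str, dim: int = 48) -> np.ndarray:
    """Toy per-line extractor: byte histogram folded into `dim` bins.
    A placeholder for the paper's high-capacity semantic extractor."""
    vec = np.zeros(dim)
    for b in line.encode("utf-8"):
        vec[b % dim] += 1.0
    n = np.linalg.norm(vec)
    return vec / n if n > 0 else vec

def code_to_image(func_lines, patch: int = 4, grid: int = 14) -> np.ndarray:
    """Arrange one feature patch per code line into a 3-channel
    (grid*patch) x (grid*patch) image, mirroring the patches-form-an-image
    step (the real system targets ViT-sized inputs; sizes here are toys)."""
    img = np.zeros((3, grid * patch, grid * patch))
    for i, line in enumerate(func_lines[: grid * grid]):
        r, c = divmod(i, grid)
        feat = line_features(line, dim=3 * patch * patch)
        img[:, r * patch:(r + 1) * patch, c * patch:(c + 1) * patch] = \
            feat.reshape(3, patch, patch)
    return img

def apply_perturbation(img: np.ndarray, filt: np.ndarray) -> np.ndarray:
    """Universal perturbation: add the response of a per-channel 3x3 filter
    to the input. Only `filt` would be trained; the backbone stays frozen."""
    out = img.copy()
    pad = np.pad(img, ((0, 0), (1, 1), (1, 1)))
    H, W = img.shape[1:]
    for ch in range(img.shape[0]):
        for y in range(H):
            for x in range(W):
                out[ch, y, x] += np.sum(pad[ch, y:y + 3, x:x + 3] * filt[ch])
    return out

def remap_labels(src_logits: np.ndarray, n_classes: int = 2) -> np.ndarray:
    """Non-parametric label remapping: average fixed, disjoint groups of
    source-label logits into target-class scores, with no trained head."""
    groups = np.array_split(np.arange(src_logits.shape[-1]), n_classes)
    return np.array([src_logits[g].mean() for g in groups])
```

In this shape, training would backpropagate a classification loss through the frozen backbone into `filt` alone, which is what makes the approach data- and compute-efficient: the only learnable parameters are the perturbation filter weights.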
Journal introduction:
Information Sciences (Informatics and Computer Science, Intelligent Systems, Applications) is an esteemed international journal that focuses on publishing original and creative research findings in the field of information sciences. We also feature a limited number of timely tutorial and survey contributions.
Our journal aims to cater to a diverse audience, including researchers, developers, managers, strategic planners, graduate students, and anyone interested in staying up-to-date with cutting-edge research in information science, knowledge engineering, and intelligent systems. While readers are expected to share a common interest in information science, they come from varying backgrounds such as engineering, mathematics, statistics, physics, computer science, cell biology, molecular biology, management science, cognitive science, neurobiology, behavioral sciences, and biochemistry.