Exploring Semantic Redundancy using Backdoor Triggers: A Complementary Insight into the Challenges facing DNN-based Software Vulnerability Detection

IF 6.6 2区 计算机科学 Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING ACM Transactions on Software Engineering and Methodology Pub Date : 2024-01-24 DOI:10.1145/3640333
Changjie Shao, Gaolei Li, Jun Wu, Xi Zheng
{"title":"Exploring Semantic Redundancy using Backdoor Triggers: A Complementary Insight into the Challenges facing DNN-based Software Vulnerability Detection","authors":"Changjie Shao, Gaolei Li, Jun Wu, Xi Zheng","doi":"10.1145/3640333","DOIUrl":null,"url":null,"abstract":"<p>To detect software vulnerabilities with better performance, deep neural networks (DNNs) have received extensive attention recently. However, these vulnerability detection DNN models trained with code representations are vulnerable to specific perturbations on code representations. This motivates us to rethink the bane of software vulnerability detection and find function-agnostic features during code representation which we name as semantic redundant features. This paper first identifies a tight correlation between function-agnostic triggers and semantic redundant feature space (where the redundant features reside) in these DNN models. For correlation identification, we propose a novel Backdoor-based Semantic Redundancy Exploration (BSemRE) framework. In BSemRE, the sensitivity of the trained models to function-agnostic triggers is observed to verify the existence of semantic redundancy in various code representations. Specifically, acting as the typical manifestations of semantic redundancy, naming conventions, ternary operators and identically-true conditions are exploited to generate function-agnostic triggers. Extensive comparative experiments on 1613823 samples of 8 representative vulnerability datasets and state-of-the-art code representation techniques and vulnerability detection models demonstrate that the existence of semantic redundancy determines the upper trustworthiness limit of DNN-based software vulnerability detection. To the best of our knowledge, this is the first work exploring the bane of software vulnerability detection using backdoor triggers.</p>","PeriodicalId":50933,"journal":{"name":"ACM Transactions on Software Engineering and Methodology","volume":"5 1","pages":""},"PeriodicalIF":6.6000,"publicationDate":"2024-01-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Software Engineering and Methodology","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1145/3640333","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, SOFTWARE ENGINEERING","Score":null,"Total":0}
引用次数: 0

Abstract

To detect software vulnerabilities with better performance, deep neural networks (DNNs) have received extensive attention recently. However, these vulnerability detection DNN models trained with code representations are vulnerable to specific perturbations on code representations. This motivates us to rethink the bane of software vulnerability detection and find function-agnostic features during code representation which we name as semantic redundant features. This paper first identifies a tight correlation between function-agnostic triggers and semantic redundant feature space (where the redundant features reside) in these DNN models. For correlation identification, we propose a novel Backdoor-based Semantic Redundancy Exploration (BSemRE) framework. In BSemRE, the sensitivity of the trained models to function-agnostic triggers is observed to verify the existence of semantic redundancy in various code representations. Specifically, acting as the typical manifestations of semantic redundancy, naming conventions, ternary operators and identically-true conditions are exploited to generate function-agnostic triggers. Extensive comparative experiments on 1613823 samples of 8 representative vulnerability datasets and state-of-the-art code representation techniques and vulnerability detection models demonstrate that the existence of semantic redundancy determines the upper trustworthiness limit of DNN-based software vulnerability detection. To the best of our knowledge, this is the first work exploring the bane of software vulnerability detection using backdoor triggers.

查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
利用后门触发器探索语义冗余:基于 DNN 的软件漏洞检测所面临挑战的补充见解
为了以更好的性能检测软件漏洞,深度神经网络(DNN)近来受到广泛关注。然而,这些用代码表示法训练的漏洞检测 DNN 模型很容易受到代码表示法特定扰动的影响。这促使我们重新思考软件漏洞检测的祸根,并在代码表示过程中找到与功能无关的特征,我们将其命名为语义冗余特征。本文首先确定了这些 DNN 模型中功能无关触发器与语义冗余特征空间(冗余特征所在)之间的紧密相关性。为了识别相关性,我们提出了一种新颖的基于后门的语义冗余探索(BSemRE)框架。在 BSemRE 中,我们观察了训练有素的模型对与功能无关的触发器的敏感性,以验证各种代码表示中是否存在语义冗余。具体来说,作为语义冗余的典型表现形式,命名约定、三元运算符和同真条件被用来生成功能无关触发器。对 8 个代表性漏洞数据集的 1613823 个样本以及最先进的代码表示技术和漏洞检测模型进行的广泛对比实验表明,语义冗余的存在决定了基于 DNN 的软件漏洞检测的可信度上限。据我们所知,这是第一项探索利用后门触发器进行软件漏洞检测的难题的工作。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
ACM Transactions on Software Engineering and Methodology
ACM Transactions on Software Engineering and Methodology 工程技术-计算机:软件工程
CiteScore
6.30
自引率
4.50%
发文量
164
审稿时长
>12 weeks
期刊介绍: Designing and building a large, complex software system is a tremendous challenge. ACM Transactions on Software Engineering and Methodology (TOSEM) publishes papers on all aspects of that challenge: specification, design, development and maintenance. It covers tools and methodologies, languages, data structures, and algorithms. TOSEM also reports on successful efforts, noting practical lessons that can be scaled and transferred to other projects, and often looks at applications of innovative technologies. The tone is scholarly but readable; the content is worthy of study; the presentation is effective.
期刊最新文献
Effective, Platform-Independent GUI Testing via Image Embedding and Reinforcement Learning Bitmap-Based Security Monitoring for Deeply Embedded Systems Harmonising Contributions: Exploring Diversity in Software Engineering through CQA Mining on Stack Overflow An Empirical Study on the Characteristics of Database Access Bugs in Java Applications Self-planning Code Generation with Large Language Models
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1