Software Testing, Verification and Reliability: latest articles, page 2

Test code evolution and mutation testing

Software Testing, Verification and Reliability

Pub Date : 2024-04-03 DOI: 10.1002/stvr.1877

Yves Le Traon, Tao Xie

In this issue, we are pleased to present two papers on test code evolution and mutation testing, respectively.

The first paper, “Towards automatically identifying the co-change of production and test code” by Yuan Huang, Zhicao Tang, Xiangping Chen and Xiaocong Zhou, presents a method named Jtup that uses machine learning to identify the co-change of production and test code. When a developer modifies a class in production code, Jtup analyses the modified class and determines whether its corresponding test class needs to be modified as well. For machine learning, Jtup incorporates three types of features (code change features, code complexity features and code semantic features). The experimental results show the superior performance of Jtup in both within-project and multiclassification settings, surpassing multiple competing methods (Recommended by Wing Kwong Chan).

The second paper, “A new perspective on the competent programmer hypothesis through the reproduction of real faults with repeated mutations” by Zaheed Ahmed, Eike Schwass, Steffen Herbold, Fabian Trautsch and Jens Grabowski, presents a study of the competent programmer hypothesis based on the ability to reproduce faults through mutation operators. In contrast, previous work only considered how many tokens are changed by bugs or manually compared mutations with faults. The authors reframe the problem of transforming a correct into a buggy AST as a path search problem, where each step of a path is a mutation. The study results support the competent programmer hypothesis and also show that mutation operators are often not in line with the slight differences in correct code introduced by developers (Recommended by Marcio Delamaro).

We hope that these papers will inspire further research in related directions.

Citations: 0

SafeNet: Towards mitigating replaceable unsafe Rust code via a recommendation‐based approach

Software Testing, Verification and Reliability

Pub Date : 2024-03-02 DOI: 10.1002/stvr.1875

Yan Dong, Zhicong Zhang, Mohan Cui, Hui Xu

Rust is a system‐level programming language with advantages in memory safety. It ensures that any Rust program without unsafe code should not incur undefined behaviour. However, unsafe code still plays an essential role in Rust to achieve low‐level control. Therefore, a major design pattern of Rust programs is interior unsafe, which wraps unsafe code as safe APIs and handles all undefined behaviours internally. The Rust standard library already provides a rich set of safe APIs to facilitate Rust code development. Nevertheless, due to unfamiliarity with these APIs, developers may use unsafe code unnecessarily and incur memory‐safety risks. In this paper, we investigate an approach to mitigate replaceable unsafe code. We first analyse unsafe APIs of the Rust standard library and summarize their common usage patterns. Each pattern corresponds to one or several code samples in our knowledge base. Then, we develop an approach to automatically recognize the usage pattern and recommend corresponding code samples. Our approach leverages dataflow analysis to exclude impossible patterns and employs a BERT‐based machine learning model to find the most similar pattern among the rest. We have conducted evaluation experiments with 472 unsafe code snippets collected from GitHub projects and successfully recognized the pattern of 394 snippets. We hope our approach can assist developers in detecting unnecessary unsafe code and suggesting safe alternatives.

Citations: 0

A new perspective on the competent programmer hypothesis through the reproduction of real faults with repeated mutations

Software Testing, Verification and Reliability

Pub Date : 2024-02-29 DOI: 10.1002/stvr.1874

Zaheed Ahmed, Eike Schwass, Steffen Herbold, Fabian Trautsch, Jens Grabowski

The competent programmer hypothesis is one of the fundamental assumptions of mutation testing, which claims that most programmers are competent enough to create correct or almost correct source code. This implies that faults should usually manifest through small variations of the correct code. Consequently, researchers assumed that the synthetic faults injected in source code through the mutation operators closely resemble the real faults. Unfortunately, it is still unclear whether the competent programmer hypothesis holds, as past research presents contradictory claims. Within this article, we provide a new perspective on the competent programmer hypothesis and its relation to mutation testing. We try to re-create real-world faults through chains of mutations to understand if there is a direct link between mutation testing and faults. The lengths of these paths help us to understand if the source code is really almost correct, or if large variations are required. Our experiments used a state-of-the-art benchmark database of real faults named Defects4J 2.0.0. It contains 835 reproducible real-world faults in 17 open-source projects that comprise a total of 1044 bug-fix pairs of files. Our results indicate that while the competent programmer hypothesis seems to be true, mutation testing is missing important operators to generate representative real-world faults.
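The idea of re-creating a fault through a chain of mutations can be illustrated with a small breadth-first search. This is only a sketch under strong assumptions: programs are flat token tuples rather than ASTs, and `OPERATOR_SWAPS`, `neighbours` and `shortest_mutation_path` are hypothetical names with a toy operator set, not the operators or the search used in the paper. The length of the returned path plays the role of the "distance" between correct and faulty code.

```python
from collections import deque

# Hypothetical toy mutation operators: each replaces one token with a related one.
OPERATOR_SWAPS = {
    "<": ["<=", ">"],
    "<=": ["<", ">="],
    ">": [">=", "<"],
    ">=": [">", "<="],
    "+": ["-"],
    "-": ["+"],
    "0": ["1"],
    "1": ["0"],
}


def neighbours(tokens):
    """Yield every program reachable by applying one mutation to one token."""
    for i, tok in enumerate(tokens):
        for replacement in OPERATOR_SWAPS.get(tok, []):
            yield tokens[:i] + (replacement,) + tokens[i + 1:]


def shortest_mutation_path(correct, buggy, max_depth=4):
    """Breadth-first search from the correct program to the buggy one.

    Returns the programs on a shortest path (start and goal included),
    or None if the buggy program is not reachable within max_depth mutations.
    """
    start, goal = tuple(correct), tuple(buggy)
    queue = deque([(start, [start])])
    seen = {start}
    while queue:
        current, path = queue.popleft()
        if current == goal:
            return path
        if len(path) - 1 >= max_depth:
            continue  # do not expand beyond the mutation budget
        for nxt in neighbours(current):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [nxt]))
    return None


correct = ["i", "<", "n", "+", "0"]
buggy = ["i", "<=", "n", "-", "0"]
path = shortest_mutation_path(correct, buggy)
unreachable = shortest_mutation_path(["a"], ["b"])
```

A short path suggests the fault is a small variation of the correct code; a result of `None` suggests the operator set cannot express the fault within the budget, which is the kind of gap the abstract reports.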

Citations: 0

Bud hunting with directed fuzz testing and source code vulnerability detection with advanced graph neural networks

Software Testing, Verification and Reliability

Pub Date : 2024-02-13 DOI: 10.1002/stvr.1876

Yves Le Traon, Tao Xie

In this edition, we present two papers that make significant contributions to fuzz testing and vulnerability detection, respectively: one delves into directed greybox fuzzing (DGF), the other into tensor-based gated graph neural networks for automatic vulnerability detection in source code.

The first paper, ‘Greybox fuzzing, a scalable and practical approach for software testing’, by Pengfei Wang, Xu Zhou, Tai Yue, Peihong Lin, Yingying Liu and Kai Lu, proposes to improve greybox fuzzing tools for uncovering bugs through directed greybox fuzzing (DGF). DGF emerges as a strategic alternative to undirected coverage-guided approaches: it allocates its resources purposefully, targeting specific zones such as bug-prone areas. This makes DGF particularly effective for patch testing, bug reproduction and specialized bug detection scenarios. The paper conducts a comprehensive study, analysing 42 state-of-the-art fuzzers closely related to DGF. By categorizing DGF into location-directed and behaviour-directed types, the authors unveil its benefits, limitations and potential research avenues. This work not only provides a snapshot of the current state of DGF but also identifies gaps and proposes areas for future investigation.

The second paper, entitled ‘Tensor-based gated graph neural network for automatic vulnerability detection in source code’, addresses how the rapid expansion of smart devices intensifies the demand for robust vulnerability detection in source code. Jia Yang, Ou Ruan and JiXin Zhang tackle this challenge by proposing a tensor-based gated graph neural network, named TensorGNN, for function-level vulnerability detection in source code. TensorGNN treats code as graphs with node features by combining different code graph representations, leading to accurate code embeddings. The TensorGNN model outperforms existing state-of-the-art works in terms of accuracy and F1 for vulnerability detection across various open-source code corpora. Notably, it achieves these results with significantly fewer training parameters and reduced training time. By introducing a novel perspective to vulnerability detection, this paper opens avenues for further exploration in the intersection of tensor technology and software security.

In conclusion, these two different papers contribute to complementary facets of software quality improvement. As STVR navigates the complexities of deploying safe and secure software, I wish you pleasant reading that may inspire follow-up research in these two directions.

Citations: 0

Investigating the impact of transient hardware faults on deep learning neural network inference

Software Testing, Verification and Reliability

Pub Date : 2024-02-01 DOI: 10.1002/stvr.1873

Md Hasanur Rahman, Sabuj Laskar, Guanpeng Li

Safety-critical applications, such as autonomous vehicles, healthcare, and space applications, have witnessed widespread deployment of deep neural networks (DNNs). Inherent algorithmic inaccuracies have consistently been a prevalent cause of misclassifications, even in modern DNNs. Simultaneously, with an ongoing effort to minimize the footprint of contemporary chip design, there is a continual rise in the likelihood of transient hardware faults in deployed DNN models. Consequently, researchers have wondered the extent to which these faults contribute to DNN misclassifications compared to algorithmic inaccuracies. This article delves into the impact of DNN misclassifications caused by transient hardware faults and intrinsic algorithmic inaccuracies in safety-critical applications. Initially, we enhance a cutting-edge fault injector, TensorFI, for TensorFlow applications to facilitate fault injections on modern DNN non-sequential models in a scalable manner. Subsequently, we analyse the DNN-inferred outcomes based on our defined safety-critical metrics. Finally, we conduct extensive fault injection experiments and a comprehensive analysis to achieve the following objectives: (1) investigate the impact of different target class groupings on DNN failures and (2) pinpoint the most vulnerable bit locations within tensors, as well as DNN layers accountable for the majority of safety-critical misclassifications. Our findings regarding different grouping formations reveal that failures induced by transient hardware faults can have a substantially greater impact (with a probability up to 4× higher) on safety-critical applications compared to those resulting from algorithmic inaccuracies. Additionally, our investigation demonstrates that higher order bit positions in tensors, as well as initial and final layers of DNNs, necessitate prioritized protection compared to other regions.
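To see why higher order bit positions matter, it can help to flip single bits of one 32-bit floating-point value by hand. The sketch below uses only Python's standard `struct` module; `flip_bit` is a hypothetical helper written for this illustration and is not part of TensorFI or TensorFlow. It assumes tensor elements are stored as IEEE 754 single-precision floats.

```python
import struct


def flip_bit(value, bit):
    """Return the float32 obtained by flipping one bit of value (0 = least significant)."""
    if not 0 <= bit < 32:
        raise ValueError("bit must be in 0..31")
    # Reinterpret the float32 bit pattern as an unsigned 32-bit integer.
    (as_int,) = struct.unpack("<I", struct.pack("<f", value))
    # Flip the chosen bit and reinterpret the result as a float32 again.
    (flipped,) = struct.unpack("<f", struct.pack("<I", as_int ^ (1 << bit)))
    return flipped


weight = 0.5
low = flip_bit(weight, 0)     # lowest mantissa bit: the value barely changes
high = flip_bit(weight, 30)   # highest exponent bit: the value explodes
sign = flip_bit(weight, 31)   # sign bit: the value is negated
```

In this example a flip in the lowest mantissa bit perturbs the weight by about 2^-24, whereas a flip in the top exponent bit turns 0.5 into 2^127, which is consistent with the abstract's observation that high-order bits deserve prioritized protection.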

Citations: 0

Delta4Ms: Improving mutation-based fault localization by eliminating mutant bias

Software Testing, Verification and Reliability

Pub Date : 2024-01-16 DOI: 10.1002/stvr.1872

Hengyuan Liu, Zheng Li, Baolong Han, Yangtao Liu, Xiang Chen, Yong Liu

Fault localization is a complex, costly and time-consuming task in software debugging. Numerous automated techniques have been developed to expedite this process. Mutation-based fault localization (MBFL) is one of the most widely studied techniques which uses mutation analysis to generate mutants for revealing potential faults in the program. However, our theoretical analysis exposes an inherent conflict between the fundamental assumption and the essential meaning of existing MBFL suspiciousness. This conflict is caused by mutant bias. Intuitively, the suspiciousness can be corrected by eliminating the mutant bias for more accurately measuring the faulty probability of the corresponding mutant statement. In this paper, we introduce Delta4Ms, a fault localization approach designed to eliminate mutant bias. Delta4Ms integrates the principles of signal theory, modelling the actual suspiciousness and mutant bias as the desired and false signal components, respectively. Based on theoretical derivation, the average suspiciousness of mutants serves as an estimate of mutant bias. Delta4Ms effectively mitigates mutant bias, extracting the desired signal and yielding corrected suspiciousness for fault localization. To precisely estimate mutant bias, higher order mutants (HOMs) are incorporated. We conduct an extensive experimental evaluation of Delta4Ms on 320 real-fault programs from Codeflaws. The results indicate that our model significantly outperforms existing SBFL and MBFL techniques, showing a considerable improvement in fault localization effectiveness. We further assessed the robustness of Delta4Ms by examining different HOM ratios and HOM generation strategies. Moreover, Delta4Ms achieves a substantial reduction in mutation execution cost and minimal accuracy loss through the implementation of test case reduction. Finally, we perform preliminary experiments on 15 real-fault programs from the Defects4J benchmark to assess the generalization of the model's fault localization effectiveness.
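For readers unfamiliar with MBFL suspiciousness, the following sketch shows a conventional baseline: each mutant is scored with the Ochiai formula from the tests that kill it, and a statement takes the maximum score of its mutants. This is not Delta4Ms itself; the abstract does not give the bias-correction formula, so none is attempted here. The function names and the example kill counts are made up for illustration.

```python
import math


def ochiai(killed_failing, killed_passing, total_failing):
    """Ochiai suspiciousness of one mutant from its kill counts."""
    denominator = math.sqrt(total_failing * (killed_failing + killed_passing))
    return killed_failing / denominator if denominator else 0.0


def statement_suspiciousness(mutants, total_failing):
    """Map each statement to the maximum Ochiai score among its mutants.

    mutants: iterable of (statement_id, killed_failing, killed_passing).
    """
    scores = {}
    for stmt, kf, kp in mutants:
        score = ochiai(kf, kp, total_failing)
        scores[stmt] = max(scores.get(stmt, 0.0), score)
    return scores


# Made-up kill counts: two mutants on line 10, one on line 20.
mutants = [
    ("line 10", 2, 0),  # killed by both failing tests and no passing test
    ("line 10", 1, 3),
    ("line 20", 0, 4),  # killed only by passing tests
]
scores = statement_suspiciousness(mutants, total_failing=2)
ranking = sorted(scores, key=scores.get, reverse=True)
```

Scores of this kind are the raw signal that, according to the abstract, Delta4Ms corrects by estimating and removing mutant bias.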

Citations: 0

Towards automatically identifying the co-change of production and test code

Software Testing, Verification and Reliability

Pub Date : 2024-01-11 DOI: 10.1002/stvr.1870

Yuan Huang, Zhicao Tang, Xiangping Chen, Xiaocong Zhou

In software evolution, keeping the test code co-changing with the production code is important, because outdated test code may not work and is ineffective in revealing faults in the production code. However, under tight development schedules, developers may not co-change the production and test code immediately. For example, we analysed the top 1003 popular Java projects on GitHub and found that in nearly 9.3% of cases (i.e., 464,417) the production and test code were not updated at the same time; that is, the production code was updated first, and the test code was updated only after an interval. This result indicates that much test code is not updated in time. In this paper, we propose a novel approach, Jtup, to remind developers to co-change the production code and test code in time. Specifically, we first define co-changed production and test code as a positive instance, and unchanged test code (i.e., production code changed and test code unchanged) as a negative instance. Then, we extract multidimensional features from the production code to characterize the possibility of co-change, including code change features, code complexity features and code semantic features. Finally, several machine learning-based methods are employed to identify the co-changed production and test code. We conduct comprehensive experiments on 20 datasets, and the results show that Jtup achieves an Accuracy, Precision and Recall of 76.7%, 78.1% and 77.4%, respectively, outperforming the state-of-the-art method.
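For readers unfamiliar with the three reported metrics, the following sketch shows how Accuracy, Precision and Recall are computed for a binary classifier, where label 1 denotes a positive instance (test code co-changed) and label 0 a negative instance. The labels are made-up illustration data, not results from the paper, and the function name is hypothetical.

```python
def classification_metrics(y_true, y_pred):
    """Return (accuracy, precision, recall) for binary labels, with 1 = positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # co-change correctly predicted
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # no co-change correctly predicted
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarm
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed co-change
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Made-up ground truth and predictions for eight production-code changes.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
print(classification_metrics(y_true, y_pred))
```

Precision penalizes unnecessary reminders to update tests, while Recall penalizes missed reminders, so the two are reported alongside overall Accuracy.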



Citations: 0

Test case prioritization and mutation testing

Software Testing, Verification and Reliability

Pub Date : 2023-12-14 DOI: 10.1002/stvr.1871

Yves Le Traon, Tao Xie

In this issue, we are pleased to present two papers on test case prioritization and mutation testing, respectively.

The first paper, ‘Semantic-aware two-phase test case prioritization for continuous integration’ by Yingling Li, Ziao Wang, Junjie Wang, Jie Chen, Rui Mou, and Guibing Li, presents the SatTCP framework, which conducts precise prioritization with low time overhead in order to improve the cost-effectiveness of typical continuous integration (CI) testing with frequent code submissions. In SatTCP, coarse-grained filtering based on information retrieval (IR) techniques roughly sorts the test cases and selects a certain number of them for subsequent prioritization; fine-grained prioritization based on a pretrained Siamese network then precisely prioritizes the initially ranked test set. The evaluation results show that SatTCP outperforms all the baselines under comparison and achieves the lowest test costs (Recommended by Yves Le Traon).
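The two-phase structure described for SatTCP can be sketched as follows. This is an illustration of the general filter-then-rerank pattern, not the SatTCP implementation: Jaccard token overlap stands in for the IR stage, and a fixed lookup table stands in for the pretrained Siamese network. All names and data are hypothetical.

```python
def tokenize(text):
    """Lowercase whitespace tokenization into a set of tokens."""
    return set(text.lower().split())


def jaccard(a, b):
    """Jaccard similarity between two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def coarse_filter(change_text, tests, keep):
    """Phase 1: rank tests by lexical similarity to the change, keep the top `keep`."""
    change_tokens = tokenize(change_text)
    ranked = sorted(
        tests,
        key=lambda name: jaccard(change_tokens, tokenize(tests[name])),
        reverse=True,
    )
    return ranked[:keep]


def fine_prioritize(change_text, candidates, tests, scorer):
    """Phase 2: re-rank the surviving tests with a (more expensive) scorer."""
    return sorted(
        candidates,
        key=lambda name: scorer(change_text, tests[name]),
        reverse=True,
    )


tests = {
    "test_login": "login user password session",
    "test_logout": "logout user session",
    "test_report": "report pdf export",
    "test_password_reset": "password reset email user",
}
change = "fix password check in login"

# Stand-in for a learned similarity model: fixed, made-up scores.
toy_scores = {
    "login user password session": 0.4,
    "password reset email user": 0.9,
}

candidates = coarse_filter(change, tests, keep=2)
order = fine_prioritize(change, candidates, tests, lambda c, doc: toy_scores[doc])
print(candidates, order)
```

The point of the split is cost: the cheap lexical pass touches every test, while the expensive scorer only runs on the few candidates that survive it.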

The second paper, ‘Mutation testing optimisations using the Clang front-end’ by Sten Vercammen, Serge Demeyer, Markus Borg, Niklas Pettersson, and Görel Hedin, investigates the extent to which the Clang front-end and its state-of-the-art program analysis facilities allow existing mutation optimization strategies to be implemented within the C language family. The authors develop a proof-of-concept tool to collect detailed measurements for each mutation phase and evaluate it on four open-source C++ libraries and one industrial component. The evaluation results show that the ‘Generate Mutants’ and ‘Detect (Un)Reachable Mutants’ steps are for all practical purposes negligible; the ‘Compile Mutants’ step takes a significant amount of time, and the time spent compiling invalid and unreachable mutants is considerable; the ‘Execute Mutants’ step is the other dominant factor (Recommended by Mike Papadakis).

We hope that these papers will inspire further research in related directions.



Citations: 0

The progress, challenges, and perspectives of directed greybox fuzzing

Software Testing, Verification and Reliability

Pub Date : 2023-12-14 DOI: 10.1002/stvr.1869

Pengfei Wang, Xu Zhou, Tai Yue, Peihong Lin, Yingying Liu, Kai Lu

Greybox fuzzing is a scalable and practical approach to software testing. Most greybox fuzzing tools are coverage-guided, as reaching high code coverage makes finding bugs more likely. However, since most covered code may not contain bugs, blindly extending code coverage is inefficient, especially for corner cases. Unlike coverage-guided greybox fuzzing, which increases code coverage in an undirected manner, directed greybox fuzzing (DGF) spends most of its time budget on reaching specific targets (e.g. the bug-prone zone) without wasting resources stressing unrelated parts. Thus, DGF is particularly suitable for scenarios such as patch testing, bug reproduction and special bug detection. DGF has become an active research area, but it has general limitations and challenges that are worth further study. Based on an investigation of 42 state-of-the-art fuzzers closely related to DGF, we conduct the first in-depth study to summarize the empirical evidence on the research progress of DGF. This paper studies DGF from a broader view, taking into account not only the location-directed type, which targets specific code parts, but also the behavior-directed type, which aims to expose abnormal program behaviors. By analyzing the benefits and limitations of DGF research, we try to identify gaps in current research, reveal new research opportunities and suggest areas for further investigation.
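To make the contrast with coverage-guided fuzzing concrete, the sketch below shows a toy location-directed energy assignment: seeds whose execution traces lie closer to the target receive more mutation energy. It is a simplified illustration under stated assumptions, not the schedule of any particular fuzzer. The block distances, the linear mapping and all names are hypothetical.

```python
def seed_distance(trace, block_distance):
    """Mean distance to the target over the traced blocks that can reach it.

    Returns None when no traced block has a known distance to the target.
    """
    known = [block_distance[b] for b in trace if b in block_distance]
    return sum(known) / len(known) if known else None


def assign_energy(seed_traces, block_distance, min_energy=1, max_energy=16):
    """Give each seed an energy budget that grows as its distance shrinks."""
    distances = {s: seed_distance(t, block_distance) for s, t in seed_traces.items()}
    known = [d for d in distances.values() if d is not None]
    lo, hi = (min(known), max(known)) if known else (0.0, 0.0)
    energy = {}
    for seed, d in distances.items():
        if d is None:
            energy[seed] = min_energy  # the seed never gets near the target
        elif hi == lo:
            energy[seed] = max_energy  # all reachable seeds are equally close
        else:
            closeness = 1.0 - (d - lo) / (hi - lo)  # 1.0 = closest seed
            energy[seed] = round(min_energy + closeness * (max_energy - min_energy))
    return energy


# Hypothetical distances (in control-flow edges) from basic blocks to target T.
block_distance = {"A": 3, "B": 2, "C": 1, "T": 0}
seed_traces = {
    "seed1": ["A", "B"],
    "seed2": ["A", "B", "C", "T"],
    "seed3": ["X", "Y"],
}
print(assign_energy(seed_traces, block_distance))
```

A coverage-guided fuzzer would instead reward any seed that reaches new code, regardless of where that code lies relative to the target.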



Citations: 0

Tensor-based gated graph neural network for automatic vulnerability detection in source code

Software Testing, Verification and Reliability

Pub Date : 2023-11-27 DOI: 10.1002/stvr.1867

Jia Yang, Ou Ruan, JiXin Zhang

The rapid expansion of smart devices has increased the demand for vulnerability detection in the cyber security field. Writing secure source code is crucial to protecting applications and software. Recent vulnerability detection methods mainly use machine learning and deep learning. However, some challenges remain: how to learn accurate source code semantic embeddings at the function level, how to perform vulnerability detection effectively using the learned embeddings, and how to solve the overfitting problem of learning-based models. In this paper, we consider code as various graphs with node features and propose a tensor-based gated graph neural network called TensorGNN to produce code embeddings for function-level vulnerability detection. First, we propose a high-dimensional tensor for combining different code graph representations. Second, inspired by work on tensor technology, we propose the TensorGNN model to produce accurate code representations using the graph tensor. We evaluate our model on seven large open-source C and C++ code corpora (e.g. SARD&NVD, Debian, SATE IV, FFmpeg, libpng&LibTiff, Wireshark and GitHub datasets), which contain 13 types of vulnerabilities. TensorGNN improves on existing state-of-the-art works by 10%–30% on average in terms of vulnerability detection accuracy and F1, while requiring less training time and fewer model parameters. Specifically, compared with existing works, our model reduces the number of parameters by a factor of 25–47 and the training time by a factor of 3–10. Overall, the evaluation shows that TensorGNN performs better while using fewer parameters and less training time.
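The idea of combining several code graph representations into one tensor can be sketched in plain Python. This is an assumption-laden illustration, not the TensorGNN architecture: it stacks one adjacency matrix per relation type into a three-dimensional tensor and performs a single un-gated neighbour aggregation, omitting the gating and any learned weights. The relation names (AST, CFG, DFG), the example edges and the function names are hypothetical.

```python
def build_graph_tensor(num_nodes, edges_by_type):
    """Stack one adjacency matrix per relation type into tensor[k][src][dst]."""
    relation_names = sorted(edges_by_type)
    tensor = [
        [[0] * num_nodes for _ in range(num_nodes)]
        for _ in relation_names
    ]
    for k, name in enumerate(relation_names):
        for src, dst in edges_by_type[name]:
            tensor[k][src][dst] = 1
    return relation_names, tensor


def aggregate(tensor, features):
    """One message-passing step: each node sums the features of its
    predecessors across all relation slices (no gating, no weights)."""
    num_nodes, dim = len(features), len(features[0])
    out = [[0.0] * dim for _ in range(num_nodes)]
    for adjacency in tensor:
        for src in range(num_nodes):
            for dst in range(num_nodes):
                if adjacency[src][dst]:
                    for d in range(dim):
                        out[dst][d] += features[src][d]
    return out


# Three nodes of a tiny function, connected by three hypothetical relation types.
edges_by_type = {
    "ast": [(0, 1), (0, 2)],
    "cfg": [(1, 2)],
    "dfg": [(0, 2)],
}
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

names, tensor = build_graph_tensor(3, edges_by_type)
print(names, aggregate(tensor, features))
```

Keeping the relations as separate slices, rather than merging them into one adjacency matrix, preserves which kind of edge carried each message; a real model could then weight or gate each slice differently.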



Citations: 0