
Latest Publications in ACM Transactions on Software Engineering and Methodology

Data Complexity: A New Perspective for Analyzing the Difficulty of Defect Prediction Tasks
IF 4.4 | CAS Tier 2 (Computer Science) | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2024-02-26 | DOI: 10.1145/3649596
Xiaohui Wan, Zheng Zheng, Fangyun Qin, Xuhui Lu

Defect prediction is crucial for software quality assurance and has been extensively researched over recent decades. However, prior studies rarely focus on data complexity in defect prediction tasks, and even less on understanding the difficulty of these tasks from the perspective of data complexity. In this paper, we conduct an empirical study to estimate the hardness of over 33,000 instances, employing a set of measures to characterize the inherent difficulty of instances and the characteristics of defect datasets. Our findings indicate that: (1) instance hardness in both classes displays a right-skewed distribution, with the defective class exhibiting a more scattered distribution; (2) class overlap is the primary factor influencing instance hardness and can be characterized through feature, structural, and instance-level overlap; (3) no universal preprocessing technique is applicable to all datasets, nor does any technique consistently reduce data complexity; fortunately, dataset complexity measures can help identify suitable techniques for specific datasets; (4) integrating data complexity information into the learning process can enhance an algorithm's learning capacity. In summary, this empirical study highlights the crucial role of data complexity in defect prediction tasks and provides a novel perspective for advancing research in defect prediction techniques.
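
To make the notion of instance hardness concrete, below is a minimal Python sketch of one widely used hardness measure, k-Disagreeing Neighbors (kDN): the fraction of an instance's k nearest neighbors that carry a different class label. The paper employs a set of such measures; the synthetic data, the choice of k, and the use of scikit-learn here are illustrative assumptions, not the authors' exact setup.

```python
# A minimal sketch of the kDN instance-hardness measure (an assumption
# standing in for the paper's full measure suite).
import numpy as np
from sklearn.neighbors import NearestNeighbors

def kdn_hardness(X, y, k=5):
    """Return per-instance hardness scores in [0, 1]."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    _, idx = nn.kneighbors(X)          # idx[:, 0] is the instance itself
    neighbor_labels = y[idx[:, 1:]]    # drop the self-match
    return (neighbor_labels != y[:, None]).mean(axis=1)

# Toy defect dataset: 200 instances, 10 static-code metrics, binary labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)

hardness = kdn_hardness(X, y)
print("mean hardness (clean class):    ", hardness[y == 0].mean())
print("mean hardness (defective class):", hardness[y == 1].mean())
```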

Citations: 0
Risky Dynamic Typing Related Practices in Python: An Empirical Study
IF 4.4 | CAS Tier 2 (Computer Science) | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2024-02-26 | DOI: 10.1145/3649593
Zhifei Chen, Lin Chen, Yibiao Yang, Qiong Feng, Xuansong Li, Wei Song

Python's dynamic typing provides developers with powerful programming abstractions. However, many type-related bugs accumulate in Python code bases due to the misuse of dynamic typing. The goal of this paper is to aid the understanding of developers' high-risk practices around dynamic typing and the early detection of type-related bugs. We first formulate the rules of six types of risky dynamic-typing practices (type smells for short) in Python. We then develop a rule-based tool named RUPOR, which builds an accurate type base to detect type smells. Our evaluation shows that RUPOR outperforms existing type smell detection techniques (including the LLM-based approaches, Mypy, and PYDYPE) on a benchmark of 900 Python methods. Based on RUPOR, we conduct an empirical study on 25 real-world projects. We find that type smells are significantly related to the occurrence of post-release faults. A fault-proneness prediction model built with type smell features slightly outperforms the model built without them. We also summarize common fix patterns, including inserting type checks to fix type smell bugs. These findings provide valuable insights for preventing and fixing type-related bugs in programs written in dynamically typed languages.
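
To illustrate what a rule-based type smell check can look like, here is a minimal sketch that flags variables reassigned to literals of different built-in types within one function, using Python's ast module. This single rule is a hypothetical example in the spirit of the paper; it is not one of RUPOR's actual six rules, and RUPOR's type base is far more elaborate.

```python
# A minimal, illustrative type smell rule: a variable assigned literals of
# more than one built-in type inside the same function.
import ast

LITERAL_TYPES = {ast.Constant: lambda n: type(n.value).__name__,
                 ast.List: lambda n: "list", ast.Dict: lambda n: "dict",
                 ast.Set: lambda n: "set", ast.Tuple: lambda n: "tuple"}

def inconsistent_assignments(source):
    smells = []
    for func in [n for n in ast.walk(ast.parse(source))
                 if isinstance(n, ast.FunctionDef)]:
        seen = {}  # variable name -> set of literal type names
        for node in ast.walk(func):
            if isinstance(node, ast.Assign) and len(node.targets) == 1 \
                    and isinstance(node.targets[0], ast.Name):
                for klass, typer in LITERAL_TYPES.items():
                    if isinstance(node.value, klass):
                        seen.setdefault(node.targets[0].id, set()).add(typer(node.value))
        smells += [(func.name, var, types)
                   for var, types in seen.items() if len(types) > 1]
    return smells

code = """
def parse(flag):
    result = 0          # int
    if flag:
        result = "ok"   # later reassigned to str: risky under dynamic typing
    return result
"""
print(inconsistent_assignments(code))  # [('parse', 'result', {'int', 'str'})], set order may vary
```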

Citations: 0
Requirement Engineering Methods for Virtual Reality Software Product Development - A Mapping Study
IF 4.4 | CAS Tier 2 (Computer Science) | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2024-02-26 | DOI: 10.1145/3649595
Sai Anirudh Karre, Y. Raghu Reddy, Raghav Mittal

Software practitioners use various methods in Requirements Engineering (RE) to elicit, analyze, and specify the requirements of enterprise products. The methods impact the final product characteristics and influence product delivery. Ad-hoc usage of these methods by software practitioners can lead to inconsistency and ambiguity in the product. With the notable rise of enterprise products, games, etc. across various domains, Virtual Reality (VR) has become an essential technology for the future. The methods adopted for requirements engineering when developing VR products therefore require a detailed study. This paper presents a mapping study on requirements engineering methods prescribed and used for developing VR applications, covering requirements elicitation, requirements analysis, and requirements specification. Our study provides insights into the use of such methods in the VR community and suggests specific requirements engineering methods for various fields of interest. We also discuss future directions in requirements engineering for VR products.

Citations: 0
Non-Autoregressive Line-Level Code Completion
IF 4.4 | CAS Tier 2 (Computer Science) | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2024-02-26 | DOI: 10.1145/3649594
Fang Liu, Zhiyi Fu, Ge Li, Zhi Jin, Hui Liu, Yiyang Hao, Li Zhang

Software developers frequently use code completion tools to accelerate software development by suggesting the code elements that follow. Researchers usually employ AutoRegressive (AR) decoders to complete code sequences in a left-to-right, token-by-token fashion. To improve the accuracy and efficiency of code completion, we argue that tokens within a code statement have the potential to be predicted concurrently. In this paper, we first conduct an empirical study to analyze the dependency among the target tokens in line-level code completion. The results suggest that it is potentially practical to generate all statement tokens in parallel. To this end, we introduce SANAR, a simple and effective syntax-aware non-autoregressive model for line-level code completion. To further improve the quality of the generated code, we propose an adaptive and syntax-aware sampling strategy to boost the model's performance. The experimental results obtained from two widely used datasets indicate that our model outperforms state-of-the-art code completion approaches of similar model size by a considerable margin, and is faster than these models, with up to a 9× speed-up. Moreover, the results demonstrate that the improvements achieved by SANAR become even more pronounced at larger model sizes, highlighting their significance.
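
The speed argument can be made concrete with a toy contrast between AR and NAR decoding: AR runs one forward pass per token, while NAR predicts all tokens of a line in a single pass. The dummy model below is an assumption standing in for SANAR; a real model would condition on the surrounding code and use the paper's syntax-aware sampling.

```python
# A minimal sketch contrasting AR (n forward passes) and NAR (one forward
# pass) decoding for one code line. The dummy model is an assumption.
import numpy as np

VOCAB = ["for", "i", "in", "range", "(", "n", ")", ":"]
rng = np.random.default_rng(1)

def model(num_positions):
    """Pretend forward pass: logits for every target position at once."""
    logits = rng.normal(size=(num_positions, len(VOCAB)))
    logits[np.arange(num_positions), np.arange(num_positions)] += 5.0  # bias toward the "right" tokens
    return logits

def decode_ar(n):
    # AR: one forward pass per token, left to right (n passes in total).
    return [VOCAB[int(np.argmax(model(n)[t]))] for t in range(n)]

def decode_nar(n):
    # NAR: a single forward pass predicts all n tokens in parallel,
    # which is where the reported speed-up comes from.
    return [VOCAB[int(np.argmax(row))] for row in model(n)]

print("AR :", " ".join(decode_ar(8)))
print("NAR:", " ".join(decode_nar(8)))
```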

Citations: 0
Enumerating Valid Non-Alpha-Equivalent Programs for Interpreter Testing
IF 4.4 | CAS Tier 2 (Computer Science) | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2024-02-12 | DOI: 10.1145/3647994
Xinmeng Xia, Yang Feng, Qingkai Shi, James A. Jones, Xiangyu Zhang, Baowen Xu

Skeletal program enumeration (SPE) can generate a great number of test programs for validating the correctness of compilers or interpreters. Classic SPE generates programs by exhaustively enumerating all possible variable usage patterns within a given syntactic structure. Even though it is capable of producing many test programs, the exhaustive enumeration strategy generates a large number of invalid programs, which wastes testing time and resources. To address this problem, this paper proposes a tree-based SPE technique. Compared to the state of the art, the key merit of the tree-based approach is that it takes dependency information into consideration when producing test programs and thus makes it possible to (1) directly generate non-equivalent programs, and (2) apply dominance relations to eliminate invalid test programs that have undefined variables. Hence, our approach significantly reduces the cost of the naïve SPE approach. We have implemented our approach in an automated testing tool, IFuzzer, and applied it to test eight different implementations of Python interpreters: CPython, PyPy, IronPython, Jython, RustPython, GPython, Pyston, and Codon. In three months of fuzzing, IFuzzer detected 142 bugs, of which 87 have been confirmed as previously unknown bugs and 34 have been fixed. Compared to state-of-the-art SPE techniques, IFuzzer takes only 61.0% of the time given the same number of testing seeds and improves source code function coverage by 5.3% within the same testing time budget.
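
A small sketch helps illustrate the core idea of enumerating non-alpha-equivalent programs and pruning undefined-variable uses. Enumerating restricted growth strings yields exactly one canonical variable-usage pattern per alpha-equivalence class, and a simple def-before-use check stands in for the paper's dominance-based pruning; the toy skeleton and this simplification are our assumptions, not IFuzzer's implementation.

```python
# A minimal SPE sketch for a toy skeleton with four variable holes.
def restricted_growth_strings(n):
    """All canonical variable-usage patterns for n holes (no alpha duplicates)."""
    def extend(prefix):
        if len(prefix) == n:
            yield tuple(prefix)
            return
        for v in range(max(prefix, default=-1) + 2):  # reuse a var or introduce the next one
            yield from extend(prefix + [v])
    yield from extend([])

# Toy skeleton: holes 0 and 1 are definitions; holes 2 and 3 are uses.
SKELETON = "{0} = 1\n{1} = 2\nprint({2} + {3})"
DEF_HOLES = {0, 1}

def valid(pattern):
    defined = set()
    for hole, var in enumerate(pattern):
        if hole in DEF_HOLES:
            defined.add(var)
        elif var not in defined:
            return False  # use of an undefined variable: prune
    return True

for pattern in restricted_growth_strings(4):
    if valid(pattern):
        names = [f"v{v}" for v in pattern]
        print(SKELETON.format(*names), end="\n---\n")
```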

Citations: 0
sGuard+: Machine Learning Guided Rule-based Automated Vulnerability Repair on Smart Contracts
IF 4.4 | CAS Tier 2 (Computer Science) | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2024-02-08 | DOI: 10.1145/3641846
Cuifeng Gao, Wenzhang Yang, Jiaming Ye, Yinxing Xue, Jun Sun

Smart contracts are becoming appealing targets for hackers because of the vast amount of cryptocurrency under their control. Asset loss due to the exploitation of smart contract code has increased significantly in recent years. To guarantee that smart contracts are vulnerability-free, many works detect vulnerabilities in smart contracts, but only a few vulnerability repair approaches have been proposed. Repairing smart contract vulnerabilities at the source code level is attractive because it is transparent to users, but existing repair tools, such as SCRepair and sGuard, suffer from several limitations: (1) they ignore vulnerability-prevention code; (2) they may apply a repair to the wrong statements and change the original business logic of the smart contract; (3) they show poor performance in terms of time and gas overhead.

In this work, we propose machine-learning-guided, rule-based automated vulnerability repair for smart contracts to improve the effectiveness and efficiency of sGuard. To address the limitations mentioned above, we design features that characterize both the symptoms of vulnerabilities and the methods of vulnerability prevention, in order to learn various vulnerability patterns and reduce false positives. Additionally, we design a fine-grained localization algorithm that traverses the nodes of the abstract syntax tree, and we refine and extend the repair rules of sGuard to preserve the original business logic of smart contracts and support new vulnerability types. Our tool, named sGuard+, reduces time overhead based on machine learning models and reduces gas overhead through fewer code changes and precise patching.

In our experiment, we collect a publicly available vulnerability dataset from CVE, SWC and SmartBugs Curated as a ground truth for evaluations. Overall, sGuard+ repairs more vulnerabilities with less time and gas overhead than state-of-the-art tools. Furthermore, we reproduce about 9,000 historical transactions for regression testing. It is shown that sGuard+ has no impact on the original business logic of smart contracts.
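
As a concrete flavor of rule-based repair, the sketch below applies one template rule that inserts an overflow guard after an unchecked addition in pre-0.8 Solidity, a classic arithmetic fix in this line of tools. The single regex rule and the snippet are illustrative assumptions; sGuard+'s actual rule set is richer, and its patch localization is guided by a learned model rather than a plain pattern match.

```python
# A minimal sketch of one rule-based repair: pattern + patch template.
import re

RULE = (
    re.compile(r"^(\s*)(\w+)\s*=\s*(\w+)\s*\+\s*(\w+)\s*;"),
    "{indent}{lhs} = {a} + {b};\n{indent}require({lhs} >= {a});  // inserted overflow guard",
)

def repair(solidity_source):
    pattern, template = RULE
    out = []
    for line in solidity_source.splitlines():
        m = pattern.match(line)
        if m:
            indent, lhs, a, b = m.groups()
            out.append(template.format(indent=indent, lhs=lhs, a=a, b=b))
        else:
            out.append(line)  # lines that match no rule are left untouched
    return "\n".join(out)

snippet = """\
function deposit(uint amount) public {
    balance = balance + amount;
}"""
print(repair(snippet))
```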

Citations: 0
Supporting Safety Analysis of Image-processing DNNs through Clustering-based Approaches
IF 4.4 | CAS Tier 2 (Computer Science) | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2024-02-07 | DOI: 10.1145/3643671
Mohammed Oualid Attaoui, Hazem Fahmy, Fabrizio Pastore, Lionel Briand

The adoption of deep neural networks (DNNs) in safety-critical contexts is often prevented by the lack of effective means to explain their results, especially when they are erroneous. In our previous work, we proposed a white-box approach (HUDD) and a black-box approach (SAFE) to automatically characterize DNN failures. They both identify clusters of similar images from a potentially large set of images leading to DNN failures. However, the analysis pipelines for HUDD and SAFE were instantiated in specific ways according to common practices, deferring the analysis of other pipelines to future work.

In this paper, we report on an empirical evaluation of 99 different pipelines for root cause analysis of DNN failures. They combine transfer learning, autoencoders, heatmaps of neuron relevance, dimensionality reduction techniques, and different clustering algorithms. Our results show that the best pipeline combines transfer learning, DBSCAN, and UMAP. It leads to clusters almost exclusively capturing images of the same failure scenario, thus facilitating root cause analysis. Further, it generates distinct clusters for each root cause of failure, thus enabling engineers to detect all the unsafe scenarios. Interestingly, these results hold even for failure scenarios that are only observed in a small percentage of the failing images.
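
The winning pipeline can be sketched in a few lines: pretrained (transfer-learning) features, UMAP for dimensionality reduction, and DBSCAN for clustering. In the sketch below, synthetic blobs stand in for CNN embeddings of failure-inducing images, and the umap-learn package and all parameter values are assumptions rather than the study's tuned configuration.

```python
# A minimal sketch of the transfer learning + UMAP + DBSCAN pipeline.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.cluster import DBSCAN
import umap  # pip install umap-learn

# Placeholder for transfer-learning features of, say, 300 failing images.
features, _ = make_blobs(n_samples=300, n_features=512, centers=4, random_state=0)

embedding = umap.UMAP(n_neighbors=15, min_dist=0.1, n_components=2,
                      random_state=0).fit_transform(features)
labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(embedding)

# Each cluster should gather images of one failure scenario; label -1 is noise.
for cluster in sorted(set(labels)):
    print(f"cluster {cluster:2d}: {np.sum(labels == cluster)} images")
```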

Citations: 0
Try with Simpler – An Evaluation of Improved Principal Component Analysis in Log-based Anomaly Detection
IF 4.4 | CAS Tier 2 (Computer Science) | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2024-02-07 | DOI: 10.1145/3644386
Lin Yang, Junjie Chen, Shutao Gao, Zhihao Gong, Hongyu Zhang, Yue Kang, Huaan Li

With the rapid development of deep learning (DL), recent log-based anomaly detection work focuses on extracting semantic information from log events (i.e., templates of log messages) and designing more advanced DL models for anomaly detection. Such DL-based techniques can indeed improve the effectiveness of log-based anomaly detection, but they suffer from a heavier dependency on training data (such as data quality or data labels) and higher time and resource costs due to the complexity and scale of DL models, which hinder their practical use. In contrast, techniques based on traditional machine learning or data mining algorithms depend less on training data and are more efficient, but they are less effective than DL-based techniques, mainly because of the problem of unseen log events (some log events in incoming log messages do not appear in the training data), as confirmed by our motivating study. Intuitively, if we can improve the effectiveness of traditional techniques to be comparable with advanced DL-based techniques, log-based anomaly detection becomes more practical. Indeed, an existing study in another area (i.e., linking questions posted on Stack Overflow) has pointed out that traditional techniques with some optimizations can achieve effectiveness comparable to the state-of-the-art DL-based technique, indicating the feasibility of enhancing traditional log-based anomaly detection techniques to some degree.

Inspired by the idea of "try with simpler", we conducted the first empirical study to explore the potential of improving traditional techniques for more practical log-based anomaly detection. In this work, we optimized the traditional unsupervised PCA (Principal Component Analysis) technique by incorporating a lightweight semantic-based log representation into it, yielding a technique we call SemPCA, and conducted an extensive study to investigate its potential for more practical log-based anomaly detection. By comparing seven log-based anomaly detection techniques (four DL-based techniques, two traditional techniques, and SemPCA) on both public and industrial datasets, our results show that SemPCA achieves effectiveness comparable to advanced supervised/semi-supervised DL-based techniques while being much more stable under insufficient training data and more efficient, demonstrating that a traditional technique can still excel after a small but useful adaptation.
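
The PCA core behind SemPCA can be sketched as follows: fit principal components on vectors of normal log sequences, then flag sequences whose residual (the squared prediction error in the discarded subspace) exceeds a threshold. The random vectors below stand in for the semantic log representation, and the component count and percentile threshold are assumptions.

```python
# A minimal sketch of PCA-based anomaly detection on log-sequence vectors.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
normal = rng.normal(size=(500, 20))                 # semantic vectors of normal sequences
anomalous = normal[:5] + rng.normal(5, 1, (5, 20))  # shifted copies as anomalies

pca = PCA(n_components=5).fit(normal)

def spe(x):
    """Squared prediction error: distance to the principal subspace."""
    reconstructed = pca.inverse_transform(pca.transform(x))
    return ((x - reconstructed) ** 2).sum(axis=1)

threshold = np.percentile(spe(normal), 99)
print("flagged anomalies:", (spe(anomalous) > threshold).sum(), "of 5")
```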

Citations: 0
DeepGD: A Multi-Objective Black-Box Test Selection Approach for Deep Neural Networks
IF 4.4 | CAS Tier 2 (Computer Science) | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2024-02-07 | DOI: 10.1145/3644388
Zohreh Aghababaeyan, Manel Abdellatif, Mahboubeh Dadkhah, Lionel Briand

Deep neural networks (DNNs) are widely used in various application domains such as image processing, speech recognition, and natural language processing. However, testing DNN models may be challenging due to the complexity and size of their input domain. Particularly, testing DNN models often requires generating or exploring large unlabeled datasets. In practice, DNN test oracles, which identify the correct outputs for inputs, often require expensive manual effort to label test data, possibly involving multiple experts to ensure labeling correctness. In this paper, we propose DeepGD, a black-box multi-objective test selection approach for DNN models. It reduces the cost of labeling by prioritizing the selection of test inputs with high fault-revealing power from large unlabeled datasets. DeepGD not only selects test inputs with high uncertainty scores to trigger as many mispredicted inputs as possible but also maximizes the probability of revealing distinct faults in the DNN model by selecting diverse mispredicted inputs. The experimental results conducted on four widely used datasets and five DNN models show that in terms of fault-revealing ability: (1) White-box, coverage-based approaches fare poorly, (2) DeepGD outperforms existing black-box test selection approaches in terms of fault detection, and (3) DeepGD also leads to better guidance for DNN model retraining when using selected inputs to augment the training set.
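
DeepGD's two objectives can be illustrated with a small sketch: an uncertainty score (here the Gini impurity of the softmax output) and diversity among the selected inputs. DeepGD explores the trade-off with a multi-objective genetic search; the greedy scalarized selection below is a simplified stand-in for that search, and the synthetic model outputs and feature vectors are assumptions.

```python
# A minimal sketch of uncertainty + diversity driven test selection.
import numpy as np

rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(10), size=1000)   # softmax outputs of 1000 unlabeled inputs
feats = rng.normal(size=(1000, 32))             # feature vectors used for diversity

gini = 1.0 - (probs ** 2).sum(axis=1)           # high = model is uncertain

def select(budget=20, trade_off=0.5):
    chosen = [int(np.argmax(gini))]             # seed with the most uncertain input
    while len(chosen) < budget:
        # Distance of every candidate to its nearest already-selected input.
        dist = np.min(
            np.linalg.norm(feats[:, None, :] - feats[None, chosen, :], axis=2), axis=1)
        score = trade_off * gini + (1 - trade_off) * dist / dist.max()
        score[chosen] = -np.inf                 # never reselect
        chosen.append(int(np.argmax(score)))
    return chosen

print("selected test inputs:", select())
```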

Citations: 0
Abstraction and Refinement: Towards Scalable and Exact Verification of Neural Networks
IF 4.4 | CAS Tier 2 (Computer Science) | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2024-02-05 | DOI: 10.1145/3644387
Jiaxiang Liu, Yunhan Xing, Xiaomu Shi, Fu Song, Zhiwu Xu, Zhong Ming

As a new programming paradigm, deep neural networks (DNNs) have been increasingly deployed in practice, but their lack of robustness hinders their application in safety-critical domains. While there are techniques for verifying DNNs with formal guarantees, they are limited in scalability and accuracy. In this paper, we present a novel counterexample-guided abstraction refinement (CEGAR) approach for scalable and exact verification of DNNs. Specifically, we propose a novel abstraction that reduces the size of DNNs by over-approximation. The result of verifying the abstract DNN is conclusive if no spurious counterexample is reported. To eliminate each spurious counterexample introduced by abstraction, we propose a novel counterexample-guided refinement that refines the abstract DNN to exclude the spurious counterexample while still over-approximating the original one, leading to a sound, complete, yet efficient CEGAR approach. Our approach is orthogonal to, and can be integrated with, many existing verification techniques. For demonstration, we implement our approach using two promising tools, Marabou and Planet, as the underlying verification engines, and evaluate it on widely used benchmarks for three datasets: ACAS Xu, MNIST, and CIFAR-10. The results show that our approach can boost their performance by solving more problems within the same time limit, reducing the verification time of Marabou by 13.4%–86.3% on average across almost all the verification tasks, and reducing the verification time of Planet by 8.3%–78.0% on average across all the verification tasks. Compared to the most relevant CEGAR-based approach, our approach is 11.6–26.6 times faster.
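
The CEGAR loop itself is compact enough to sketch. In the skeleton below, all four helpers are stubs standing in for the paper's machinery: abstract merges neurons so the abstraction over-approximates the original network, verify would be delegated to an engine such as Marabou or Planet, and refine splits merged neurons to exclude a spurious counterexample. All function bodies here are assumptions for illustration.

```python
# A minimal skeleton of the CEGAR loop with stubbed-out verification.
def abstract(network):
    """Over-approximate: e.g., merge neurons so behaviors are a superset."""
    return {"concrete": network, "merged": True}

def verify(abstract_network, prop):
    """Stub engine call: returns None (proved) or a candidate counterexample."""
    return None  # pretend the property holds on the abstraction

def is_spurious(network, counterexample, prop):
    """Replay the counterexample on the concrete network."""
    return True  # stub: assume it does not actually violate the property

def refine(abstract_network, counterexample):
    """Split merged neurons so this spurious counterexample is excluded
    while the abstraction still over-approximates the original."""
    return abstract_network

def cegar(network, prop):
    a = abstract(network)
    while True:
        cex = verify(a, prop)
        if cex is None:
            return "property proved"           # sound: abstraction over-approximates
        if not is_spurious(network, cex, prop):
            return ("property violated", cex)  # complete: real counterexample found
        a = refine(a, cex)                     # eliminate the spurious counterexample

print(cegar(network={"layers": []}, prop="robust within eps"))
```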

Citations: 0