Finding permission bugs in smart contracts with role mining
Ye Liu, Yi Li, Shang-Wei Lin, Cyrille Artho
In Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis. https://doi.org/10.1145/3533767.3534372

Smart contracts deployed on permissionless blockchains, such as Ethereum, are accessible to any user in a trustless environment. Therefore, most smart contract applications implement access control policies to protect their valuable assets from unauthorized access. A difficulty in validating conformance to such policies, i.e., whether the contract implementation adheres to the expected behaviors, is the lack of policy specifications. In this paper, we mine past transactions of a contract to recover a likely access control model, which can then be checked against various information flow policies to identify potential bugs related to user permissions. We implement our role mining and security policy validation in the tool SPCon. An experimental evaluation on a labeled smart contract role-mining benchmark demonstrates that SPCon mines more accurate user roles than state-of-the-art role mining tools. Moreover, an evaluation on a real-world smart contract benchmark and access control CVEs indicates that SPCon effectively detects potential permission bugs while offering better scalability and a lower false-positive rate than state-of-the-art security tools, finding 11 previously unknown bugs and detecting six CVEs that no other tool can find.
Testing Dafny (experience paper)
A. Irfan, Sorawee Porncharoenwase, Zvonimir Rakamaric, Neha Rungta, E. Torlak
In Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis. https://doi.org/10.1145/3533767.3534382
Verification toolchains are widely used to prove the correctness of critical software systems. To build confidence in their results, it is important to develop testing frameworks that help detect bugs in these toolchains. Inspired by the success of fuzzing in finding bugs in compilers and SMT solvers, we have built the first fuzzing and differential testing framework for Dafny, a high-level programming language with a Floyd-Hoare-style program verifier and compilers to C#, Java, Go, and JavaScript. This paper presents our experience building and using XDsmith, a testing framework that targets the entire Dafny toolchain, from verification to compilation. XDsmith randomly generates annotated programs in a subset of Dafny that is free of loops and heap-mutating operations. The generated programs include preconditions, postconditions, and assertions, and they have a known verification outcome. These programs are used to test the soundness and precision of the Dafny verifier, and to perform differential testing on the four Dafny compilers. Using XDsmith, we uncovered 31 bugs across the Dafny verifier and compilers, each of which has been confirmed by the Dafny developers. Moreover, 8 of these bugs have been fixed in the mainline release of Dafny.
{"title":"Testing Dafny (experience paper)","authors":"A. Irfan, Sorawee Porncharoenwase, Zvonimir Rakamaric, Neha Rungta, E. Torlak","doi":"10.1145/3533767.3534382","DOIUrl":"https://doi.org/10.1145/3533767.3534382","url":null,"abstract":"Verification toolchains are widely used to prove the correctness of critical software systems. To build confidence in their results, it is important to develop testing frameworks that help detect bugs in these toolchains. Inspired by the success of fuzzing in finding bugs in compilers and SMT solvers, we have built the first fuzzing and differential testing framework for Dafny, a high-level programming language with a Floyd-Hoare-style program verifier and compilers to C#, Java, Go, and Javascript. This paper presents our experience building and using XDsmith, a testing framework that targets the entire Dafny toolchain, from verification to compilation. XDsmith randomly generates annotated programs in a subset of Dafny that is free of loops and heap-mutating operations. The generated programs include preconditions, postconditions, and assertions, and they have a known verification outcome. These programs are used to test the soundness and precision of the Dafny verifier, and to perform differential testing on the four Dafny compilers. Using XDsmith, we uncovered 31 bugs across the Dafny verifier and compilers, each of which has been confirmed by the Dafny developers. Moreover, 8 of these bugs have been fixed in the mainline release of Dafny.","PeriodicalId":412271,"journal":{"name":"Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129017462","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Unicorn: detect runtime errors in time-series databases with hybrid input synthesis
Zhiyong Wu, Jie Liang, Mingzhe Wang, Chijin Zhou, Yu Jiang
In Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis. https://doi.org/10.1145/3533767.3534364
The ubiquitous use of time-series databases in the safety-critical Internet of Things domain demands strict security and correctness. One successful approach in database bug detection is fuzzing, where hundreds of bugs have been detected automatically in relational databases. However, it cannot be easily applied to time-series databases: the bulk of time-series logic is unreachable because of mismatched query specifications, and serious bugs are undetectable because of implicitly handled exceptions. In this paper, we propose Unicorn to secure time-series databases with automated fuzzing. First, we design hybrid input synthesis to generate high-quality queries which not only cover time-series features but also ensure grammar correctness. Then, Unicorn uses proactive exception detection to discover minuscule-symptom bugs that hide behind implicit exception handling. With its specialized design oriented to time-series databases, Unicorn outperforms state-of-the-art database fuzzers in terms of coverage and bugs. Specifically, Unicorn outperforms SQLsmith and SQLancer on the widely used time-series databases IoTDB, KairosDB, TimescaleDB, TDengine, QuestDB, and GridDB in the number of covered basic blocks by 21%-199% and 34%-693%, respectively. More importantly, Unicorn has discovered 42 previously unknown bugs.
{"title":"Unicorn: detect runtime errors in time-series databases with hybrid input synthesis","authors":"Zhiyong Wu, Jie Liang, Mingzhe Wang, Chijin Zhou, Yu Jiang","doi":"10.1145/3533767.3534364","DOIUrl":"https://doi.org/10.1145/3533767.3534364","url":null,"abstract":"The ubiquitous use of time-series databases in the safety-critical Internet of Things domain demands strict security and correctness. One successful approach in database bug detection is fuzzing, where hundreds of bugs have been detected automatically in relational databases. However, it cannot be easily applied to time-series databases: the bulk of time-series logic is unreachable because of mismatched query specifications, and serious bugs are undetectable because of implicitly handled exceptions. In this paper, we propose Unicorn to secure time-series databases with automated fuzzing. First, we design hybrid input synthesis to generate high-quality queries which not only cover time-series features but also ensure grammar correctness. Then, Unicorn uses proactive exception detection to discover minuscule-symptom bugs which hide behind implicit exception handling. With the specialized design oriented to time-series databases, Unicorn outperforms the state-of-the-art database fuzzers in terms of coverage and bugs. Specifically, Unicorn outperforms SQLsmith and SQLancer on widely used time-series databases IoTDB, KairosDB, TimescaleDB, TDEngine, QuestDB, and GridDB in the number of basic blocks by 21%-199% and 34%-693%, respectively. More importantly, Unicorn has discovered 42 previously unknown bugs.","PeriodicalId":412271,"journal":{"name":"Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133944109","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Combining solution reuse and bound tightening for efficient analysis of evolving systems
Clay Stevens, H. Bagheri
In Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis. https://doi.org/10.1145/3533767.3534399

Software engineers have long employed formal verification to ensure the safety and validity of their system designs. As a system changes---often via predictable, domain-specific operations---its models must also change, requiring system designers to repeatedly execute the same formal verification on similar system models. State-of-the-art formal verification techniques can be expensive at scale, and this cost is multiplied by repeated analysis. This paper presents a novel analysis technique---implemented in a tool called SoRBoT---which can automatically determine domain-specific optimizations that dramatically reduce the cost of repeatedly analyzing evolving systems. Unlike prior approaches, which focus on either tightening the bounds for analysis or reusing all or part of prior solutions, SoRBoT's automated derivation of domain-specific optimizations combines the benefits of both solution reuse and bound tightening while avoiding the main pitfalls of each. We experimentally evaluate SoRBoT against state-of-the-art techniques for verifying evolving specifications, demonstrating that SoRBoT substantially exceeds their run-time performance while introducing only negligible overhead, in contrast to the expensive additional computations required by those techniques.
Faster mutation analysis with MeMu
Ali Ghanbari, Andrian Marcus
In Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis. https://doi.org/10.1145/3533767.3543288

Mutation analysis is a program analysis method with applications in assessing the quality of test cases, fault localization, test input generation, security analysis, and more. The method involves repeatedly running test suites against a large number of program mutants, often leading to poor scalability. A large body of research aims at accelerating mutation analysis via a variety of approaches, such as reducing the number of mutants, reducing the number of test cases to run, or reducing the execution time of individual mutants. This paper presents the implementation of a novel technique, named MeMu, for reducing mutant execution time by memoizing the most expensive methods in the system. Memoization is a program optimization technique that bypasses the execution of expensive methods and reuses pre-calculated results when repeated inputs are detected. MeMu can be used on its own or alongside existing mutation analysis acceleration techniques. The current implementation of MeMu achieves, on average, an 18.15% speed-up for the JVM-based mutation testing tool PITest.
Efficient greybox fuzzing of applications in Linux-based IoT devices via enhanced user-mode emulation
Yaowen Zheng, Yuekang Li, Cen Zhang, Hongsong Zhu, Yang Liu, Limin Sun
In Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis. https://doi.org/10.1145/3533767.3534414
Greybox fuzzing has become one of the most effective vulnerability discovery techniques. However, greybox fuzzing techniques cannot be directly applied to applications in IoT devices, mainly because executing these applications relies heavily on specific system environments and hardware. To execute the applications in Linux-based IoT devices, most existing fuzzing techniques use full-system emulation to maximize compatibility. However, compared with user-mode emulation, full-system emulation suffers from high overhead. Therefore, some previous works, such as Firm-AFL, combine full-system emulation and user-mode emulation to speed up the fuzzing process. Despite these attempts to shift applications toward user-mode emulation, no existing technique supports executing such applications fully under user-mode emulation. To address this issue, we propose EQUAFL, which can automatically set up the execution environment to execute embedded applications under user-mode emulation. EQUAFL first executes the application under full-system emulation and observes the key points where the program may get stuck or even crash during user-mode emulation. With the observed information, EQUAFL can migrate the needed environment for user-mode emulation. EQUAFL then uses enhanced user-mode emulation to replay network-related system calls and resource management behaviors to fulfill the needs of the embedded application during its execution. We evaluate EQUAFL on 70 network applications from different series of IoT devices. The results show that EQUAFL outperforms the state of the art in fuzzing efficiency (on average, 26 times faster than AFL-QEMU with full-system emulation and 14 times faster than Firm-AFL). We have also discovered ten vulnerabilities, including six CVEs, in the tested firmware images.
{"title":"Efficient greybox fuzzing of applications in Linux-based IoT devices via enhanced user-mode emulation","authors":"Yaowen Zheng, Yuekang Li, Cen Zhang, Hongsong Zhu, Yang Liu, Limin Sun","doi":"10.1145/3533767.3534414","DOIUrl":"https://doi.org/10.1145/3533767.3534414","url":null,"abstract":"Greybox fuzzing has become one of the most effective vulnerability discovery techniques. However, greybox fuzzing techniques cannot be directly applied to applications in IoT devices. The main reason is that executing these applications highly relies on specific system environments and hardware. To execute the applications in Linux-based IoT devices, most existing fuzzing techniques use full-system emulation for the purpose of maximizing compatibility. However, compared with user-mode emulation, full-system emulation suffersfrom great overhead. Therefore, some previous works, such as Firm-AFL, propose to combine full-system emulation and user-mode emulation to speed up the fuzzing process. Despite the attempts of trying to shift the application towards user-mode emulation, no existing technique supports to execute these applications fully in the user-mode emulation. To address this issue, we propose EQUAFL, which can automatically set up the execution environment to execute embedded applications under user-mode emulation. EQUAFL first executes the application under full-system emulation and observe for the key points where the program may get stuck or even crash during user-mode emulation. With the observed information, EQUAFL can migrate the needed environment for user-mode emulation. Then, EQUAFL uses an enhanced user-mode emulation to replay system calls of network, and resource management behaviors to fulfill the needs of the embedded application during its execution. We evaluate EQUAFL on 70 network applications from different series of IoT devices. The result shows EQUAFL outperforms the state-of-the-arts in fuzzing efficiency (on average, 26 times faster than AFL-QEMU with full-system emulation, 14 times than Firm-AFL). We have also discovered ten vulnerabilities including six CVEs from the tested firmware images.","PeriodicalId":412271,"journal":{"name":"Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126109952","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Deadlock prediction via generalized dependency
Jinpeng Zhou, Hanmei Yang, J. Lange, Tongping Liu
In Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis. https://doi.org/10.1145/3533767.3534377

Deadlocks are notorious bugs in multithreaded programs, causing serious reliability issues. They are difficult to fully expunge before deployment, as their appearance typically depends on specific inputs and thread schedules, which requires the assistance of dynamic tools. However, existing deadlock detection tools mainly focus on locks and cannot detect deadlocks related to condition variables. This paper presents a novel approach to fill this gap. It extends the classic lock dependency to a generalized dependency by abstracting the signal of a condition variable as a special resource, so that communication deadlocks can be modeled as hold-and-wait cycles as well. It further designs multiple practical mechanisms to record and analyze generalized dependencies. Finally, this paper presents the implementation of the approach in a tool called UnHang. Experimental results on real applications show that UnHang is able to find all known deadlocks and uncover two new ones. Overall, UnHang imposes only around 3% performance overhead and 8% memory overhead, making it a practical tool for deployment environments.
Evolution-aware detection of order-dependent flaky tests
Chengpeng Li, A. Shi
In Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis. https://doi.org/10.1145/3533767.3534404

Regression testing is an important part of the software development process but suffers from the presence of flaky tests. Flaky tests are tests that can nondeterministically pass or fail regardless of code changes. Order-dependent flaky tests are a prominent kind of flaky test whose outcome depends on the test order in which they are run. Prior work has focused on detecting order-dependent flaky tests by rerunning all tests in different test orders on a single version of the code. As code is constantly changing, rerunning all tests in different test orders after every change is costly. In this work, we propose IncIDFlakies, a technique to detect order-dependent flaky tests by analyzing code changes to find newly introduced order-dependent flaky tests. Building upon existing work in iDFlakies, which reruns tests in different test orders, IncIDFlakies analyzes the change and selects to run only the tests that (1) are affected by the change and (2) can potentially form a test-order dependency with each other due to potential shared state. Running IncIDFlakies on 67 order-dependent flaky tests across code changes in their respective projects, including the changes where they became flaky, we find that IncIDFlakies selects on average 65.4% of all tests, taking 68.4% of the time that baseline iDFlakies would use when running the same number of test orders with the full test suite. Furthermore, we find that IncIDFlakies still ensures that the test orders it runs can potentially detect the order-dependent flaky tests.
ASRTest: automated testing for deep-neural-network-driven speech recognition systems
Pin Ji, Yang Feng, Jia Liu, Zhihong Zhao, Zhenyu Chen
In Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis. https://doi.org/10.1145/3533767.3534391
With the rapid development of deep neural networks and end-to-end learning techniques, automatic speech recognition (ASR) systems have been deployed into our daily lives and assist in various tasks. However, despite their tremendous progress, ASR systems can also suffer from software defects and exhibit incorrect behaviors. While the nature of DNNs makes conventional software testing techniques inapplicable to ASR systems, the lack of diverse tests and oracle information further hinders their testing. In this paper, we propose and implement a testing approach, named ASRTest, specifically for DNN-driven ASR systems. ASRTest is built upon the theory of metamorphic testing. We first design the metamorphic relation for ASR systems and then implement three families of transformation operators that simulate practical application scenarios to generate speech inputs. Furthermore, we adopt Gini impurity to guide the generation process and improve testing efficiency. To validate the effectiveness of ASRTest, we apply it to four ASR models with four widely used datasets. The results show that ASRTest can efficiently detect erroneous behaviors under different realistic application conditions and improve recognition performance by 19.1% on average via retraining with the generated data. Also, we conduct a case study on an industrial ASR system to investigate the performance of ASRTest under a real usage scenario. The study shows that ASRTest can detect errors and improve the performance of DNN-driven ASR systems effectively.
{"title":"ASRTest: automated testing for deep-neural-network-driven speech recognition systems","authors":"Pin Ji, Yang Feng, Jia Liu, Zhihong Zhao, Zhenyu Chen","doi":"10.1145/3533767.3534391","DOIUrl":"https://doi.org/10.1145/3533767.3534391","url":null,"abstract":"With the rapid development of deep neural networks and end-to-end learning techniques, automatic speech recognition (ASR) systems have been deployed into our daily and assist in various tasks. However, despite their tremendous progress, ASR systems could also suffer from software defects and exhibit incorrect behaviors. While the nature of DNN makes conventional software testing techniques inapplicable for ASR systems, lacking diverse tests and oracle information further hinders their testing. In this paper, we propose and implement a testing approach, namely ASR, specifically for the DNN-driven ASR systems. ASRTest is built upon the theory of metamorphic testing. We first design the metamorphic relation for ASR systems and then implement three families of transformation operators that can simulate practical application scenarios to generate speeches. Furthermore, we adopt Gini impurity to guide the generation process and improve the testing efficiency. To validate the effectiveness of ASRTest, we apply ASRTest to four ASR models with four widely-used datasets. The results show that ASRTest can detect erroneous behaviors under different realistic application conditions efficiently and improve 19.1% recognition performance on average via retraining with the generated data. Also, we conduct a case study on an industrial ASR system to investigate the performance of ASRTest under the real usage scenario. The study shows that ASRTest can detect errors and improve the performance of DNN-driven ASR systems effectively.","PeriodicalId":412271,"journal":{"name":"Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis","volume":"65 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133950570","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Detecting and fixing data loss issues in Android apps
Wunan Guo, Zhen Dong, Liwei Shen, Wei Tian, Ting Su, Xin Peng
In Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis. https://doi.org/10.1145/3533767.3534402

Android apps are event-driven, and their execution is often interrupted by external events. This interruption can cause data loss issues that annoy users. For instance, when the screen is rotated, the current app page is destroyed and recreated; if the app state is improperly preserved, user data will be lost. In this work, we present an approach and a tool, iFixDataloss, that automatically detects and fixes data loss issues in Android apps. To achieve this, we identify scenarios in which data loss issues may occur, develop strategies to reveal them, and design patch templates to fix them. Our experiments on 66 Android apps show that iFixDataloss detected 374 data loss issues (284 of them previously unknown) and successfully generated patches for 188 of the 374 issues. Out of 20 submitted patches, 16 have been accepted by developers. Compared with state-of-the-art techniques, iFixDataloss performed significantly better in terms of the number of detected data loss issues and the quality of the generated patches.