Finding permission bugs in smart contracts with role mining
Ye Liu, Yi Li, Shang-Wei Lin, Cyrille Artho
In Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis. https://doi.org/10.1145/3533767.3534372

Smart contracts deployed on permissionless blockchains, such as Ethereum, are accessible to any user in a trustless environment. Therefore, most smart contract applications implement access control policies to protect their valuable assets from unauthorized access. A difficulty in validating conformance to such policies, i.e., whether the contract implementation adheres to the expected behaviors, is the lack of policy specifications. In this paper, we mine past transactions of a contract to recover a likely access control model, which can then be checked against various information flow policies to identify potential bugs related to user permissions. We implement our role mining and security policy validation in the tool SPCon. An experimental evaluation on a labeled smart contract role-mining benchmark demonstrates that SPCon mines more accurate user roles than state-of-the-art role mining tools. Moreover, an evaluation on a real-world smart contract benchmark and access control CVEs indicates that SPCon effectively detects potential permission bugs while offering better scalability and a lower false-positive rate than state-of-the-art security tools, finding 11 previously unknown bugs and detecting six CVEs that no other tool can find.
Testing Dafny (experience paper)
A. Irfan, Sorawee Porncharoenwase, Zvonimir Rakamaric, Neha Rungta, E. Torlak
In Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis. https://doi.org/10.1145/3533767.3534382
Verification toolchains are widely used to prove the correctness of critical software systems. To build confidence in their results, it is important to develop testing frameworks that help detect bugs in these toolchains. Inspired by the success of fuzzing in finding bugs in compilers and SMT solvers, we have built the first fuzzing and differential testing framework for Dafny, a high-level programming language with a Floyd-Hoare-style program verifier and compilers to C#, Java, Go, and JavaScript. This paper presents our experience building and using XDsmith, a testing framework that targets the entire Dafny toolchain, from verification to compilation. XDsmith randomly generates annotated programs in a subset of Dafny that is free of loops and heap-mutating operations. The generated programs include preconditions, postconditions, and assertions, and they have a known verification outcome. These programs are used to test the soundness and precision of the Dafny verifier, and to perform differential testing on the four Dafny compilers. Using XDsmith, we uncovered 31 bugs across the Dafny verifier and compilers, each of which has been confirmed by the Dafny developers. Moreover, 8 of these bugs have been fixed in the mainline release of Dafny.
{"title":"Testing Dafny (experience paper)","authors":"A. Irfan, Sorawee Porncharoenwase, Zvonimir Rakamaric, Neha Rungta, E. Torlak","doi":"10.1145/3533767.3534382","DOIUrl":"https://doi.org/10.1145/3533767.3534382","url":null,"abstract":"Verification toolchains are widely used to prove the correctness of critical software systems. To build confidence in their results, it is important to develop testing frameworks that help detect bugs in these toolchains. Inspired by the success of fuzzing in finding bugs in compilers and SMT solvers, we have built the first fuzzing and differential testing framework for Dafny, a high-level programming language with a Floyd-Hoare-style program verifier and compilers to C#, Java, Go, and Javascript. This paper presents our experience building and using XDsmith, a testing framework that targets the entire Dafny toolchain, from verification to compilation. XDsmith randomly generates annotated programs in a subset of Dafny that is free of loops and heap-mutating operations. The generated programs include preconditions, postconditions, and assertions, and they have a known verification outcome. These programs are used to test the soundness and precision of the Dafny verifier, and to perform differential testing on the four Dafny compilers. Using XDsmith, we uncovered 31 bugs across the Dafny verifier and compilers, each of which has been confirmed by the Dafny developers. Moreover, 8 of these bugs have been fixed in the mainline release of Dafny.","PeriodicalId":412271,"journal":{"name":"Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129017462","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Unicorn: detect runtime errors in time-series databases with hybrid input synthesis
Zhiyong Wu, Jie Liang, Mingzhe Wang, Chijin Zhou, Yu Jiang
In Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis. https://doi.org/10.1145/3533767.3534364
The ubiquitous use of time-series databases in the safety-critical Internet of Things domain demands strict security and correctness. One successful approach in database bug detection is fuzzing, where hundreds of bugs have been detected automatically in relational databases. However, it cannot be easily applied to time-series databases: the bulk of time-series logic is unreachable because of mismatched query specifications, and serious bugs are undetectable because of implicitly handled exceptions. In this paper, we propose Unicorn to secure time-series databases with automated fuzzing. First, we design hybrid input synthesis to generate high-quality queries which not only cover time-series features but also ensure grammar correctness. Then, Unicorn uses proactive exception detection to discover minuscule-symptom bugs that hide behind implicit exception handling. With its specialized design oriented to time-series databases, Unicorn outperforms state-of-the-art database fuzzers in terms of coverage and bugs. Specifically, Unicorn outperforms SQLsmith and SQLancer on the widely used time-series databases IoTDB, KairosDB, TimescaleDB, TDengine, QuestDB, and GridDB in the number of covered basic blocks by 21%-199% and 34%-693%, respectively. More importantly, Unicorn has discovered 42 previously unknown bugs.
{"title":"Unicorn: detect runtime errors in time-series databases with hybrid input synthesis","authors":"Zhiyong Wu, Jie Liang, Mingzhe Wang, Chijin Zhou, Yu Jiang","doi":"10.1145/3533767.3534364","DOIUrl":"https://doi.org/10.1145/3533767.3534364","url":null,"abstract":"The ubiquitous use of time-series databases in the safety-critical Internet of Things domain demands strict security and correctness. One successful approach in database bug detection is fuzzing, where hundreds of bugs have been detected automatically in relational databases. However, it cannot be easily applied to time-series databases: the bulk of time-series logic is unreachable because of mismatched query specifications, and serious bugs are undetectable because of implicitly handled exceptions. In this paper, we propose Unicorn to secure time-series databases with automated fuzzing. First, we design hybrid input synthesis to generate high-quality queries which not only cover time-series features but also ensure grammar correctness. Then, Unicorn uses proactive exception detection to discover minuscule-symptom bugs which hide behind implicit exception handling. With the specialized design oriented to time-series databases, Unicorn outperforms the state-of-the-art database fuzzers in terms of coverage and bugs. Specifically, Unicorn outperforms SQLsmith and SQLancer on widely used time-series databases IoTDB, KairosDB, TimescaleDB, TDEngine, QuestDB, and GridDB in the number of basic blocks by 21%-199% and 34%-693%, respectively. More importantly, Unicorn has discovered 42 previously unknown bugs.","PeriodicalId":412271,"journal":{"name":"Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133944109","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Combining solution reuse and bound tightening for efficient analysis of evolving systems
Clay Stevens, H. Bagheri
In Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis. https://doi.org/10.1145/3533767.3534399

Software engineers have long employed formal verification to ensure the safety and validity of their system designs. As a system changes---often via predictable, domain-specific operations---its models must also change, requiring system designers to repeatedly execute the same formal verification on similar system models. State-of-the-art formal verification techniques can be expensive at scale, and this cost is multiplied by repeated analysis. This paper presents a novel analysis technique---implemented in a tool called SoRBoT---which can automatically determine domain-specific optimizations that dramatically reduce the cost of repeatedly analyzing evolving systems. Unlike prior approaches, which focus on either tightening the bounds for analysis or reusing all or part of prior solutions, SoRBoT's automated derivation of domain-specific optimizations combines the benefits of both solution reuse and bound tightening while avoiding the main pitfalls of each. We experimentally evaluate SoRBoT against state-of-the-art techniques for verifying evolving specifications, demonstrating that SoRBoT substantially exceeds their run-time performance while introducing only negligible overhead, in contrast to the expensive additional computations required by those techniques.
Faster mutation analysis with MeMu
Ali Ghanbari, Andrian Marcus
In Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis. https://doi.org/10.1145/3533767.3543288

Mutation analysis is a program analysis method with applications in assessing the quality of test cases, fault localization, test input generation, security analysis, and more. The method involves repeatedly running test suites against a large number of program mutants, often leading to poor scalability. A large body of research aims at accelerating mutation analysis via a variety of approaches, such as reducing the number of mutants, reducing the number of test cases to run, or reducing the execution time of individual mutants. This paper presents the implementation of a novel technique, named MeMu, for reducing mutant execution time by memoizing the most expensive methods in the system. Memoization is a program optimization technique that bypasses the execution of expensive methods and reuses pre-calculated results when repeated inputs are detected. MeMu can be used on its own or alongside existing mutation analysis acceleration techniques. The current implementation of MeMu achieves, on average, an 18.15% speed-up for the JVM-based mutation testing tool PITest.
Efficient greybox fuzzing of applications in Linux-based IoT devices via enhanced user-mode emulation
Yaowen Zheng, Yuekang Li, Cen Zhang, Hongsong Zhu, Yang Liu, Limin Sun
In Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis. https://doi.org/10.1145/3533767.3534414
Greybox fuzzing has become one of the most effective vulnerability discovery techniques. However, greybox fuzzing techniques cannot be directly applied to applications in IoT devices, mainly because executing these applications relies heavily on specific system environments and hardware. To execute the applications in Linux-based IoT devices, most existing fuzzing techniques use full-system emulation to maximize compatibility. However, compared with user-mode emulation, full-system emulation suffers from high overhead. Therefore, some previous works, such as Firm-AFL, combine full-system emulation and user-mode emulation to speed up the fuzzing process. Despite these attempts to shift applications toward user-mode emulation, no existing technique supports executing such applications fully under user-mode emulation. To address this issue, we propose EQUAFL, which can automatically set up the execution environment to execute embedded applications under user-mode emulation. EQUAFL first executes the application under full-system emulation and observes the key points where the program may get stuck or even crash during user-mode emulation. With the observed information, EQUAFL can migrate the needed environment for user-mode emulation. EQUAFL then uses enhanced user-mode emulation to replay network-related system calls and resource management behaviors to fulfill the needs of the embedded application during its execution. We evaluate EQUAFL on 70 network applications from different series of IoT devices. The results show that EQUAFL outperforms the state of the art in fuzzing efficiency (on average, 26 times faster than AFL-QEMU with full-system emulation and 14 times faster than Firm-AFL). We have also discovered ten vulnerabilities, including six CVEs, in the tested firmware images.
{"title":"Efficient greybox fuzzing of applications in Linux-based IoT devices via enhanced user-mode emulation","authors":"Yaowen Zheng, Yuekang Li, Cen Zhang, Hongsong Zhu, Yang Liu, Limin Sun","doi":"10.1145/3533767.3534414","DOIUrl":"https://doi.org/10.1145/3533767.3534414","url":null,"abstract":"Greybox fuzzing has become one of the most effective vulnerability discovery techniques. However, greybox fuzzing techniques cannot be directly applied to applications in IoT devices. The main reason is that executing these applications highly relies on specific system environments and hardware. To execute the applications in Linux-based IoT devices, most existing fuzzing techniques use full-system emulation for the purpose of maximizing compatibility. However, compared with user-mode emulation, full-system emulation suffersfrom great overhead. Therefore, some previous works, such as Firm-AFL, propose to combine full-system emulation and user-mode emulation to speed up the fuzzing process. Despite the attempts of trying to shift the application towards user-mode emulation, no existing technique supports to execute these applications fully in the user-mode emulation. To address this issue, we propose EQUAFL, which can automatically set up the execution environment to execute embedded applications under user-mode emulation. EQUAFL first executes the application under full-system emulation and observe for the key points where the program may get stuck or even crash during user-mode emulation. With the observed information, EQUAFL can migrate the needed environment for user-mode emulation. Then, EQUAFL uses an enhanced user-mode emulation to replay system calls of network, and resource management behaviors to fulfill the needs of the embedded application during its execution. We evaluate EQUAFL on 70 network applications from different series of IoT devices. The result shows EQUAFL outperforms the state-of-the-arts in fuzzing efficiency (on average, 26 times faster than AFL-QEMU with full-system emulation, 14 times than Firm-AFL). We have also discovered ten vulnerabilities including six CVEs from the tested firmware images.","PeriodicalId":412271,"journal":{"name":"Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126109952","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Deadlock prediction via generalized dependency
Jinpeng Zhou, Hanmei Yang, J. Lange, Tongping Liu
In Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis. https://doi.org/10.1145/3533767.3534377

Deadlocks are notorious bugs in multithreaded programs, causing serious reliability issues. They are difficult to fully expunge before deployment, as their appearance typically depends on specific inputs and thread schedules, which requires the assistance of dynamic tools. However, existing deadlock detection tools mainly focus on locks and cannot detect deadlocks related to condition variables. This paper presents a novel approach to fill this gap. It extends the classic lock dependency to a generalized dependency by abstracting the signal of a condition variable as a special resource, so that communication deadlocks can be modeled as hold-and-wait cycles as well. It further designs multiple practical mechanisms to record and analyze generalized dependencies. Finally, this paper presents the implementation of the approach in a tool called UnHang. Experimental results on real applications show that UnHang is able to find all known deadlocks and uncover two new ones. Overall, UnHang imposes only around 3% performance overhead and 8% memory overhead, making it a practical tool for deployment environments.
Evolution-aware detection of order-dependent flaky tests
Chengpeng Li, A. Shi
In Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis. https://doi.org/10.1145/3533767.3534404

Regression testing is an important part of the software development process but suffers from the presence of flaky tests. Flaky tests are tests that can nondeterministically pass or fail regardless of code changes. Order-dependent flaky tests are a prominent kind of flaky test whose outcome depends on the test order in which they are run. Prior work has focused on detecting order-dependent flaky tests by rerunning all tests in different test orders on a single version of the code. As code is constantly changing, rerunning all tests in different test orders after every change is costly. In this work, we propose IncIDFlakies, a technique to detect order-dependent flaky tests by analyzing code changes to find newly introduced order-dependent flaky tests. Building upon existing work in iDFlakies, which reruns tests in different test orders, IncIDFlakies analyzes the change and selects to run only the tests that (1) are affected by the change and (2) can potentially form a test-order dependency with each other due to potential shared state. Running IncIDFlakies on 67 order-dependent flaky tests across code changes in their respective projects, including the changes where they became flaky, we find that IncIDFlakies selects on average 65.4% of all tests, taking 68.4% of the time that baseline iDFlakies would use when running the same number of test orders with the full test suite. Furthermore, we find that IncIDFlakies still ensures that the test orders it runs can potentially detect the order-dependent flaky tests.
ASRTest: automated testing for deep-neural-network-driven speech recognition systems
Pin Ji, Yang Feng, Jia Liu, Zhihong Zhao, Zhenyu Chen
In Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis. https://doi.org/10.1145/3533767.3534391
With the rapid development of deep neural networks and end-to-end learning techniques, automatic speech recognition (ASR) systems have been deployed into our daily lives and assist in various tasks. However, despite their tremendous progress, ASR systems can also suffer from software defects and exhibit incorrect behaviors. While the nature of DNNs makes conventional software testing techniques inapplicable to ASR systems, the lack of diverse tests and oracle information further hinders their testing. In this paper, we propose and implement a testing approach, named ASRTest, specifically for DNN-driven ASR systems. ASRTest is built upon the theory of metamorphic testing. We first design the metamorphic relation for ASR systems and then implement three families of transformation operators that simulate practical application scenarios to generate speech inputs. Furthermore, we adopt Gini impurity to guide the generation process and improve testing efficiency. To validate the effectiveness of ASRTest, we apply it to four ASR models with four widely used datasets. The results show that ASRTest can efficiently detect erroneous behaviors under different realistic application conditions and improve recognition performance by 19.1% on average via retraining with the generated data. Also, we conduct a case study on an industrial ASR system to investigate the performance of ASRTest under a real usage scenario. The study shows that ASRTest can detect errors and improve the performance of DNN-driven ASR systems effectively.
{"title":"ASRTest: automated testing for deep-neural-network-driven speech recognition systems","authors":"Pin Ji, Yang Feng, Jia Liu, Zhihong Zhao, Zhenyu Chen","doi":"10.1145/3533767.3534391","DOIUrl":"https://doi.org/10.1145/3533767.3534391","url":null,"abstract":"With the rapid development of deep neural networks and end-to-end learning techniques, automatic speech recognition (ASR) systems have been deployed into our daily and assist in various tasks. However, despite their tremendous progress, ASR systems could also suffer from software defects and exhibit incorrect behaviors. While the nature of DNN makes conventional software testing techniques inapplicable for ASR systems, lacking diverse tests and oracle information further hinders their testing. In this paper, we propose and implement a testing approach, namely ASR, specifically for the DNN-driven ASR systems. ASRTest is built upon the theory of metamorphic testing. We first design the metamorphic relation for ASR systems and then implement three families of transformation operators that can simulate practical application scenarios to generate speeches. Furthermore, we adopt Gini impurity to guide the generation process and improve the testing efficiency. To validate the effectiveness of ASRTest, we apply ASRTest to four ASR models with four widely-used datasets. The results show that ASRTest can detect erroneous behaviors under different realistic application conditions efficiently and improve 19.1% recognition performance on average via retraining with the generated data. Also, we conduct a case study on an industrial ASR system to investigate the performance of ASRTest under the real usage scenario. The study shows that ASRTest can detect errors and improve the performance of DNN-driven ASR systems effectively.","PeriodicalId":412271,"journal":{"name":"Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis","volume":"65 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133950570","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Detecting and fixing data loss issues in Android apps
Wunan Guo, Zhen Dong, Liwei Shen, Wei Tian, Ting Su, Xin Peng
In Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis. https://doi.org/10.1145/3533767.3534402

Android apps are event-driven, and their execution is often interrupted by external events. This interruption can cause data loss issues that annoy users. For instance, when the screen is rotated, the current app page is destroyed and recreated; if the app state is improperly preserved, user data will be lost. In this work, we present an approach and a tool, iFixDataloss, that automatically detects and fixes data loss issues in Android apps. To achieve this, we identify scenarios in which data loss issues may occur, develop strategies to reveal them, and design patch templates to fix them. Our experiments on 66 Android apps show that iFixDataloss detected 374 data loss issues (284 of them previously unknown) and successfully generated patches for 188 of the 374 issues. Out of 20 submitted patches, 16 have been accepted by developers. Compared with state-of-the-art techniques, iFixDataloss performed significantly better in terms of the number of detected data loss issues and the quality of the generated patches.