Weiyu Pan, Zhenbang Chen, Guofeng Zhang, Yunlai Luo, Yufeng Zhang, Ji Wang
Parsing code exists extensively in software. Symbolic execution of complex parsing programs is challenging. The inputs generated by the symbolic execution using the byte-level symbolization are usually rejected by the parsing program, which dooms the effectiveness and efficiency of symbolic execution. Complex parsing programs usually adopt token-based input grammar checking. A token sequence represents one case of the input grammar. Based on this observation, we propose grammar-agnostic symbolic execution that can automatically generate token sequences to test complex parsing programs effectively and efficiently. Our method's key idea is to symbolize tokens instead of input bytes to improve the efficiency of symbolic execution. Technically, we propose a novel two-stage algorithm: the first stage collects the byte-level constraints of token values; the second stage employs token symbolization and the constraints collected in the first stage to generate the program inputs that are more possible to pass the parsing code. We have implemented our method on a Java Pathfinder (JPF) based concolic execution engine. The results of the extensive experiments on real-world Java parsing programs demonstrate the effectiveness and efficiency in testing complex parsing programs. Our method detects 6 unknown bugs in the benchmark programs and achieves orders of magnitude speedup to find the same bugs.
{"title":"Grammar-agnostic symbolic execution by token symbolization","authors":"Weiyu Pan, Zhenbang Chen, Guofeng Zhang, Yunlai Luo, Yufeng Zhang, Ji Wang","doi":"10.1145/3460319.3464845","DOIUrl":"https://doi.org/10.1145/3460319.3464845","url":null,"abstract":"Parsing code exists extensively in software. Symbolic execution of complex parsing programs is challenging. The inputs generated by the symbolic execution using the byte-level symbolization are usually rejected by the parsing program, which dooms the effectiveness and efficiency of symbolic execution. Complex parsing programs usually adopt token-based input grammar checking. A token sequence represents one case of the input grammar. Based on this observation, we propose grammar-agnostic symbolic execution that can automatically generate token sequences to test complex parsing programs effectively and efficiently. Our method's key idea is to symbolize tokens instead of input bytes to improve the efficiency of symbolic execution. Technically, we propose a novel two-stage algorithm: the first stage collects the byte-level constraints of token values; the second stage employs token symbolization and the constraints collected in the first stage to generate the program inputs that are more possible to pass the parsing code. We have implemented our method on a Java Pathfinder (JPF) based concolic execution engine. The results of the extensive experiments on real-world Java parsing programs demonstrate the effectiveness and efficiency in testing complex parsing programs. Our method detects 6 unknown bugs in the benchmark programs and achieves orders of magnitude speedup to find the same bugs.","PeriodicalId":188008,"journal":{"name":"Proceedings of the 30th ACM SIGSOFT International Symposium on Software Testing and Analysis","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115477334","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Fuzzers aware of the input grammar can explore deeper program states using grammar-aware mutations. Existing grammar-aware fuzzers are ineffective at synthesizing complex bug triggers due to: (i) grammars introducing a sampling bias during input generation due to their structure, and (ii) the current mutation operators for parse trees performing localized small-scale changes. Gramatron uses grammar automatons in conjunction with aggressive mutation operators to synthesize complex bug triggers faster. We build grammar automatons to address the sampling bias. It restructures the grammar to allow for unbiased sampling from the input state space. We redesign grammar-aware mutation operators to be more aggressive, i.e., perform large-scale changes. Gramatron can consistently generate complex bug triggers in an efficient manner as compared to using conventional grammars with parse trees. Inputs generated from scratch by Gramatron have higher diversity as they achieve up to 24.2% more coverage relative to existing fuzzers. Gramatron makes input generation 98% faster and the input representations are 24% smaller. Our redesigned mutation operators are 6.4× more aggressive while still being 68% faster at performing these mutations. We evaluate Gramatron across three interpreters with 10 known bugs consisting of three complex bug triggers and seven simple bug triggers against two Nautilus variants. Gramatron finds all the complex bug triggers reliably and faster. For the simple bug triggers, Gramatron outperforms Nautilus four out of seven times. To demonstrate Gramatron’s effectiveness in the wild, we deployed Gramatron on three popular interpreters for a 10-day fuzzing campaign where it discovered 10 new vulnerabilities.
{"title":"Gramatron: effective grammar-aware fuzzing","authors":"Prashast Srivastava, Mathias Payer","doi":"10.1145/3460319.3464814","DOIUrl":"https://doi.org/10.1145/3460319.3464814","url":null,"abstract":"Fuzzers aware of the input grammar can explore deeper program states using grammar-aware mutations. Existing grammar-aware fuzzers are ineffective at synthesizing complex bug triggers due to: (i) grammars introducing a sampling bias during input generation due to their structure, and (ii) the current mutation operators for parse trees performing localized small-scale changes. Gramatron uses grammar automatons in conjunction with aggressive mutation operators to synthesize complex bug triggers faster. We build grammar automatons to address the sampling bias. It restructures the grammar to allow for unbiased sampling from the input state space. We redesign grammar-aware mutation operators to be more aggressive, i.e., perform large-scale changes. Gramatron can consistently generate complex bug triggers in an efficient manner as compared to using conventional grammars with parse trees. Inputs generated from scratch by Gramatron have higher diversity as they achieve up to 24.2% more coverage relative to existing fuzzers. Gramatron makes input generation 98% faster and the input representations are 24% smaller. Our redesigned mutation operators are 6.4× more aggressive while still being 68% faster at performing these mutations. We evaluate Gramatron across three interpreters with 10 known bugs consisting of three complex bug triggers and seven simple bug triggers against two Nautilus variants. Gramatron finds all the complex bug triggers reliably and faster. For the simple bug triggers, Gramatron outperforms Nautilus four out of seven times. To demonstrate Gramatron’s effectiveness in the wild, we deployed Gramatron on three popular interpreters for a 10-day fuzzing campaign where it discovered 10 new vulnerabilities.","PeriodicalId":188008,"journal":{"name":"Proceedings of the 30th ACM SIGSOFT International Symposium on Software Testing and Analysis","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124459923","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hanlin Wei, Behnaz Hassanshahi, Guangdong Bai, P. Krishnan, Kostyantyn Vorobyov
Various third-party single sign-on (SSO) services (e.g., Facebook Login and Twitter Login) are widely deployed by web applications to facilitate their authentication and authorization processes. Nevertheless, integrating these services in a secure manner remains challenging, such that security issues are continually reported in recent years. In this work, we develop MoScan, a model-based scanner that can be used by software testers and security analysts for detecting and reporting security vulnerabilities in SSO implementations. MoScan takes as input a state machine built based on an SSO standard and our empirical study to represent participants' states and transitions during the login process. In the testing process, it analyzes network traces captured during the execution of SSO services, and increments the state machine which is then used to generate payloads to test the protocol participants. We evaluate MoScan with 23 real-world websites which integrate the Facebook SSO service to test its capability of identifying security vulnerabilities. To show the adaptability of MoScan's state machine, we also test it on Twitter and LinkedIn’s SSO services, and Github's authentication plugin in Jenkins. It detects three known weaknesses and one new logic fault from them, showing a new perspective in testing stateful protocol implementations like SSO services. Our demonstration and the source code of MoScan are available at https://github.com/baigd/moscan.
{"title":"MoScan: a model-based vulnerability scanner for web single sign-on services","authors":"Hanlin Wei, Behnaz Hassanshahi, Guangdong Bai, P. Krishnan, Kostyantyn Vorobyov","doi":"10.1145/3460319.3469081","DOIUrl":"https://doi.org/10.1145/3460319.3469081","url":null,"abstract":"Various third-party single sign-on (SSO) services (e.g., Facebook Login and Twitter Login) are widely deployed by web applications to facilitate their authentication and authorization processes. Nevertheless, integrating these services in a secure manner remains challenging, such that security issues are continually reported in recent years. In this work, we develop MoScan, a model-based scanner that can be used by software testers and security analysts for detecting and reporting security vulnerabilities in SSO implementations. MoScan takes as input a state machine built based on an SSO standard and our empirical study to represent participants' states and transitions during the login process. In the testing process, it analyzes network traces captured during the execution of SSO services, and increments the state machine which is then used to generate payloads to test the protocol participants. We evaluate MoScan with 23 real-world websites which integrate the Facebook SSO service to test its capability of identifying security vulnerabilities. To show the adaptability of MoScan's state machine, we also test it on Twitter and LinkedIn’s SSO services, and Github's authentication plugin in Jenkins. It detects three known weaknesses and one new logic fault from them, showing a new perspective in testing stateful protocol implementations like SSO services. Our demonstration and the source code of MoScan are available at https://github.com/baigd/moscan.","PeriodicalId":188008,"journal":{"name":"Proceedings of the 30th ACM SIGSOFT International Symposium on Software Testing and Analysis","volume":"152 6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120947294","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
GUI testing is an essential part of regression testing for Android apps. For regression GUI testing to remain effective, it is important that obsolete GUI test scripts get repaired after the app has evolved. In this paper, we propose a novel approach named GUIDER to automated repair of GUI test scripts for Android apps. The key novelty of the approach lies in the utilization of both structural and visual information of widgets on app GUIs to better understand what widgets of the base version app become in the updated version. A supporting tool has been implemented for the approach. Experiments conducted on the popular messaging and social media app WeChat show that GUIDER is both effective and efficient. Repairs produced by GUIDER enabled 88.8% and 54.9% more test actions to run correctly than those produced by existing approaches to GUI test repair that rely solely on visual or structural information of app GUIs.
{"title":"GUIDER: GUI structure and vision co-guided test script repair for Android apps","authors":"Tongtong Xu, Minxue Pan, Yu Pei, Guiyin Li, Xia Zeng, Tian Zhang, Yuetang Deng, Xuandong Li","doi":"10.1145/3460319.3464830","DOIUrl":"https://doi.org/10.1145/3460319.3464830","url":null,"abstract":"GUI testing is an essential part of regression testing for Android apps. For regression GUI testing to remain effective, it is important that obsolete GUI test scripts get repaired after the app has evolved. In this paper, we propose a novel approach named GUIDER to automated repair of GUI test scripts for Android apps. The key novelty of the approach lies in the utilization of both structural and visual information of widgets on app GUIs to better understand what widgets of the base version app become in the updated version. A supporting tool has been implemented for the approach. Experiments conducted on the popular messaging and social media app WeChat show that GUIDER is both effective and efficient. Repairs produced by GUIDER enabled 88.8% and 54.9% more test actions to run correctly than those produced by existing approaches to GUI test repair that rely solely on visual or structural information of app GUIs.","PeriodicalId":188008,"journal":{"name":"Proceedings of the 30th ACM SIGSOFT International Symposium on Software Testing and Analysis","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133226685","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Zhenbang Chen, Zehua Chen, Ziqi Shuai, Guofeng Zhang, Weiyu Pan, Yufeng Zhang, Ji Wang
Symbolic execution is powered by constraint solving. The advancement of constraint solving boosts the development and the applications of symbolic execution. Modern SMT solvers provide the mechanism of solving strategy that allows the users to control the solving procedure, which significantly improves the solver's generalization ability. We observe that the symbolic executions of different programs are actually different constraint solving problems. Therefore, we propose synthesizing a solving strategy for a program to fit the program's symbolic execution best. To achieve this, we divide symbolic execution into two stages. The SMT formulas solved in the first stage are used to online synthesize a solving strategy, which is then employed during the constraint solving in the second stage. We propose novel synthesis algorithms that combine offline trained deep learning models and online tuning to synthesize the solving strategy. The algorithms balance the synthesis overhead and the improvement achieved by the synthesized solving strategy. We have implemented our method on the state-of-the-art symbolic execution engine KLEE for C programs. The results of the extensive experiments indicate that our method effectively improves the efficiency of symbolic execution. On average, our method increases the numbers of queries and paths by 58.76% and 66.11%, respectively. Besides, we applied our method to a Java Pathfinder-based concolic execution engine to validate the generalization ability. The results indicate that our method has a good generalization ability and increases the numbers of queries and paths by 100.24% and 102.6% for the benchmark Java programs, respectively.
{"title":"Synthesize solving strategy for symbolic execution","authors":"Zhenbang Chen, Zehua Chen, Ziqi Shuai, Guofeng Zhang, Weiyu Pan, Yufeng Zhang, Ji Wang","doi":"10.1145/3460319.3464815","DOIUrl":"https://doi.org/10.1145/3460319.3464815","url":null,"abstract":"Symbolic execution is powered by constraint solving. The advancement of constraint solving boosts the development and the applications of symbolic execution. Modern SMT solvers provide the mechanism of solving strategy that allows the users to control the solving procedure, which significantly improves the solver's generalization ability. We observe that the symbolic executions of different programs are actually different constraint solving problems. Therefore, we propose synthesizing a solving strategy for a program to fit the program's symbolic execution best. To achieve this, we divide symbolic execution into two stages. The SMT formulas solved in the first stage are used to online synthesize a solving strategy, which is then employed during the constraint solving in the second stage. We propose novel synthesis algorithms that combine offline trained deep learning models and online tuning to synthesize the solving strategy. The algorithms balance the synthesis overhead and the improvement achieved by the synthesized solving strategy. We have implemented our method on the state-of-the-art symbolic execution engine KLEE for C programs. The results of the extensive experiments indicate that our method effectively improves the efficiency of symbolic execution. On average, our method increases the numbers of queries and paths by 58.76% and 66.11%, respectively. Besides, we applied our method to a Java Pathfinder-based concolic execution engine to validate the generalization ability. The results indicate that our method has a good generalization ability and increases the numbers of queries and paths by 100.24% and 102.6% for the benchmark Java programs, respectively.","PeriodicalId":188008,"journal":{"name":"Proceedings of the 30th ACM SIGSOFT International Symposium on Software Testing and Analysis","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123618862","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Due to the importance of Android app quality assurance, many Android UI testing tools have been developed by researchers over the years. However, recent studies show that these tools typically achieve low code coverage on popular industrial apps. In fact, given a reasonable amount of run time, most state-of-the-art tools cannot even outperform a simple tool, Monkey, on popular industrial apps with large codebases and sophisticated functionalities. Our motivating study finds that these tools perform two types of operations, UI Hierarchy Capturing (capturing information about the contents on the screen) and UI Event Execution (executing UI events, such as clicks), often inefficiently using UIAutomator, a component of the Android framework. In total, these two types of operations use on average 70% of the given test time. Based on this finding, to improve the effectiveness of Android testing tools, we propose TOLLER, a tool consisting of infrastructure enhancements to the Android operating system. TOLLER injects itself into the same virtual machine as the app under test, giving TOLLER direct access to the app’s runtime memory. TOLLER is thus able to directly (1) access UI data structures, and thus capture contents on the screen without the overhead of invoking the Android framework services or remote procedure calls (RPCs), and (2) invoke UI event handlers without needing to execute the UI events. Compared with the often-used UIAutomator, TOLLER reduces average time usage of UI Hierarchy Capturing and UI Event Execution operations by up to 97% and 95%, respectively. We integrate TOLLER with existing state-of-the-art/practice Android UI testing tools and achieve the range of 11.8% to 70.1% relative code coverage improvement on average. We also find that TOLLER-enhanced tools are able to trigger 1.4x to 3.6x distinct crashes compared with their original versions without TOLLER enhancement. These improvements are so substantial that they also change the relative competitiveness of the tools under empirical comparison. Our findings highlight the practicality of TOLLER as well as raising the community awareness of infrastructure support’s significance beyond the community’s existing heavy focus on algorithms.
{"title":"An infrastructure approach to improving effectiveness of Android UI testing tools","authors":"Wenyu Wang, Wing Lam, Tao Xie","doi":"10.1145/3460319.3464828","DOIUrl":"https://doi.org/10.1145/3460319.3464828","url":null,"abstract":"Due to the importance of Android app quality assurance, many Android UI testing tools have been developed by researchers over the years. However, recent studies show that these tools typically achieve low code coverage on popular industrial apps. In fact, given a reasonable amount of run time, most state-of-the-art tools cannot even outperform a simple tool, Monkey, on popular industrial apps with large codebases and sophisticated functionalities. Our motivating study finds that these tools perform two types of operations, UI Hierarchy Capturing (capturing information about the contents on the screen) and UI Event Execution (executing UI events, such as clicks), often inefficiently using UIAutomator, a component of the Android framework. In total, these two types of operations use on average 70% of the given test time. Based on this finding, to improve the effectiveness of Android testing tools, we propose TOLLER, a tool consisting of infrastructure enhancements to the Android operating system. TOLLER injects itself into the same virtual machine as the app under test, giving TOLLER direct access to the app’s runtime memory. TOLLER is thus able to directly (1) access UI data structures, and thus capture contents on the screen without the overhead of invoking the Android framework services or remote procedure calls (RPCs), and (2) invoke UI event handlers without needing to execute the UI events. Compared with the often-used UIAutomator, TOLLER reduces average time usage of UI Hierarchy Capturing and UI Event Execution operations by up to 97% and 95%, respectively. We integrate TOLLER with existing state-of-the-art/practice Android UI testing tools and achieve the range of 11.8% to 70.1% relative code coverage improvement on average. We also find that TOLLER-enhanced tools are able to trigger 1.4x to 3.6x distinct crashes compared with their original versions without TOLLER enhancement. These improvements are so substantial that they also change the relative competitiveness of the tools under empirical comparison. Our findings highlight the practicality of TOLLER as well as raising the community awareness of infrastructure support’s significance beyond the community’s existing heavy focus on algorithms.","PeriodicalId":188008,"journal":{"name":"Proceedings of the 30th ACM SIGSOFT International Symposium on Software Testing and Analysis","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127400682","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
C is a dominant language for implementing system software. Unfortunately, its support for low-level control of memory often leads to memory errors. Dynamic analysis tools, which have been widely used for detecting memory errors at runtime, are not yet satisfactory as they cannot deterministically and completely detect some types of memory errors, e.g., segment confusion errors, sub-object overflows, use-after-frees, and memory leaks. We propose Smatus, short for smart status, a new dynamic analysis approach that supports comprehensive runtime detection of memory errors. The key innovation is to create and maintain a small status node for each memory object. Our approach tracks not only the bounds of each pointer’s referent but also the status and reference count of the referent in its status node, where the status represents the liveness and segment type of the referent. A status node is smart as it is automatically destroyed when it becomes useless. To the best of our knowledge, Smatus represents the most comprehensive approach of its kind. In terms of effectiveness (for detecting more kinds of errors), Smatus outperforms state-of-the-art tools, Google’s AddressSanitizer, SoftBoundCETS and Valgrind. In terms of performance, Smatus outperforms SoftBoundCETS and Valgrind in terms of both time and memory overheads incurred, and is on par with AddressSanitizer in terms of the time and memory overheads tradeoff (with much lower memory overhead incurred).
{"title":"Runtime detection of memory errors with smart status","authors":"Zhe Chen, Chong Wang, Junqi Yan, Yulei Sui, Jingling Xue","doi":"10.1145/3460319.3464807","DOIUrl":"https://doi.org/10.1145/3460319.3464807","url":null,"abstract":"C is a dominant language for implementing system software. Unfortunately, its support for low-level control of memory often leads to memory errors. Dynamic analysis tools, which have been widely used for detecting memory errors at runtime, are not yet satisfactory as they cannot deterministically and completely detect some types of memory errors, e.g., segment confusion errors, sub-object overflows, use-after-frees, and memory leaks. We propose Smatus, short for smart status, a new dynamic analysis approach that supports comprehensive runtime detection of memory errors. The key innovation is to create and maintain a small status node for each memory object. Our approach tracks not only the bounds of each pointer’s referent but also the status and reference count of the referent in its status node, where the status represents the liveness and segment type of the referent. A status node is smart as it is automatically destroyed when it becomes useless. To the best of our knowledge, Smatus represents the most comprehensive approach of its kind. In terms of effectiveness (for detecting more kinds of errors), Smatus outperforms state-of-the-art tools, Google’s AddressSanitizer, SoftBoundCETS and Valgrind. In terms of performance, Smatus outperforms SoftBoundCETS and Valgrind in terms of both time and memory overheads incurred, and is on par with AddressSanitizer in terms of the time and memory overheads tradeoff (with much lower memory overhead incurred).","PeriodicalId":188008,"journal":{"name":"Proceedings of the 30th ACM SIGSOFT International Symposium on Software Testing and Analysis","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125571097","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Alberto Martin-Lopez, Sergio Segura, Antonio Ruiz-Cortés
Testing RESTful APIs thoroughly is critical due to their key role in software integration. Existing tools for the automated generation of test cases in this domain have shown great promise, but their applicability is limited as they mostly rely on random inputs, i.e., fuzzing. In this paper, we present RESTest, an open source black-box testing framework for RESTful web APIs. Based on the API specification, RESTest supports the generation of test cases using different testing techniques such as fuzzing and constraint-based testing, among others. RESTest is developed as a framework and can be easily extended with new test case generators and test writers for different programming languages. We evaluate the tool in two scenarios: offline and online testing. In the former, we show how RESTest can efficiently generate realistic test cases (test inputs and test oracles) that uncover bugs in real-world APIs. In the latter, we show RESTest's capabilities as a continuous testing and monitoring framework. Demo video: https://youtu.be/1f_tjdkaCKo.
由于RESTful api在软件集成中的关键作用,彻底测试它们是至关重要的。在这个领域中,用于自动生成测试用例的现有工具已经显示出很大的希望,但是它们的适用性是有限的,因为它们主要依赖于随机输入,即模糊测试。在本文中,我们介绍了RESTest,一个用于RESTful web api的开源黑盒测试框架。基于API规范,rest支持使用不同的测试技术生成测试用例,例如模糊测试和基于约束的测试等。rest是作为一个框架开发的,并且可以很容易地使用针对不同编程语言的新测试用例生成器和测试编写器进行扩展。我们在两种情况下评估该工具:离线和在线测试。在前者中,我们展示了rest如何有效地生成真实的测试用例(测试输入和测试oracle),从而发现真实api中的bug。在后一篇文章中,我们展示了rest作为连续测试和监视框架的功能。演示视频:https://youtu.be/1f_tjdkaCKo。
{"title":"RESTest: automated black-box testing of RESTful web APIs","authors":"Alberto Martin-Lopez, Sergio Segura, Antonio Ruiz-Cortés","doi":"10.1145/3460319.3469082","DOIUrl":"https://doi.org/10.1145/3460319.3469082","url":null,"abstract":"Testing RESTful APIs thoroughly is critical due to their key role in software integration. Existing tools for the automated generation of test cases in this domain have shown great promise, but their applicability is limited as they mostly rely on random inputs, i.e., fuzzing. In this paper, we present RESTest, an open source black-box testing framework for RESTful web APIs. Based on the API specification, RESTest supports the generation of test cases using different testing techniques such as fuzzing and constraint-based testing, among others. RESTest is developed as a framework and can be easily extended with new test case generators and test writers for different programming languages. We evaluate the tool in two scenarios: offline and online testing. In the former, we show how RESTest can efficiently generate realistic test cases (test inputs and test oracles) that uncover bugs in real-world APIs. In the latter, we show RESTest's capabilities as a continuous testing and monitoring framework. Demo video: https://youtu.be/1f_tjdkaCKo.","PeriodicalId":188008,"journal":{"name":"Proceedings of the 30th ACM SIGSOFT International Symposium on Software Testing and Analysis","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115107583","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The most popular static taint analysis tools for Android allow users to change the underlying analysis algorithms through configuration options. However, the large configuration spaces make it difficult for developers and users alike to understand the full capabilities of these tools, and studies to-date have only focused on individual configurations. In this work, we present the first study that evaluates the configurations in Android taint analysis tools, focusing on the two most popular tools, FlowDroid and DroidSafe. First, we perform a manual code investigation to better understand how configurations are implemented in both tools. We formalize the expected effects of configuration option settings in terms of precision and soundness partial orders which we use to systematically test the configuration space. Second, we create a new dataset of 756 manually classified flows across 18 open-source real-world apps and conduct large-scale experiments on this dataset and micro-benchmarks. We observe that configurations make significant tradeoffs on the performance, precision, and soundness of both tools. The studies to-date would reach different conclusions on the tools' capabilities were they to consider configurations or use real-world datasets. In addition, we study the individual options through a statistical analysis and make actionable recommendations for users to tune the tools to their own ends. Finally, we use the partial orders to test the tool configuration spaces and detect 21 instances where options behaved in unexpected and incorrect ways, demonstrating the need for rigorous testing of configuration spaces.
{"title":"The impact of tool configuration spaces on the evaluation of configurable taint analysis for Android","authors":"Austin Mordahl, Shiyi Wei","doi":"10.1145/3460319.3464823","DOIUrl":"https://doi.org/10.1145/3460319.3464823","url":null,"abstract":"The most popular static taint analysis tools for Android allow users to change the underlying analysis algorithms through configuration options. However, the large configuration spaces make it difficult for developers and users alike to understand the full capabilities of these tools, and studies to-date have only focused on individual configurations. In this work, we present the first study that evaluates the configurations in Android taint analysis tools, focusing on the two most popular tools, FlowDroid and DroidSafe. First, we perform a manual code investigation to better understand how configurations are implemented in both tools. We formalize the expected effects of configuration option settings in terms of precision and soundness partial orders which we use to systematically test the configuration space. Second, we create a new dataset of 756 manually classified flows across 18 open-source real-world apps and conduct large-scale experiments on this dataset and micro-benchmarks. We observe that configurations make significant tradeoffs on the performance, precision, and soundness of both tools. The studies to-date would reach different conclusions on the tools' capabilities were they to consider configurations or use real-world datasets. In addition, we study the individual options through a statistical analysis and make actionable recommendations for users to tune the tools to their own ends. Finally, we use the partial orders to test the tool configuration spaces and detect 21 instances where options behaved in unexpected and incorrect ways, demonstrating the need for rigorous testing of configuration spaces.","PeriodicalId":188008,"journal":{"name":"Proceedings of the 30th ACM SIGSOFT International Symposium on Software Testing and Analysis","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129835048","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Deep learning(DL) techniques attract people from various fields with superior performance in making progressive breakthroughs. To ensure the quality of DL techniques, researchers have been working on testing and verification approaches. Some recent studies reveal that the underlying DL operators could cause defects inside a DL model. DL operators work as fundamental components in DL libraries. Library developers still work on practical approaches to ensure the quality of operators they provide. However, the variety of DL operators and the implementation complexity make it challenging to evaluate their quality. Operator testing with limited test cases may fail to reveal hidden defects inside the implementation. Besides, the existing model-to-library testing approach requires extra labor and time cost to identify and locate errors, i.e., developers can only react to the exposed defects. This paper proposes a fuzzing-based operator-level precision testing approach to estimate individual DL operators' precision errors to bridge this gap. Unlike conventional fuzzing techniques, valid shape variable inputs and fine-grained precision error evaluation are implemented. The testing of DL operators is treated as a searching problem to maximize output precision errors. We implement our approach in a tool named Predoo and conduct an experiment on seven DL operators from TensorFlow. The experiment result shows that Predoo can trigger larger precision errors compared to the error threshold declared in the testing scripts from the TensorFlow repository.
{"title":"Predoo: precision testing of deep learning operators","authors":"Xufan Zhang, Ning Sun, Chunrong Fang, Jiawei Liu, Jia Liu, Dong Chai, Jiang Wang, Zhenyu Chen","doi":"10.1145/3460319.3464843","DOIUrl":"https://doi.org/10.1145/3460319.3464843","url":null,"abstract":"Deep learning(DL) techniques attract people from various fields with superior performance in making progressive breakthroughs. To ensure the quality of DL techniques, researchers have been working on testing and verification approaches. Some recent studies reveal that the underlying DL operators could cause defects inside a DL model. DL operators work as fundamental components in DL libraries. Library developers still work on practical approaches to ensure the quality of operators they provide. However, the variety of DL operators and the implementation complexity make it challenging to evaluate their quality. Operator testing with limited test cases may fail to reveal hidden defects inside the implementation. Besides, the existing model-to-library testing approach requires extra labor and time cost to identify and locate errors, i.e., developers can only react to the exposed defects. This paper proposes a fuzzing-based operator-level precision testing approach to estimate individual DL operators' precision errors to bridge this gap. Unlike conventional fuzzing techniques, valid shape variable inputs and fine-grained precision error evaluation are implemented. The testing of DL operators is treated as a searching problem to maximize output precision errors. We implement our approach in a tool named Predoo and conduct an experiment on seven DL operators from TensorFlow. The experiment result shows that Predoo can trigger larger precision errors compared to the error threshold declared in the testing scripts from the TensorFlow repository.","PeriodicalId":188008,"journal":{"name":"Proceedings of the 30th ACM SIGSOFT International Symposium on Software Testing and Analysis","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125469495","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}