Proceedings of the 30th ACM SIGSOFT International Symposium on Software Testing and Analysis (latest papers, page 3)

A lightweight framework for function name reassignment based on large-scale stripped binaries

Pub Date : 2021-07-11 DOI: 10.1145/3460319.3464804

Han Gao, Shaoyin Cheng, Yinxing Xue, Weiming Zhang

Software in the wild is usually released as stripped binaries that contain no debug information (e.g., function names). This paper studies the problem of reassigning descriptive names to functions to facilitate reverse engineering. Since this problem is essentially a data-driven prediction task, persuasive research should be based on sufficiently large-scale and diverse data. However, prior studies could only use small-scale datasets because their techniques rely on heavyweight binary analysis, which makes them impractical for large binaries and large-scale datasets. This paper presents the Neural Function Rename Engine (NFRE), a lightweight framework for function name reassignment that utilizes both sequential and structural information of assembly code. NFRE uses fine-grained and easily acquired features to model assembly code, making it more effective and efficient than existing techniques. In addition, we construct a large-scale dataset and present two data-preprocessing approaches to help improve its usability. Benefiting from the lightweight design, NFRE can be efficiently trained on the large-scale dataset and thereby generalizes better to unknown functions. The comparative experiments show that NFRE outperforms two existing techniques by a relative improvement of 32% and 16%, respectively, while the time cost for binary analysis is much lower.

Citations: 12

Empirically evaluating readily available information for regression test optimization in continuous integration

Pub Date : 2021-07-11 DOI: 10.1145/3460319.3464834

Daniel Elsner, Florian Hauer, A. Pretschner, Silke Reimer

Regression test selection (RTS) and prioritization (RTP) techniques aim to reduce testing efforts and developer feedback time after a change to the code base. Using various information sources, including test traces, build dependencies, version control data, and test histories, they have been shown to be effective. However, not all of these sources are guaranteed to be available and accessible for arbitrary continuous integration (CI) environments. In contrast, metadata from version control systems (VCSs) and CI systems are readily available and inexpensive. Yet, corresponding RTP and RTS techniques are scattered across research and often only evaluated on synthetic faults or in a specific industrial context. It is cumbersome for practitioners to identify insights that apply to their context, let alone to calibrate associated parameters for maximum cost-effectiveness. This paper consolidates existing work on RTP and unsafe RTS into an actionable methodology to build and evaluate such approaches that exclusively rely on CI and VCS metadata. To investigate how these approaches from prior research compare in heterogeneous settings, we apply the methodology in a large-scale empirical study on a set of 23 projects covering 37,000 CI logs and 76,000 VCS commits. We find that these approaches significantly outperform established RTP baselines and, while still triggering 90% of the failures, we show that practitioners can expect to save on average 84% of test execution time for unsafe RTS. We also find that it can be beneficial to limit training data, features from test history work better than change-based features, and, somewhat surprisingly, simple and well-known heuristics often outperform complex machine-learned models.
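
The abstract notes that simple heuristics over test history often do well. As an illustration only, here is a minimal Python sketch of one such heuristic: rank tests by how often they failed in the most recent CI runs, then keep only the top of the ranking as an unsafe selection. The function names, the `history` data layout, and the `window` and `fraction` parameters are assumptions made for this sketch and are not taken from the paper.

```python
import math
from collections import defaultdict


def prioritize_by_recent_failures(history, window=5):
    """Rank tests so that those failing most often in the last `window` runs come first.

    `history` is a list of CI runs, oldest first; each run maps a test name to
    True if the test passed and False if it failed (hypothetical layout).
    """
    failures = defaultdict(int)
    for run in history[-window:]:
        for test, passed in run.items():
            failures[test] += 0 if passed else 1
    # Sort by failure count (descending), then by name for a reproducible order.
    return sorted(failures, key=lambda t: (-failures[t], t))


def select_top_fraction(ranked_tests, fraction):
    """Unsafe selection sketch: keep only the top `fraction` of the ranked tests."""
    keep = max(1, math.ceil(len(ranked_tests) * fraction))
    return ranked_tests[:keep]


history = [
    {"test_a": True, "test_b": False, "test_c": True},
    {"test_a": False, "test_b": False, "test_c": True},
    {"test_a": True, "test_b": True, "test_c": True},
]
ranked = prioritize_by_recent_failures(history)
selected = select_top_fraction(ranked, 0.5)
```

The selection is "unsafe" in the sense used in the abstract: tests that are cut may still fail, so the `fraction` would have to be calibrated against the share of failures a project is willing to miss.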

Citations: 25

Fixing dependency errors for Python build reproducibility

Pub Date : 2021-07-11 DOI: 10.1145/3460319.3464797

Suchita Mukherjee, Abigail Almanza, Cindy Rubio-González

Software reproducibility is important for re-usability and the cumulative progress of research. An important manifestation of unreproducible software is the changed outcome of software builds over time. While enhancing code reuse, the use of open-source dependency packages hosted on centralized repositories such as PyPI can have adverse effects on build reproducibility. Frequent updates to these packages often cause their latest versions to have breaking changes for applications using them. Large Python applications risk their historical builds becoming unreproducible due to the widespread usage of Python dependencies, and the lack of uniform practices for dependency version specification. Manually fixing dependency errors requires expensive developer time and effort, while automated approaches face challenges of parsing unstructured build logs, finding transitive dependencies, and exploring an exponential search space of dependency versions. In this paper, we investigate how open-source Python projects specify dependency versions, and how their reproducibility is impacted by dependency packages. We propose a tool PyDFix to detect and fix unreproducibility in Python builds caused by dependency errors. PyDFix is evaluated on two bug datasets BugSwarm and BugsInPy, both of which are built from real-world open-source projects. PyDFix analyzes a total of 2,702 builds, identifying 1,921 (71.1%) of them to be unreproducible due to dependency errors. From these, PyDFix provides a complete fix for 859 (44.7%) builds, and partial fixes for an additional 632 (32.9%) builds.
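
To make the underlying problem concrete: an unpinned requirement resolves to whatever version is newest at install time, so a historical build can break once a later release introduces incompatible changes. The Python sketch below shows one plausible candidate fix, pinning a dependency to the newest version released on or before the original build date. It is not PyDFix's algorithm, which the abstract does not describe; the package name `examplepkg`, the release dates, and both function names are hypothetical.

```python
from datetime import date


def pin_to_build_date(releases, build_date):
    """Return the newest version released on or before `build_date`, or None.

    `releases` maps a version string to its release date (hypothetical data,
    assumed to have been collected from a package index beforehand).
    """
    candidates = [
        (released, version)
        for version, released in releases.items()
        if released <= build_date
    ]
    if not candidates:
        return None
    # max() over (date, version) tuples picks the latest release date.
    return max(candidates)[1]


def pinned_requirement(name, releases, build_date):
    """Build a requirement line, pinned with '==' when a suitable version exists."""
    version = pin_to_build_date(releases, build_date)
    return name if version is None else f"{name}=={version}"


releases = {
    "1.0": date(2018, 1, 1),
    "2.0": date(2019, 6, 1),
    "3.0": date(2021, 1, 1),
}
line = pinned_requirement("examplepkg", releases, date(2020, 1, 1))
```

A single pin like this covers only a direct dependency. The abstract points out that transitive dependencies and the exponential space of version combinations are what make the general problem hard.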

Citations: 23

SCStudio: a secure and efficient integrated development environment for smart contracts

Pub Date : 2021-07-11 DOI: 10.1145/3460319.3469078

Meng Ren, Fuchen Ma, Zijing Yin, Huizhong Li, Ying Fu, Ting Chen, Yu Jiang

With the increasing popularity of blockchain technologies, more and more engineers use smart contracts to implement applications. Traditional supporting tools either provide code completion based on static libraries or detect only a limited set of vulnerabilities, which wastes manpower during coding and leaves bugs undetected. In this work, we propose SCStudio, a unified smart contract development platform that aims to help developers implement more secure smart contracts easily. The core idea is to provide real-time, security-reinforced recommendations through pattern-based learning and to perform security-oriented validation via integrated testing. SCStudio is implemented as a plug-in for VS Code. It has been used as the official development tool of WeBank and integrated as the recommended development tool by the FISCO-BCOS community. In practice, it outperforms existing contract development environments, such as Remix, improving the average word suggestion accuracy by 30%-60% and helping detect about 25% more vulnerabilities. The video is presented at https://youtu.be/l6hW3Ds5Tkg.

Citations: 7

Identifying privacy weaknesses from multi-party trigger-action integration platforms

Pub Date : 2021-07-11 DOI: 10.1145/3460319.3464838

Kulani Mahadewa, Yanjun Zhang, Guangdong Bai, Lei Bu, Zhiqiang Zuo, Dileepa Fernando, Zhenkai Liang, J. Dong

With many trigger-action platforms that integrate Internet of Things (IoT) systems and online services, rich functionalities transparently connecting digital and physical worlds become easily accessible for the end users. On the other hand, such facilities incorporate multiple parties whose data control policies may radically differ and even contradict each other, and thus privacy violations may arise throughout the lifecycle (e.g., generation and transmission) of triggers and actions. In this work, we conduct an in-depth study on the privacy issues in multi-party trigger-action integration platforms (TAIPs). We first characterize privacy violations that may arise with the integration of heterogeneous systems and services. Based on this knowledge, we propose Taifu, a dynamic testing approach to identify privacy weaknesses from the TAIP. The key insight of Taifu is that the applets which actually program the trigger-action rules can be used as test cases to explore the behavior of the TAIP. We evaluate the effectiveness of our approach by applying it on the TAIPs that are built around the IFTTT platform. To our great surprise, we find that privacy violations are prevalent among them. Using 407 automatically generated applets, each from a different TAIP, Taifu detects 194 cases of access policy breaches, 218 cases of missing access control, 90 cases of missing access revocation, 15 unintended flows, and 73 cases of over-privileged access.

Citations: 13

Test-case prioritization for configuration testing

Pub Date : 2021-07-11 DOI: 10.1145/3460319.3464810

Runxiang Cheng, Lingming Zhang, D. Marinov, Tianyin Xu

Configuration changes are among the dominant causes of failures of large-scale software system deployment. Given the velocity of configuration changes, typically at the scale of hundreds to thousands of times daily in modern cloud systems, checking these configuration changes is critical to prevent failures due to misconfigurations. Recent work has proposed configuration testing, Ctest, a technique that tests configuration changes together with the code that uses the changed configurations. Ctest can automatically generate a large number of ctests that can effectively detect misconfigurations, including those that are hard to detect by traditional techniques. However, running ctests can take a long time to detect misconfigurations. Inspired by traditional test-case prioritization (TCP) that aims to reorder test executions to speed up detection of regression code faults, we propose to apply TCP to reorder ctests to speed up detection of misconfigurations. We extensively evaluate a total of 84 traditional and novel ctest-specific TCP techniques. The experimental results on five widely used cloud projects demonstrate that TCP can substantially speed up misconfiguration detection. Our study provides guidelines for applying TCP to configuration testing in practice.
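
The abstract does not list the 84 techniques it evaluates. As an example of a traditional TCP technique, the sketch below implements the classic greedy "additional coverage" ordering in Python: repeatedly pick the test that covers the most items not yet covered. Treating the covered items as the configuration parameters a ctest exercises is an assumption made for illustration, as are the function name and the sample data.

```python
def additional_coverage_order(coverage):
    """Greedy 'additional' prioritization.

    `coverage` maps a test name to the set of items it covers (in this sketch,
    the configuration parameters a test exercises, which is an assumption).
    """
    remaining = dict(coverage)
    covered = set()
    order = []
    while remaining:
        # Pick the test adding the most not-yet-covered items; break ties by name.
        best = min(remaining, key=lambda t: (-len(remaining[t] - covered), t))
        if covered and not (remaining[best] - covered):
            # No remaining test adds anything new: reset and rank the rest afresh.
            covered = set()
            continue
        order.append(best)
        covered |= remaining.pop(best)
    return order


coverage = {
    "t1": {"param.a", "param.b"},
    "t2": {"param.b"},
    "t3": {"param.c", "param.d", "param.e"},
    "t4": {"param.a"},
}
order = additional_coverage_order(coverage)
```

Resetting the covered set once nothing new can be gained is the usual way to order the leftover tests by the same criterion rather than arbitrarily.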

Citations: 13

Semantic matching of GUI events for test reuse: are we there yet?

Pub Date : 2021-07-11 DOI: 10.1145/3460319.3464827

L. Mariani, Ali Mohebbi, M. Pezzè, Valerio Terragni

GUI testing is an important but expensive activity. Recently, research on test reuse approaches for Android applications produced interesting results. Test reuse approaches automatically migrate human-designed GUI tests from a source app to a target app that shares similar functionalities. They achieve this by exploiting semantic similarity among textual information of GUI widgets. Semantic matching of GUI events plays a crucial role in these approaches. In this paper, we present the first empirical study on semantic matching of GUI events. Our study involves 253 configurations of the semantic matching, 337 unique queries, and 8,099 distinct GUI events. We report several key findings that indicate how to improve semantic matching of test reuse approaches, propose SemFinder, a novel semantic matching algorithm that outperforms existing solutions, and identify several interesting research directions.

Citations: 19

SAND: a static analysis approach for detecting SQL antipatterns

Pub Date : 2021-07-11 DOI: 10.1145/3460319.3464818

Yingjun Lyu, Sasha Volokh, William G. J. Halfond, Omer Tripp

Local databases underpin important features in many mobile applications, such as responsiveness in the face of poor connectivity. However, failure to use such databases correctly can lead to high resource consumption or even security vulnerabilities. We present SAND, an extensible static analysis approach that checks for misuse of local databases, also known as SQL antipatterns, in mobile apps. SAND features novel abstractions for common forms of application/database interactions, which enables concise and precise specification of the antipatterns that SAND checks for. To validate the efficacy of SAND, we have experimented with a diverse suite of 1,000 Android apps. We show that the abstractions that power SAND allow concise specification of all the known antipatterns from the literature (12-74 LOC), and that the antipatterns are modeled accurately (99.4-100% precision). As for performance, SAND requires on average 41 seconds to complete a scan on a mobile app.

Citations: 5

TERA: optimizing stochastic regression tests in machine learning projects

Pub Date : 2021-07-11 DOI: 10.1145/3460319.3464844

Saikat Dutta, Jeeva Selvam, Aryaman Jain, Sasa Misailovic

The stochastic nature of many Machine Learning (ML) algorithms makes testing of ML tools and libraries challenging. ML algorithms allow a developer to control their accuracy and run-time through a set of hyper-parameters, which are typically manually selected in tests. This choice is often too conservative and leads to slow test executions, thereby increasing the cost of regression testing. We propose TERA, the first automated technique for reducing the cost of regression testing in Machine Learning tools and libraries (jointly referred to as projects) without making the tests more flaky. TERA solves the problem of exploring the trade-off space between execution time of the test and its flakiness as an instance of Stochastic Optimization over the space of algorithm hyper-parameters. TERA presents how to leverage statistical convergence-testing techniques to estimate the level of flakiness of the test for a specific choice of hyper-parameters during optimization. We evaluate TERA on a corpus of 160 tests selected from 15 popular machine learning projects. Overall, TERA obtains a geo-mean speedup of 2.23x over the original tests, for the minimum passing probability threshold of 99%. We also show that the new tests did not reduce fault detection ability through a mutation study and a study on a set of 12 historical build failures in studied projects.
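
The trade-off described above can be illustrated with a much simpler stand-in than TERA itself. The Python sketch below scans candidate hyper-parameter values from cheapest to most expensive, estimates each one's passing probability by repeated execution, and returns the cheapest value that meets a threshold such as the 99% mentioned in the abstract. This is a plain grid scan with a Monte Carlo estimate, not the stochastic optimization or the convergence-testing technique the paper uses; all function names, the toy test, and the `runs` and `seed` parameters are assumptions.

```python
import random


def estimate_pass_probability(test_fn, setting, runs, rng):
    """Fraction of `runs` executions of `test_fn(setting, rng)` that pass."""
    passes = sum(1 for _ in range(runs) if test_fn(setting, rng))
    return passes / runs


def cheapest_passing_setting(test_fn, settings, threshold=0.99, runs=200, seed=0):
    """Return the smallest setting whose estimated pass probability meets `threshold`.

    Smaller settings are assumed to mean cheaper tests (e.g. fewer samples or
    iterations). Returns None if no candidate reaches the threshold.
    """
    rng = random.Random(seed)
    for setting in sorted(settings):
        if estimate_pass_probability(test_fn, setting, runs, rng) >= threshold:
            return setting
    return None


def noisy_mean_test(n_samples, rng):
    """Toy stochastic test: the mean of n draws from N(0, 1) must lie within 0.5 of 0."""
    mean = sum(rng.gauss(0.0, 1.0) for _ in range(n_samples)) / n_samples
    return abs(mean) < 0.5


chosen = cheapest_passing_setting(noisy_mean_test, [1, 5, 25, 100, 1000])
```

With only 200 runs per candidate, the estimate of a 99% passing probability is coarse. A real tool needs a statistically sound stopping rule, which is the role the abstract assigns to convergence testing.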


Citations: 6

Type and interval aware array constraint solving for symbolic execution

Proceedings of the 30th ACM SIGSOFT International Symposium on Software Testing and Analysis

Pub Date : 2021-07-11 DOI: 10.1145/3460319.3464826

Ziqi Shuai, Zhenbang Chen, Yufeng Zhang, Jun Sun, Ji Wang

Array constraints are prevalent in analyzing a program with symbolic execution. Solving array constraints is challenging due to the complexity of the precise encoding for arrays. In this work, we propose to synergize symbolic execution and array constraint solving. Our method addresses the difficulties in solving array constraints with novel ideas. First, we propose a lightweight method for pre-checking the unsatisfiability of array constraints based on integer linear programming. Second, observing that encoding arrays at the byte level introduces many redundant axioms that reduce the effectiveness of constraint solving, we propose type and interval aware axiom generation. Note that the type information of array variables is inferred by symbolic execution, whereas interval information is calculated through the above pre-checking step. We have implemented our methods based on KLEE and its underlying constraint solver STP and conducted large-scale experiments on 75 real-world programs. The experimental results show that our method effectively improves the efficiency of symbolic execution. Our method solves 182.56% more constraints and explores 277.56% more paths on average under the same time threshold.
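The two ideas in this abstract, a cheap unsatisfiability pre-check and axiom generation restricted by type and index interval, can be illustrated with a toy sketch. It is a deliberate simplification and not the paper's method: it narrows the index range with simple bound tightening instead of integer linear programming, it handles only a constant array read at one symbolic index, and it assumes the index is in bounds. The axiom counts assume one axiom per byte for a byte-level encoding and one per element otherwise, which is only a rough model. All names (`index_interval`, `precheck_unsat`, `axiom_counts`) are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

Interval = Tuple[int, int]   # inclusive [lo, hi]
Bound = Tuple[str, int]      # e.g. (">=", 1) means index >= 1


def index_interval(length: int, bounds: List[Bound]) -> Optional[Interval]:
    """Tighten the index range [0, length - 1] using simple bounds.

    Returns None when the bounds leave no feasible index.
    """
    lo, hi = 0, length - 1
    for op, c in bounds:
        if op == ">=":
            lo = max(lo, c)
        elif op == ">":
            lo = max(lo, c + 1)
        elif op == "<=":
            hi = min(hi, c)
        elif op == "<":
            hi = min(hi, c - 1)
        else:
            raise ValueError(f"unsupported operator: {op}")
    return (lo, hi) if lo <= hi else None


def precheck_unsat(array: List[int], bounds: List[Bound],
                   pred: Callable[[int], bool]) -> bool:
    """Return True when `pred(array[i])` cannot hold for any feasible index i."""
    interval = index_interval(len(array), bounds)
    if interval is None:
        return True
    lo, hi = interval
    return not any(pred(array[i]) for i in range(lo, hi + 1))


def axiom_counts(length: int, elem_bytes: int,
                 interval: Interval) -> Tuple[int, int]:
    """Rough axiom counts: (byte-level over the whole array,
    element-level restricted to the index interval)."""
    lo, hi = interval
    return length * elem_bytes, hi - lo + 1


if __name__ == "__main__":
    data = [3, 8, 15, 4, 23, 42]          # constant array of 4-byte ints
    bounds = [(">=", 1), ("<", 4)]        # path condition: 1 <= i < 4
    print(precheck_unsat(data, bounds, lambda v: v > 20))   # True
    print(precheck_unsat(data, bounds, lambda v: v == 15))  # False
    print(axiom_counts(len(data), 4, (1, 3)))               # (24, 3)
```

When the pre-check reports unsatisfiability, the solver call can be skipped entirely. When it does not, the interval it computed is what lets the encoding cover three elements here instead of all twenty-four bytes.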


Citations: 7