Code quality and design are key factors in building a successful software application. A sound internal structure is widely held to support good external quality. Several guidelines and best practices have been defined to improve code quality, and unit testing makes a key contribution alongside them. Just like production source code, unit test code is subject to bad programming practices, known as defects or smells, that negatively affect the quality of the software system. As a consequence, the system becomes harder to understand and maintain, and more prone to issues and bugs. Methods and tools that automate the detection of such unit test smells are therefore of the utmost importance. While several tools address the automatic detection of unit test smells, the majority focus on Java software systems. Moreover, the only known such framework designed for Python applications detects smells only in tests written with Python's unittest library, and it relies on an IDE to run, which heavily restricts its usage. The tool proposed in this paper aims to close this gap by introducing a new framework that detects test smells in Python unit tests written with the Pytest testing framework. As far as we know, no similar tool automating test smell detection for unit tests written in Pytest has been developed yet. The proposed solution also addresses portability, being a cross-platform, easy-to-install and easy-to-use Python library.
{"title":"Pytest-Smell: a smell detection tool for Python unit tests","authors":"Alexandru Bodea","doi":"10.1145/3533767.3543290","DOIUrl":"https://doi.org/10.1145/3533767.3543290","url":null,"abstract":"Code quality and design are key factors in building a successful software application. It is known that a good internal structure assures a good external quality. To improve code quality, several guidelines and best practices are defined. Along with these, a key contribution is brought by unit testing. Just like the source code, unit test code is subject to bad programming practices, known as defects or smells, that have a negative impact on the quality of the software system. As a consequence, the system becomes harder to understand, maintain, and more prone to issues and bugs. In this respect, methods and tools that automate the detection of the aforementioned unit test smells are of the utmost importance. While there are several tools that aim to address the automatic detection of unit test smells, the majority of them are focused on Java software systems. Moreover, the only known such framework designed for applications written in Python performs the detection only on Unittest Python testing library. In addition to this, it relies on an IDE to run, which heavily restricts its usage. The tool proposed within this paper aims to close this gap, introducing a new framework which focuses on detecting Python test smells built with Pytest testing framework. As far as we know, a similar tool to automate the process of test smell detection for unit tests written in Pytest has not been developed yet. The proposed solution also addresses the portability issue, being a cross-platform, easy to install and use Python library.","PeriodicalId":412271,"journal":{"name":"Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134638366","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Graph database systems (GDBs) allow graph data to be stored and retrieved efficiently, and have become a critical component in many applications, e.g., knowledge graphs, social networks, and fraud detection. It is important to ensure that GDBs operate correctly. Logic bugs can occur and make GDBs return an incorrect result for a given query. These bugs are critical and can easily go unnoticed by developers when the graph and queries become complicated. Despite the importance of GDBs, logic bugs in GDBs have received less attention than those in relational database systems. In this paper, we present Grand, an approach for automatically finding logic bugs in GDBs that adopt Gremlin as their query language. The core idea of Grand is to construct semantically equivalent databases for multiple GDBs, and then compare the results of a Gremlin query on these databases. If the results of a query differ across the GDBs, the likely cause is a logic bug in these GDBs. To test GDBs effectively, we propose a model-based query generation approach to generate valid Gremlin queries that can potentially return non-empty results, and a data mapping approach to unify the format of query results across different GDBs. We evaluate Grand on six widely used GDBs, e.g., Neo4j and HugeGraph. In total, we have found 21 previously unknown logic bugs in these GDBs. Among them, developers have confirmed 18 bugs and fixed 7 bugs.
{"title":"Finding bugs in Gremlin-based graph database systems via Randomized differential testing","authors":"Yingying Zheng, Wensheng Dou, Yicheng Wang, Zheng Qin, Leile Tang, Yu Gao, Dong Wang, Wei Wang, Jun Wei","doi":"10.1145/3533767.3534409","DOIUrl":"https://doi.org/10.1145/3533767.3534409","url":null,"abstract":"Graph database systems (GDBs) allow efficiently storing and retrieving graph data, and have become the critical component in many applications, e.g., knowledge graphs, social networks, and fraud detection. It is important to ensure that GDBs operate correctly. Logic bugs can occur and make GDBs return an incorrect result for a given query. These bugs are critical and can easily go unnoticed by developers when the graph and queries become complicated. Despite the importance of GDBs, logic bugs in GDBs have received less attention than those in relational database systems. In this paper, we present Grand, an approach for automatically finding logic bugs in GDBs that adopt Gremlin as their query language. The core idea of Grand is to construct semantically equivalent databases for multiple GDBs, and then compare the results of a Gremlin query on these databases. If the return results of a query on multiple GDBs are different, the likely cause is a logic bug in these GDBs. To effectively test GDBs, we propose a model-based query generation approach to generate valid Gremlin queries that can potentially return non-empty results, and a data mapping approach to unify the format of query results for different GDBs. We evaluate Grand on six widely-used GDBs, e.g., Neo4j and HugeGraph. In total, we have found 21 previously-unknown logic bugs in these GDBs. Among them, developers have confirmed 18 bugs, and fixed 7 bugs.","PeriodicalId":412271,"journal":{"name":"Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis","volume":"100 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122609818","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A well-trained deep learning (DL) model often cannot achieve the expected performance after deployment due to the mismatch between the distributions of the training data and the field data in the operational environment. Therefore, repairing DL models is critical, especially when they are deployed on increasingly larger tasks with shifted distributions. Generally speaking, it is easy to obtain a large amount of field data. Existing solutions develop various techniques to select a subset for annotation and then fine-tune the model for repair. While effective, achieving a higher repair rate is inevitably associated with more expensive labeling costs. To mitigate this problem, we propose a novel annotation-efficient repair solution for DL models, namely HybridRepair, wherein we take a holistic approach that coordinates the use of a small amount of annotated data and a large amount of unlabeled data for repair. Our key insight is that accurate yet sufficient training data is needed to repair the corresponding failure region in the data distribution. Under a given labeling budget, we selectively annotate some data in failure regions and propagate their labels to the neighboring data on the one hand. On the other hand, we take advantage of semi-supervised learning (SSL) techniques to further boost the training data density. However, unlike existing SSL solutions that try to use all the unlabeled data, we use only a selected part of it, considering the impact of distribution shift on SSL solutions. Experimental results show that HybridRepair outperforms both state-of-the-art DL model repair solutions and semi-supervised techniques for model improvement, especially when there is a distribution shift between the training data and the field data. Our code is available at: https://github.com/cure-lab/HybridRepair.
{"title":"HybridRepair: towards annotation-efficient repair for deep learning models","authors":"Yu LI, Mu-Hwa Chen, Qiang Xu","doi":"10.1145/3533767.3534408","DOIUrl":"https://doi.org/10.1145/3533767.3534408","url":null,"abstract":"A well-trained deep learning (DL) model often cannot achieve expected performance after deployment due to the mismatch between the distributions of the training data and the field data in the operational environment. Therefore, repairing DL models is critical, especially when deployed on increasingly larger tasks with shifted distributions. Generally speaking, it is easy to obtain a large amount of field data. Existing solutions develop various techniques to select a subset for annotation and then fine-tune the model for repair. While effective, achieving a higher repair rate is inevitably associated with more expensive labeling costs. To mitigate this problem, we propose a novel annotation-efficient repair solution for DL models, namely HybridRepair, wherein we take a holistic approach that coordinates the use of a small amount of annotated data and a large amount of unlabeled data for repair. Our key insight is that accurate yet sufficient training data is needed to repair the corresponding failure region in the data distribution. Under a given labeling budget, we selectively annotate some data in failure regions and propagate their labels to the neighboring data on the one hand. On the other hand, we take advantage of the semi-supervised learning (SSL) techniques to further boost the training data density. However, different from existing SSL solutions that try to use all the unlabeled data, we only use a selected part of them considering the impact of distribution shift on SSL solutions. Experimental results show that HybridRepair outperforms both state-of-the-art DL model repair solutions and semi-supervised techniques for model improvements, especially when there is a distribution shift between the training data and the field data. Our code is available at: https://github.com/cure-lab/HybridRepair.","PeriodicalId":412271,"journal":{"name":"Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis","volume":"98 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122615902","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Various techniques based on code similarity measurement have been proposed to detect bugs. Essentially, a code fragment can be regarded as a kind of graph, so performing code graph similarity comparison to identify potential bugs is a natural choice. However, the logic of a bug often involves only a few statements in the code fragment, while the others are bug-irrelevant; they act as a kind of noise that can heavily interfere with the code similarity measurement. In theory, performing optimal vertex matching can address this problem well, but the task is NP-complete and cannot be applied to a large-scale code base. In this paper, we propose a two-phase strategy to accelerate code graph vertex matching for detecting bugs. In the first phase, a vertex matching embedding model is trained and used to rapidly filter a limited number of candidate code graphs from the target code base that are likely to have a high vertex matching degree with the seed, i.e., the known buggy code. As a result, the number of code graphs that need further analysis is dramatically reduced. In the second phase, a high-order similarity embedding model based on a graph convolutional neural network is built to efficiently obtain an approximately optimal vertex matching between the seed and the candidates. On this basis, the code graph similarity is calculated to identify potential buggy code. The proposed method is applied to five open source projects. In total, 31 unknown bugs were successfully detected and confirmed by developers. Comparative experiments demonstrate that our method can effectively mitigate the noise problem, and the detection efficiency can be improved dozens of times with the two-phase strategy.
{"title":"Hunting bugs with accelerated optimal graph vertex matching","authors":"Xiaohui Zhang, Yuanjun Gong, Bin Liang, Jianjun Huang, Wei You, Wenchang Shi, Jian Zhang","doi":"10.1145/3533767.3534393","DOIUrl":"https://doi.org/10.1145/3533767.3534393","url":null,"abstract":"Various techniques based on code similarity measurement have been proposed to detect bugs. Essentially, the code fragment can be regarded as a kind of graph. Performing code graph similarity comparison to identify the potential bugs is a natural choice. However, the logic of a bug often involves only a few statements in the code fragment, while others are bug-irrelevant. They can be considered as a kind of noise, and can heavily interfere with the code similarity measurement. In theory, performing optimal vertex matching can address the problem well, but the task is NP-complete and cannot be applied to a large-scale code base. In this paper, we propose a two-phase strategy to accelerate code graph vertex matching for detecting bugs. In the first phase, a vertex matching embedding model is trained and used to rapidly filter a limited number of candidate code graphs from the target code base, which are likely to have a high vertex matching degree with the seed, i.e., the known buggy code. As a result, the number of code graphs needed to be further analyzed is dramatically reduced. In the second phase, a high-order similarity embedding model based on graph convolutional neural network is built to efficiently get the approximately optimal vertex matching between the seed and candidates. On this basis, the code graph similarity is calculated to identify the potential buggy code. The proposed method is applied to five open source projects. In total, 31 unknown bugs were successfully detected and confirmed by developers. Comparative experiments demonstrate that our method can effectively mitigate the noise problem, and the detection efficiency can be improved dozens of times with the two-phase strategy.","PeriodicalId":412271,"journal":{"name":"Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis","volume":"199 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127300255","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
With the increasing popularity of blockchain, automatically detecting vulnerabilities in smart contracts is becoming a significant problem. Prior research mainly identifies smart contract vulnerabilities without considering the interactions between multiple contracts. Because they do not analyze the fine-grained contextual information during cross-contract invocations, existing approaches often produce a large number of false positives and false negatives. This paper proposes SmartDagger, a new framework for detecting cross-contract vulnerabilities through static analysis at the bytecode level. SmartDagger integrates a set of novel mechanisms to ensure its effectiveness and efficiency for cross-contract vulnerability detection. In particular, SmartDagger effectively recovers contract attribute information from the smart contract bytecode, which is critical for accurately identifying cross-contract vulnerabilities. Moreover, instead of performing typical whole-program analysis, which is heavyweight and time-consuming, SmartDagger selectively analyzes a subset of functions and reuses the data-flow results, which helps to improve its efficiency. Our further evaluation over a manually labelled dataset showed that SmartDagger significantly outperforms other state-of-the-art tools (i.e., Oyente, Slither, Osiris, and Mythril) for detecting cross-contract vulnerabilities. In addition, when run over a randomly selected dataset of 250 real-world smart contracts, SmartDagger detects 11 cross-contract vulnerabilities, all of which are missed by prior tools.
{"title":"SmartDagger: a bytecode-based static analysis approach for detecting cross-contract vulnerability","authors":"Zeqin Liao, Zibin Zheng, X. Chen, Yuhong Nan","doi":"10.1145/3533767.3534222","DOIUrl":"https://doi.org/10.1145/3533767.3534222","url":null,"abstract":"With the increasing popularity of blockchain, automatically detecting vulnerabilities in smart contracts is becoming a significant problem. Prior research mainly identifies smart contract vulnerabilities without considering the interactions between multiple contracts. Due to the lack of analyzing the fine-grained contextual information during cross-contract invocations, existing approaches often produced a large number of false positives and false negatives. This paper proposes SmartDagger, a new framework for detecting cross-contract vulnerability through static analysis at the bytecode level. SmartDagger integrates a set of novel mechanisms to ensure its effectiveness and efficiency for cross-contract vulnerability detection. Particularly, SmartDagger effectively recovers the contract attribute information from the smart contract bytecode, which is critical for accurately identifying cross-contract vulnerabilities. Besides, instead of performing the typical whole-program analysis which is heavy-weight and time-consuming, SmartDagger selectively analyzes a subset of functions and reuses the data-flow results, which helps to improve its efficiency. Our further evaluation over a manually labelled dataset showed that SmartDagger significantly outperforms other state-of-the-art tools (i.e., Oyente, Slither, Osiris, and Mythril) for detecting cross-contract vulnerabilities. In addition, running SmartDagger over a randomly selected dataset of 250 smart contracts in the real-world, SmartDagger detects 11 cross-contract vulnerabilities, all of which are missed by prior tools.","PeriodicalId":412271,"journal":{"name":"Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis","volume":"61 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128774243","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The execution of smart contracts on the Ethereum blockchain consumes gas paid for by users submitting contracts' invocation requests. A contract execution proceeds as long as the users dedicate enough gas, within the limit set by Ethereum. If insufficient gas is provided, the contract execution halts and changes made during execution get reverted. Unfortunately, contracts may contain code patterns that increase execution cost, causing the contracts to run out of gas. These patterns can be manipulated by malicious attackers to induce unwanted behavior in the targeted victim contracts, e.g., Denial-of-Service (DoS) attacks. We call these gas-related vulnerabilities. We propose eTainter, a static analyzer for detecting gas-related vulnerabilities based on taint tracking in the bytecode of smart contracts. We evaluate eTainter by comparing it with the prior work, MadMax, on a dataset of annotated contracts. The results show that eTainter outperforms MadMax in both precision and recall, and that eTainter has a precision of 90% based on manual inspection. We also use eTainter to perform large-scale analysis of 60,612 real-world contracts on the Ethereum blockchain. We find that gas-related vulnerabilities exist in 2,763 of these contracts, and that eTainter analyzes a contract in eight seconds, on average.
{"title":"eTainter: detecting gas-related vulnerabilities in smart contracts","authors":"Asem Ghaleb, J. Rubin, K. Pattabiraman","doi":"10.1145/3533767.3534378","DOIUrl":"https://doi.org/10.1145/3533767.3534378","url":null,"abstract":"The execution of smart contracts on the Ethereum blockchain consumes gas paid for by users submitting contracts' invocation requests. A contract execution proceeds as long as the users dedicate enough gas, within the limit set by Ethereum. If insufficient gas is provided, the contract execution halts and changes made during execution get reverted. Unfortunately, contracts may contain code patterns that increase execution cost, causing the contracts to run out of gas. These patterns can be manipulated by malicious attackers to induce unwanted behavior in the targeted victim contracts, e.g., Denial-of-Service (DoS) attacks. We call these gas-related vulnerabilities. We propose eTainter, a static analyzer for detecting gas-related vulnerabilities based on taint tracking in the bytecode of smart contracts. We evaluate eTainter by comparing it with the prior work, MadMax, on a dataset of annotated contracts. The results show that eTainter outperforms MadMax in both precision and recall, and that eTainter has a precision of 90% based on manual inspection. We also use eTainter to perform large-scale analysis of 60,612 real-world contracts on the Ethereum blockchain. We find that gas-related vulnerabilities exist in 2,763 of these contracts, and that eTainter analyzes a contract in eight seconds, on average.","PeriodicalId":412271,"journal":{"name":"Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis","volume":"05 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127351573","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Image captioning (IC) systems, which automatically generate a text description of the salient objects in an image (real or synthetic), have seen great progress over the past few years due to the development of deep neural networks. IC plays an indispensable role in human society, for example, labeling massive photos for scientific studies and assisting visually impaired people in perceiving the world. However, even top-notch IC systems, such as Microsoft Azure Cognitive Services and IBM Image Caption Generator, may return incorrect results, leading to the omission of important objects, deep misunderstanding, and threats to personal safety. To address this problem, we propose MetaIC, the first metamorphic testing approach to validate IC systems. Our core idea is that the object names should exhibit directional changes after object insertion. Specifically, MetaIC (1) extracts objects from existing images to construct an object corpus; (2) inserts an object into an image via novel object resizing and location tuning algorithms; and (3) reports image pairs whose captions do not exhibit differences in the expected way. In our evaluation, we use MetaIC to test one widely adopted image captioning API and five state-of-the-art (SOTA) image captioning models. Using 1,000 seeds, MetaIC successfully reports 16,825 erroneous issues with high precision (84.9%-98.4%). There are three kinds of errors: misclassification, omission, and incorrect quantity. We visualize the errors reported by MetaIC, which shows that the flexible overlapping setting facilitates IC testing by increasing and diversifying the reported errors. In addition, MetaIC can be further generalized to detect label errors in the training dataset, and it has successfully detected 151 incorrect labels in MS COCO Caption, a standard dataset in image captioning.
{"title":"Automated testing of image captioning systems","authors":"Boxi Yu, Zhiqi Zhong, Xinran Qin, Jiayi Yao, Yuancheng Wang, Pinjia He","doi":"10.1145/3533767.3534389","DOIUrl":"https://doi.org/10.1145/3533767.3534389","url":null,"abstract":"Image captioning (IC) systems, which automatically generate a text description of the salient objects in an image (real or synthetic), have seen great progress over the past few years due to the development of deep neural networks. IC plays an indispensable role in human society, for example, labeling massive photos for scientific studies and assisting visually-impaired people in perceiving the world. However, even the top-notch IC systems, such as Microsoft Azure Cognitive Services and IBM Image Caption Generator, may return incorrect results, leading to the omission of important objects, deep misunderstanding, and threats to personal safety. To address this problem, we propose MetaIC, the first metamorphic testing approach to validate IC systems. Our core idea is that the object names should exhibit directional changes after object insertion. Specifically, MetaIC (1) extracts objects from existing images to construct an object corpus; (2) inserts an object into an image via novel object resizing and location tuning algorithms; and (3) reports image pairs whose captions do not exhibit differences in an expected way. In our evaluation, we use MetaIC to test one widely-adopted image captioning API and five state-of-the-art (SOTA) image captioning models. Using 1,000 seeds, MetaIC successfully reports 16,825 erroneous issues with high precision (84.9%-98.4%). There are three kinds of errors: misclassification, omission, and incorrect quantity. We visualize the errors reported by MetaIC, which shows that flexible overlapping setting facilitates IC testing by increasing and diversifying the reported errors. In addition, MetaIC can be further generalized to detect label errors in the training dataset, which has successfully detected 151 incorrect labels in MS COCO Caption, a standard dataset in image captioning.","PeriodicalId":412271,"journal":{"name":"Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115745882","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We describe and evaluate the first model checker for verifying Kotlin programs through the Jimple intermediate representation. The verifier, named ESBMC-Jimple, is built on top of the Efficient SMT-based Context-Bounded Model Checker (ESBMC). It uses the Soot framework to obtain the Jimple IR, representing a simplified version of the Kotlin source code, containing a maximum of three operands per instruction. ESBMC-Jimple processes Kotlin source code together with a model of the standard Kotlin libraries and checks a set of safety properties. Experimental results show that ESBMC-Jimple can correctly verify a set of Kotlin benchmarks from the literature; it is competitive with state-of-the-art Java bytecode verifiers. A demonstration is available at https://youtu.be/J6WhNfXvJNc.
{"title":"ESBMC-Jimple: verifying Kotlin programs via jimple intermediate representation","authors":"Rafael S. Menezes, Daniel Moura, Helena Cavalcante, Rosiane de Freitas, L. Cordeiro","doi":"10.1145/3533767.3543294","DOIUrl":"https://doi.org/10.1145/3533767.3543294","url":null,"abstract":"We describe and evaluate the first model checker for verifying Kotlin programs through the Jimple intermediate representation. The verifier, named ESBMC-Jimple, is built on top of the Efficient SMT-based Context-Bounded Model Checker (ESBMC). It uses the Soot framework to obtain the Jimple IR, representing a simplified version of the Kotlin source code, containing a maximum of three operands per instruction. ESBMC-Jimple processes Kotlin source code together with a model of the standard Kotlin libraries and checks a set of safety properties. Experimental results show that ESBMC-Jimple can correctly verify a set of Kotlin benchmarks from the literature; it is competitive with state-of-the-art Java bytecode verifiers. A demonstration is available at https://youtu.be/J6WhNfXvJNc.","PeriodicalId":412271,"journal":{"name":"Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-06-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128631913","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Fragmentation is a serious problem in the Android ecosystem. This problem is mainly caused by the fast evolution of the system itself and the various customizations independently maintained by different smartphone manufacturers. Many efforts have attempted to mitigate its impact via approaches that automatically pinpoint compatibility issues in Android apps. Unfortunately, at this stage, it is still unknown whether this objective has been fulfilled and whether the existing approaches can indeed be replicated and reliably leveraged to pinpoint compatibility issues in the wild. We therefore propose to fill this gap by first conducting a literature review on this topic to identify all the available approaches. Among the nine identified approaches, we then try our best to reproduce them based on their original datasets. After that, we go one step further and empirically compare those approaches against common datasets of real-world apps containing compatibility issues. Experimental results show that existing tools can indeed be reproduced, but their capabilities are quite distinct, as confirmed by the fact that there is only a small overlap among the results reported by the selected tools. This evidence suggests that more effort should be spent by our community to achieve sound compatibility issue detection.
{"title":"Automatically detecting API-induced compatibility issues in Android apps: a comparative analysis (replicability study)","authors":"Pei Liu, Yanjie Zhao, Haipeng Cai, M. Fazzini, John C. Grundy, Li Li","doi":"10.1145/3533767.3534407","DOIUrl":"https://doi.org/10.1145/3533767.3534407","url":null,"abstract":"Fragmentation is a serious problem in the Android ecosystem. This problem is mainly caused by the fast evolution of the system itself and the various customizations independently maintained by different smartphone manufacturers. Many efforts have attempted to mitigate its impact via approaches to automatically pinpoint compatibility issues in Android apps. Unfortunately, at this stage, it is still unknown if this objective has been fulfilled, and the existing approaches can indeed be replicated and reliably leveraged to pinpoint compatibility issues in the wild. We, therefore, propose to fill this gap by first conducting a literature review within this topic to identify all the available approaches. Among the nine identified approaches, we then try our best to reproduce them based on their original datasets. After that, we go one step further to empirically compare those approaches against common datasets with real-world apps containing compatibility issues. Experimental results show that existing tools can indeed be reproduced, but their capabilities are quite distinct, as confirmed by the fact that there is only a small overlap of the results reported by the selected tools. This evidence suggests that more efforts should be spent by our community to achieve sound compatibility issues detection.","PeriodicalId":412271,"journal":{"name":"Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116479931","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Binary code similarity detection (BCSD) has important applications in various fields such as vulnerability detection, software component analysis, and reverse engineering. Recent studies have shown that deep neural networks (DNNs) can comprehend instructions or control-flow graphs (CFGs) of binary code and support BCSD. In this study, we propose a novel Transformer-based approach, namely jTrans, to learn representations of binary code. It is the first solution that embeds control-flow information of binary code into Transformer-based language models, by using a novel jump-aware representation of the analyzed binaries and a newly designed pre-training task. Additionally, we release to the community a newly created large dataset of binaries, BinaryCorp, which is the most diverse to date. Evaluation results show that jTrans outperforms state-of-the-art (SOTA) approaches on this more challenging dataset by 30.5% (i.e., from 32.0% to 62.5%). In a real-world task of known vulnerability searching, jTrans achieves a recall that is 2X higher than existing SOTA baselines.
{"title":"jTrans: jump-aware transformer for binary code similarity detection","authors":"Hao Wang, Wenjie Qu, Gilad Katz, Wenyu Zhu, Zeyu Gao, Han Qiu, Jianwei Zhuge, Chao Zhang","doi":"10.1145/3533767.3534367","DOIUrl":"https://doi.org/10.1145/3533767.3534367","url":null,"abstract":"Binary code similarity detection (BCSD) has important applications in various fields such as vulnerabilities detection, software component analysis, and reverse engineering. Recent studies have shown that deep neural networks (DNNs) can comprehend instructions or control-flow graphs (CFG) of binary code and support BCSD. In this study, we propose a novel Transformer-based approach, namely jTrans, to learn representations of binary code. It is the first solution that embeds control flow information of binary code into Transformer-based language models, by using a novel jump-aware representation of the analyzed binaries and a newly-designed pre-training task. Additionally, we release to the community a newly-created large dataset of binaries, BinaryCorp, which is the most diverse to date. Evaluation results show that jTrans outperforms state-of-the-art (SOTA) approaches on this more challenging dataset by 30.5% (i.e., from 32.0% to 62.5%). In a real-world task of known vulnerability searching, jTrans achieves a recall that is 2X higher than existing SOTA baselines.","PeriodicalId":412271,"journal":{"name":"Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114679286","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}