
Proceedings of the 30th ACM SIGSOFT International Symposium on Software Testing and Analysis: Latest Publications

DeepHyperion: exploring the feature space of deep learning-based systems through illumination search
Tahereh Zohdinasab, Vincenzo Riccio, Alessio Gambi, P. Tonella
Deep Learning (DL) has been successfully applied to a wide range of application domains, including safety-critical ones. Several DL testing approaches have been recently proposed in the literature but none of them aims to assess how different interpretable features of the generated inputs affect the system's behaviour. In this paper, we resort to Illumination Search to find the highest-performing test cases (i.e., misbehaving and closest to misbehaving), spread across the cells of a map representing the feature space of the system. We introduce a methodology that guides the users of our approach in the tasks of identifying and quantifying the dimensions of the feature space for a given domain. We developed DeepHyperion, a search-based tool for DL systems that illuminates, i.e., explores at large, the feature space, by providing developers with an interpretable feature map where automatically generated inputs are placed along with information about the exposed behaviours.
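The core mechanism described here is a MAP-Elites-style illumination search over an interpretable feature map. The sketch below is a minimal, generic version of that idea, not the DeepHyperion implementation: the feature extractors, the mutation operator, and the fitness function (how close an input drives the DL system to misbehaving) are placeholders that a user would supply for their own domain, and features are assumed to be normalised to [0, 1].

```python
# Minimal MAP-Elites-style illumination search sketch (assumptions noted above).
import random
from typing import Callable, Dict, Tuple

def illuminate(seed_inputs,
               features: Callable[[object], Tuple[float, ...]],   # interpretable feature values in [0, 1]
               fitness: Callable[[object], float],                # higher = closer to misbehaving
               mutate: Callable[[object], object],
               bins_per_dim: int = 10,
               iterations: int = 1000) -> Dict[Tuple[int, ...], Tuple[object, float]]:
    """Return a feature map: cell coordinates -> (best input found in that cell, its fitness)."""
    def cell(x):
        # Quantise each feature dimension into a fixed number of bins.
        return tuple(min(int(f * bins_per_dim), bins_per_dim - 1) for f in features(x))

    archive: Dict[Tuple[int, ...], Tuple[object, float]] = {}

    def consider(x):
        c, f = cell(x), fitness(x)
        if c not in archive or f > archive[c][1]:
            archive[c] = (x, f)              # keep the elite of each cell

    for s in seed_inputs:
        consider(s)
    for _ in range(iterations):
        parent, _ = random.choice(list(archive.values()))  # pick an elite from a random cell
        consider(mutate(parent))                           # place its mutant in the map
    return archive
```

The returned archive is the "illuminated" feature map: developers can inspect which regions of the feature space contain misbehaving or near-misbehaving inputs.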
{"title":"DeepHyperion: exploring the feature space of deep learning-based systems through illumination search","authors":"Tahereh Zohdinasab, Vincenzo Riccio, Alessio Gambi, P. Tonella","doi":"10.1145/3460319.3464811","DOIUrl":"https://doi.org/10.1145/3460319.3464811","url":null,"abstract":"Deep Learning (DL) has been successfully applied to a wide range of application domains, including safety-critical ones. Several DL testing approaches have been recently proposed in the literature but none of them aims to assess how different interpretable features of the generated inputs affect the system's behaviour. In this paper, we resort to Illumination Search to find the highest-performing test cases (i.e., misbehaving and closest to misbehaving), spread across the cells of a map representing the feature space of the system. We introduce a methodology that guides the users of our approach in the tasks of identifying and quantifying the dimensions of the feature space for a given domain. We developed DeepHyperion, a search-based tool for DL systems that illuminates, i.e., explores at large, the feature space, by providing developers with an interpretable feature map where automatically generated inputs are placed along with information about the exposed behaviours.","PeriodicalId":188008,"journal":{"name":"Proceedings of the 30th ACM SIGSOFT International Symposium on Software Testing and Analysis","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115109509","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 41
ModelDiff: testing-based DNN similarity comparison for model reuse detection
Yuanchun Li, Ziqi Zhang, Bingyan Liu, Ziyue Yang, Yunxin Liu
The knowledge of a deep learning model may be transferred to a student model, leading to intellectual property infringement or vulnerability propagation. Detecting such knowledge reuse is nontrivial because the suspect models may not be white-box accessible and/or may serve different tasks. In this paper, we propose ModelDiff, a testing-based approach to deep learning model similarity comparison. Instead of directly comparing the weights, activations, or outputs of two models, we compare their behavioral patterns on the same set of test inputs. Specifically, the behavioral pattern of a model is represented as a decision distance vector (DDV), in which each element is the distance between the model's reactions to a pair of inputs. The knowledge similarity between two models is measured with the cosine similarity between their DDVs. To evaluate ModelDiff, we created a benchmark that contains 144 pairs of models that cover most popular model reuse methods, including transfer learning, model compression, and model stealing. Our method achieved 91.7% correctness on the benchmark, which demonstrates the effectiveness of using ModelDiff for model reuse detection. A study on mobile deep learning apps has shown the feasibility of ModelDiff on real-world models.
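The decision distance vector (DDV) comparison described above is easy to illustrate. The sketch below assumes each model is a callable that maps an input to an output vector (e.g. logits), and it simply pairs consecutive test inputs; the real ModelDiff tool selects input pairs more carefully, so treat this as a simplified illustration rather than the tool's method.

```python
# DDV-based model similarity sketch (simplified input pairing; see note above).
import numpy as np

def ddv(model, inputs):
    """Decision distance vector: one distance per pair of test inputs."""
    outs = np.asarray([model(x) for x in inputs])
    # Pair input 2i with input 2i+1 and measure how differently the model reacts to them.
    return np.array([np.linalg.norm(a - b) for a, b in zip(outs[0::2], outs[1::2])])

def knowledge_similarity(model_a, model_b, inputs) -> float:
    """Cosine similarity between the two models' DDVs on the same test inputs."""
    va, vb = ddv(model_a, inputs), ddv(model_b, inputs)
    return float(np.dot(va, vb) / (np.linalg.norm(va) * np.linalg.norm(vb) + 1e-12))
```

A high similarity score suggests the two models react to input pairs in the same way, which is the behavioural signal the paper uses to flag possible model reuse.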
{"title":"ModelDiff: testing-based DNN similarity comparison for model reuse detection","authors":"Yuanchun Li, Ziqi Zhang, Bingyan Liu, Ziyue Yang, Yunxin Liu","doi":"10.1145/3460319.3464816","DOIUrl":"https://doi.org/10.1145/3460319.3464816","url":null,"abstract":"The knowledge of a deep learning model may be transferred to a student model, leading to intellectual property infringement or vulnerability propagation. Detecting such knowledge reuse is nontrivial because the suspect models may not be white-box accessible and/or may serve different tasks. In this paper, we propose ModelDiff, a testing-based approach to deep learning model similarity comparison. Instead of directly comparing the weights, activations, or outputs of two models, we compare their behavioral patterns on the same set of test inputs. Specifically, the behavioral pattern of a model is represented as a decision distance vector (DDV), in which each element is the distance between the model's reactions to a pair of inputs. The knowledge similarity between two models is measured with the cosine similarity between their DDVs. To evaluate ModelDiff, we created a benchmark that contains 144 pairs of models that cover most popular model reuse methods, including transfer learning, model compression, and model stealing. Our method achieved 91.7% correctness on the benchmark, which demonstrates the effectiveness of using ModelDiff for model reuse detection. A study on mobile deep learning apps has shown the feasibility of ModelDiff on real-world models.","PeriodicalId":188008,"journal":{"name":"Proceedings of the 30th ACM SIGSOFT International Symposium on Software Testing and Analysis","volume":"25 7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125742948","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 27
Validating static warnings via testing code fragments
Ashwin Kallingal Joshy, Xueyuan Chen, Benjamin Steenhoek, Wei Le
Static analysis is an important approach for finding bugs and vulnerabilities in software. However, inspecting and confirming static warnings are challenging and time-consuming. In this paper, we present a novel solution that automatically generates test cases based on static warnings to validate true and false positives. We designed a syntactic patching algorithm that can generate syntactically valid, semantics-preserving executable code fragments from static warnings. We developed a build and testing system to automatically test code fragments using fuzzers, KLEE, and Valgrind. We evaluated our techniques using 12 real-world C projects and 1955 warnings from two commercial static analysis tools. We successfully built 68.5% of the code fragments and generated 1003 test cases. Through automatic testing, we identified 48 true positives, 27 false positives, and 205 likely false positives. We matched 4 CVEs and real-world bugs using Helium, and they are triggered only by our tool, not by the other baseline tools. We found that testing code fragments is scalable and useful; it can trigger bugs that testing entire programs or testing procedures failed to trigger.
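To make the validation step concrete, here is a rough sketch of the "build and dynamically test a fragment" idea, assuming the syntactic patching has already produced a self-contained C file that takes its input on the command line. The fragment extraction itself and the KLEE/fuzzer integration are the paper's contribution and are not reproduced; this only shows how dynamic evidence such as a Valgrind-reported error can confirm a warning.

```python
# Hypothetical harness: compile an extracted fragment and run it under Valgrind
# on generated inputs to look for dynamic evidence of the statically reported bug.
import subprocess

def compile_fragment(c_file: str, exe: str = "./frag") -> bool:
    """Compile the self-contained fragment with debug info."""
    return subprocess.run(["gcc", "-g", "-o", exe, c_file]).returncode == 0

def warning_confirmed(exe: str, test_inputs) -> bool:
    """Return True if any generated input makes Valgrind report a memory error."""
    for arg in test_inputs:
        result = subprocess.run(["valgrind", "--error-exitcode=99", exe, arg],
                                capture_output=True)
        if result.returncode == 99:   # Valgrind detected an error on this input
            return True               # dynamic evidence: likely a true positive
    return False                      # no evidence found: likely a false positive
```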
{"title":"Validating static warnings via testing code fragments","authors":"Ashwin Kallingal Joshy, Xueyuan Chen, Benjamin Steenhoek, Wei Le","doi":"10.1145/3460319.3464832","DOIUrl":"https://doi.org/10.1145/3460319.3464832","url":null,"abstract":"Static analysis is an important approach for finding bugs and vulnerabilities in software. However, inspecting and confirming static warnings are challenging and time-consuming. In this paper, we present a novel solution that automatically generates test cases based on static warnings to validate true and false positives. We designed a syntactic patching algorithm that can generate syntactically valid, semantic preserving executable code fragments from static warnings. We developed a build and testing system to automatically test code fragments using fuzzers, KLEE and Valgrind. We evaluated our techniques using 12 real-world C projects and 1955 warnings from two commercial static analysis tools. We successfully built 68.5% code fragments and generated 1003 test cases. Through automatic testing, we identified 48 true positives and 27 false positives, and 205 likely false positives. We matched 4 CVE and real-world bugs using Helium, and they are only triggered by our tool but not other baseline tools. We found that testing code fragments is scalable and useful; it can trigger bugs that testing entire programs or testing procedures failed to trigger.","PeriodicalId":188008,"journal":{"name":"Proceedings of the 30th ACM SIGSOFT International Symposium on Software Testing and Analysis","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114828859","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
QFuzz: quantitative fuzzing for side channels
Yannic Noller, Saeid Tizpaz-Niari
Side channels pose a significant threat to the confidentiality of software systems. Such vulnerabilities are challenging to detect and evaluate because they arise from non-functional properties of software such as execution times and require reasoning on multiple execution traces. Recently, noninterference notions have been adapted in static analysis, symbolic execution, and greybox fuzzing techniques. However, noninterference is a strict notion and may reject security even if the strength of an information leak is weak. A quantitative notion of security allows for the relaxation of noninterference and tolerates small (unavoidable) leaks. Despite progress in recent years, the existing quantitative approaches have scalability limitations in practice. In this work, we present QFuzz, a greybox fuzzing technique to quantitatively evaluate the strength of side channels with a focus on min entropy. Min entropy is a measure based on the number of distinguishable observations (partitions) to assess the resulting threat from an attacker who tries to compromise secrets in one try. We develop a novel greybox fuzzer equipped with two partitioning algorithms that try to maximize the number of distinguishable observations and the cost differences between them. We evaluate QFuzz on a large set of benchmarks from existing work and real-world libraries (with a total of 70 subjects). QFuzz compares favorably to three state-of-the-art detection techniques. QFuzz provides quantitative information about leaks beyond the capabilities of all three techniques. Crucially, we compare QFuzz to a state-of-the-art quantification tool and find that QFuzz significantly outperforms the tool in scalability while maintaining similar precision. Overall, we find that our approach scales well for real-world applications and provides useful information to evaluate resulting threats. Additionally, QFuzz identifies a zero-day side-channel vulnerability in a security-critical Java library that has since been confirmed and fixed by the developers.
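The min-entropy measure described above can be illustrated with a small sketch: group the side-channel costs (e.g. execution times) observed for different secrets into distinguishable partitions, and use the number of partitions as the leak strength. The greybox fuzzing loop that maximises this number is QFuzz's contribution and is not shown; `epsilon` below is an assumed noise threshold under which two observations are treated as indistinguishable.

```python
# Partition-counting sketch of min-entropy leakage (assumed noise threshold epsilon).
import math

def partitions(observations, epsilon: float):
    """Greedily group sorted cost observations that lie within epsilon of each other."""
    groups = []
    for cost in sorted(observations):
        if groups and cost - groups[-1][-1] <= epsilon:
            groups[-1].append(cost)
        else:
            groups.append([cost])
    return groups

def min_entropy_leakage_bits(observations, epsilon: float) -> float:
    """For a uniform secret and a deterministic channel, a one-try attacker's advantage
    grows with the number of distinguishable observation classes: leakage = log2(#partitions)."""
    return math.log2(max(1, len(partitions(observations, epsilon))))
```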
{"title":"QFuzz: quantitative fuzzing for side channels","authors":"Yannic Noller, Saeid Tizpaz-Niari","doi":"10.1145/3460319.3464817","DOIUrl":"https://doi.org/10.1145/3460319.3464817","url":null,"abstract":"Side channels pose a significant threat to the confidentiality of software systems. Such vulnerabilities are challenging to detect and evaluate because they arise from non-functional properties of software such as execution times and require reasoning on multiple execution traces. Recently, noninterference notions have been adapted in static analysis, symbolic execution, and greybox fuzzing techniques. However, noninterference is a strict notion and may reject security even if the strength of information leaks are weak. A quantitative notion of security allows for the relaxation of noninterference and tolerates small (unavoidable) leaks. Despite progress in recent years, the existing quantitative approaches have scalability limitations in practice. In this work, we present QFuzz, a greybox fuzzing technique to quantitatively evaluate the strength of side channels with a focus on min entropy. Min entropy is a measure based on the number of distinguishable observations (partitions) to assess the resulting threat from an attacker who tries to compromise secrets in one try. We develop a novel greybox fuzzing equipped with two partitioning algorithms that try to maximize the number of distinguishable observations and the cost differences between them. We evaluate QFuzz on a large set of benchmarks from existing work and real-world libraries (with a total of 70 subjects). QFuzz compares favorably to three state-of-the-art detection techniques. QFuzz provides quantitative information about leaks beyond the capabilities of all three techniques. Crucially, we compare QFuzz to a state-of-the-art quantification tool and find that QFuzz significantly outperforms the tool in scalability while maintaining similar precision. Overall, we find that our approach scales well for real-world applications and provides useful information to evaluate resulting threats. Additionally, QFuzz identifies a zero-day side-channel vulnerability in a security critical Java library that has since been confirmed and fixed by the developers.","PeriodicalId":188008,"journal":{"name":"Proceedings of the 30th ACM SIGSOFT International Symposium on Software Testing and Analysis","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129576421","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
Exposing previously undetectable faults in deep neural networks
Isaac Dunn, Hadrien Pouget, D. Kroening, T. Melham
Existing methods for testing DNNs solve the oracle problem by constraining the raw features (e.g. image pixel values) to be within a small distance of a dataset example for which the desired DNN output is known. But this limits the kinds of faults these approaches are able to detect. In this paper, we introduce a novel DNN testing method that is able to find faults in DNNs that other methods cannot. The crux is that, by leveraging generative machine learning, we can generate fresh test inputs that vary in their high-level features (for images, these include object shape, location, texture, and colour). We demonstrate that our approach is capable of detecting deliberately injected faults as well as new faults in state-of-the-art DNNs, and that in both cases, existing methods are unable to find these faults.
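The key move here is to sample fresh inputs from a generative model, so that high-level features vary, and to flag a fault when the DNN under test disagrees with the label the input was generated for. The sketch below is a hedged illustration of that loop only: `generator` and `dnn_under_test` are placeholders, and the paper's handling of input validity and realism is omitted.

```python
# Generative-model-driven fault search sketch (placeholder generator and DNN).
import numpy as np

def search_for_faults(generator, dnn_under_test, target_label: int,
                      latent_dim: int = 128, trials: int = 1000):
    """Return generated inputs of class `target_label` that the DNN misclassifies."""
    faults = []
    for _ in range(trials):
        z = np.random.randn(latent_dim)            # vary high-level features via the latent code
        image = generator(z, target_label)         # class-conditional generation
        if int(np.argmax(dnn_under_test(image))) != target_label:
            faults.append((z, image))              # a fresh input exposing a fault
    return faults
```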
{"title":"Exposing previously undetectable faults in deep neural networks","authors":"Isaac Dunn, Hadrien Pouget, D. Kroening, T. Melham","doi":"10.1145/3460319.3464801","DOIUrl":"https://doi.org/10.1145/3460319.3464801","url":null,"abstract":"Existing methods for testing DNNs solve the oracle problem by constraining the raw features (e.g. image pixel values) to be within a small distance of a dataset example for which the desired DNN output is known. But this limits the kinds of faults these approaches are able to detect. In this paper, we introduce a novel DNN testing method that is able to find faults in DNNs that other methods cannot. The crux is that, by leveraging generative machine learning, we can generate fresh test inputs that vary in their high-level features (for images, these include object shape, location, texture, and colour). We demonstrate that our approach is capable of detecting deliberately injected faults as well as new faults in state-of-the-art DNNs, and that in both cases, existing methods are unable to find these faults.","PeriodicalId":188008,"journal":{"name":"Proceedings of the 30th ACM SIGSOFT International Symposium on Software Testing and Analysis","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124823761","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 16
Attack as defense: characterizing adversarial examples using robustness
Zhe Zhao, Guangke Chen, Jingyi Wang, Yiwei Yang, Fu Song, Jun Sun
As a new programming paradigm, deep learning has expanded its application to many real-world problems. At the same time, deep learning based software is found to be vulnerable to adversarial attacks. Though various defense mechanisms have been proposed to improve the robustness of deep learning software, many of them are ineffective against adaptive attacks. In this work, we propose a novel characterization to distinguish adversarial examples from benign ones based on the observation that adversarial examples are significantly less robust than benign ones. As existing robustness measurement does not scale to large networks, we propose a novel defense framework, named attack as defense (A2D), to detect adversarial examples by effectively evaluating an example's robustness. A2D uses the cost of attacking an input for robustness evaluation and identifies less robust examples as adversarial, since less robust examples are easier to attack. Extensive experiment results on MNIST, CIFAR10 and ImageNet show that A2D is more effective than recent promising approaches. We also evaluate our defense against potential adaptive attacks and show that A2D is effective in defending carefully designed adaptive attacks, e.g., the attack success rate drops to 0% on CIFAR10.
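The "cost of attacking an input" idea can be sketched with a very crude attack: count how many perturbation steps of growing magnitude are needed to flip the predicted label, and flag inputs whose labels flip very cheaply. A2D itself uses proper attack algorithms and a calibrated threshold; the random-noise attack, the threshold, and the model interface below are assumptions for illustration only.

```python
# Attack-cost-as-robustness sketch (crude random-noise attack, assumed threshold).
import numpy as np

def attack_cost(model, x, max_steps: int = 50, step: float = 0.01) -> int:
    """Smallest number of growing perturbation steps that changes the predicted label."""
    original = int(np.argmax(model(x)))
    for k in range(1, max_steps + 1):
        perturbed = x + np.random.uniform(-k * step, k * step, size=np.shape(x))
        if int(np.argmax(model(perturbed))) != original:
            return k
    return max_steps + 1              # label never flipped: the input looks robust

def is_adversarial(model, x, threshold: int = 5) -> bool:
    # Adversarial examples sit close to a decision boundary, so they flip cheaply.
    return attack_cost(model, x) <= threshold
```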
{"title":"Attack as defense: characterizing adversarial examples using robustness","authors":"Zhe Zhao, Guangke Chen, Jingyi Wang, Yiwei Yang, Fu Song, Jun Sun","doi":"10.1145/3460319.3464822","DOIUrl":"https://doi.org/10.1145/3460319.3464822","url":null,"abstract":"As a new programming paradigm, deep learning has expanded its application to many real-world problems. At the same time, deep learning based software are found to be vulnerable to adversarial attacks. Though various defense mechanisms have been proposed to improve robustness of deep learning software, many of them are ineffective against adaptive attacks. In this work, we propose a novel characterization to distinguish adversarial examples from benign ones based on the observation that adversarial examples are significantly less robust than benign ones. As existing robustness measurement does not scale to large networks, we propose a novel defense framework, named attack as defense (A2D), to detect adversarial examples by effectively evaluating an example’s robustness. A2D uses the cost of attacking an input for robustness evaluation and identifies those less robust examples as adversarial since less robust examples are easier to attack. Extensive experiment results on MNIST, CIFAR10 and ImageNet show that A2D is more effective than recent promising approaches. We also evaluate our defense against potential adaptive attacks and show that A2D is effective in defending carefully designed adaptive attacks, e.g., the attack success rate drops to 0% on CIFAR10.","PeriodicalId":188008,"journal":{"name":"Proceedings of the 30th ACM SIGSOFT International Symposium on Software Testing and Analysis","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-03-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130449669","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 24
Model-based testing of networked applications
Yishuai Li, B. Pierce, S. Zdancewic
We present a principled automatic testing framework for application-layer protocols. The key innovation is a domain-specific embedded language for writing nondeterministic models of the behavior of networked servers. These models are defined within the Coq interactive theorem prover, supporting a smooth transition from testing to formal verification. Given a server model, we show how to automatically derive a tester that probes the server for unexpected behaviors. We address the uncertainties caused by both the server's internal choices and the network delaying messages nondeterministically. The derived tester accepts server implementations whose possible behaviors are a subset of those allowed by the nondeterministic model. We demonstrate the effectiveness of this framework by using it to specify and test a fragment of the HTTP/1.1 protocol, showing that the automatically derived tester can capture RFC violations in buggy server implementations, including the latest versions of Apache and Nginx.
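The paper's models and derived testers are written inside the Coq theorem prover; the loose Python analogue below only illustrates the acceptance criterion from the abstract: an implementation passes as long as every observed response is one the nondeterministic model allows. The toy protocol (a server that may either echo the request or legitimately answer "BUSY") is an invented example, not the paper's HTTP/1.1 model, and server state is omitted.

```python
# Nondeterministic-model tester sketch (toy echo protocol, stateless for brevity).
from typing import Set

def model_allowed(request: str) -> Set[str]:
    """Nondeterministic model: the set of responses the spec permits for `request`."""
    return {"ECHO " + request, "BUSY"}     # the server may answer, or may defer

def run_tester(send_and_receive, requests):
    """Probe the implementation; reject it on the first response the model does not allow."""
    trace = []
    for req in requests:
        resp = send_and_receive(req)       # implementation under test
        trace.append((req, resp))
        if resp not in model_allowed(req):
            return False, trace            # observed behaviour outside the model
    return True, trace
```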
{"title":"Model-based testing of networked applications","authors":"Yishuai Li, B. Pierce, S. Zdancewic","doi":"10.1145/3460319.3464798","DOIUrl":"https://doi.org/10.1145/3460319.3464798","url":null,"abstract":"We present a principled automatic testing framework for application-layer protocols. The key innovation is a domain-specific embedded language for writing nondeterministic models of the behavior of networked servers. These models are defined within the Coq interactive theorem prover, supporting a smooth transition from testing to formal verification. Given a server model, we show how to automatically derive a tester that probes the server for unexpected behaviors. We address the uncertainties caused by both the server's internal choices and the network delaying messages nondeterministically. The derived tester accepts server implementations whose possible behaviors are a subset of those allowed by the nondeterministic model. We demonstrate the effectiveness of this framework by using it to specify and test a fragment of the HTTP/1.1 protocol, showing that the automatically derived tester can capture RFC violations in buggy server implementations, including the latest versions of Apache and Nginx.","PeriodicalId":188008,"journal":{"name":"Proceedings of the 30th ACM SIGSOFT International Symposium on Software Testing and Analysis","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-01-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127936341","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
ProFuzzBench: a benchmark for stateful protocol fuzzing
R. Natella, Van-Thuan Pham
We present a new benchmark (ProFuzzBench) for stateful fuzzing of network protocols. The benchmark includes a suite of representative open-source network servers for popular protocols, and tools to automate experimentation. We discuss challenges and potential directions for future research based on this benchmark.
{"title":"ProFuzzBench: a benchmark for stateful protocol fuzzing","authors":"R. Natella, Van-Thuan Pham","doi":"10.1145/3460319.3469077","DOIUrl":"https://doi.org/10.1145/3460319.3469077","url":null,"abstract":"We present a new benchmark (ProFuzzBench) for stateful fuzzing of network protocols. The benchmark includes a suite of representative open-source network servers for popular protocols, and tools to automate experimentation. We discuss challenges and potential directions for future research based on this benchmark.","PeriodicalId":188008,"journal":{"name":"Proceedings of the 30th ACM SIGSOFT International Symposium on Software Testing and Analysis","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116465020","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 30
Automatic test suite generation for key-points detection DNNs using many-objective search (experience paper)
Fitash Ul Haq, Donghwan Shin, L. Briand, Thomas Stifter, Jun Wang
Automatically detecting the positions of key-points (e.g., facial key-points or finger key-points) in an image is an essential problem in many applications, such as driver's gaze detection and drowsiness detection in automated driving systems. With the recent advances of Deep Neural Networks (DNNs), Key-Points detection DNNs (KP-DNNs) have been increasingly employed for that purpose. Nevertheless, KP-DNN testing and validation have remained a challenging problem because KP-DNNs predict many independent key-points at the same time---where each individual key-point may be critical in the targeted application---and images can vary a great deal according to many factors. In this paper, we present an approach to automatically generate test data for KP-DNNs using many-objective search. In our experiments, focused on facial key-points detection DNNs developed for an industrial automotive application, we show that our approach can generate test suites to severely mispredict, on average, more than 93% of all key-points. In comparison, random search-based test data generation can only severely mispredict 41% of them. Many of these mispredictions, however, are not avoidable and should not therefore be considered failures. We also empirically compare state-of-the-art, many-objective search algorithms and their variants, tailored for test suite generation. Furthermore, we investigate and demonstrate how to learn specific conditions, based on image characteristics (e.g., head posture and skin color), that lead to severe mispredictions. Such conditions serve as a basis for risk analysis or DNN retraining.
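The many-objective framing can be illustrated with a small sketch: each key-point contributes one objective (its prediction error on a generated image), and the search tries to push every objective past a severity threshold. Only the fitness computation is shown; the search algorithm (e.g. an NSGA-II-style variant, as an assumption) and the image-generation operators are not reproduced, and `model`, `ground_truth`, and the pixel threshold are placeholders.

```python
# Per-key-point fitness sketch for many-objective test generation (placeholders noted above).
import numpy as np

def keypoint_errors(model, image, ground_truth: np.ndarray) -> np.ndarray:
    """One objective per key-point: Euclidean distance between prediction and ground truth."""
    predicted = np.asarray(model(image)).reshape(-1, 2)   # (num_keypoints, 2) pixel coordinates
    truth = np.asarray(ground_truth).reshape(-1, 2)
    return np.linalg.norm(predicted - truth, axis=1)

def severely_mispredicted(model, image, ground_truth, threshold_px: float = 4.0):
    """Indices of key-points whose error exceeds the (assumed) severity threshold."""
    errors = keypoint_errors(model, image, ground_truth)
    return np.where(errors > threshold_px)[0]
```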
{"title":"Automatic test suite generation for key-points detection DNNs using many-objective search (experience paper)","authors":"Fitash Ul Haq, Donghwan Shin, L. Briand, Thomas Stifter, Jun Wang","doi":"10.1145/3460319.3464802","DOIUrl":"https://doi.org/10.1145/3460319.3464802","url":null,"abstract":"Automatically detecting the positions of key-points (e.g., facial key-points or finger key-points) in an image is an essential problem in many applications, such as driver's gaze detection and drowsiness detection in automated driving systems. With the recent advances of Deep Neural Networks (DNNs), Key-Points detection DNNs (KP-DNNs) have been increasingly employed for that purpose. Nevertheless, KP-DNN testing and validation have remained a challenging problem because KP-DNNs predict many independent key-points at the same time---where each individual key-point may be critical in the targeted application---and images can vary a great deal according to many factors. In this paper, we present an approach to automatically generate test data for KP-DNNs using many-objective search. In our experiments, focused on facial key-points detection DNNs developed for an industrial automotive application, we show that our approach can generate test suites to severely mispredict, on average, more than 93% of all key-points. In comparison, random search-based test data generation can only severely mispredict 41% of them. Many of these mispredictions, however, are not avoidable and should not therefore be considered failures. We also empirically compare state-of-the-art, many-objective search algorithms and their variants, tailored for test suite generation. Furthermore, we investigate and demonstrate how to learn specific conditions, based on image characteristics (e.g., head posture and skin color), that lead to severe mispredictions. Such conditions serve as a basis for risk analysis or DNN retraining.","PeriodicalId":188008,"journal":{"name":"Proceedings of the 30th ACM SIGSOFT International Symposium on Software Testing and Analysis","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131577156","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 18