
ACM Transactions on Software Engineering and Methodology (TOSEM): Latest Publications

Opinion Mining for Software Development: A Systematic Literature Review
Pub Date : 2022-03-07 DOI: 10.1145/3490388
B. Lin, Nathan Cassee, Alexander Serebrenik, G. Bavota, Nicole Novielli, Michele Lanza
Opinion mining, sometimes referred to as sentiment analysis, has gained increasing attention in software engineering (SE) studies. SE researchers have applied opinion mining techniques in various contexts, such as identifying developers’ emotions expressed in code comments and extracting users’ criticisms of mobile apps. Given the large number of relevant studies available, it can take considerable time for researchers and developers to figure out which approaches they can adopt in their own studies and what perils these approaches entail. We conducted a systematic literature review involving 185 papers. More specifically, we present (1) well-defined categories of opinion mining-related software development activities, (2) available opinion mining approaches, whether they have been evaluated when adopted in other studies, and how their performance compares, (3) available datasets for performance evaluation and tool customization, and (4) concerns or limitations SE researchers might need to take into account when applying or customizing these opinion mining techniques. The results of our study serve as a reference for choosing suitable opinion mining tools for software development activities and provide critical insights for the further development of opinion mining techniques in the SE domain.
Citations: 13
Verification of Distributed Systems via Sequential Emulation
Pub Date : 2022-03-07 DOI: 10.1145/3490387
Luca Di Stefano, R. De Nicola, Omar Inverso
Sequential emulation is a semantics-based technique that automatically reduces property checking of distributed systems to the analysis of sequential programs. An automated procedure takes as input a formal specification of a distributed system, a property of interest, and the structural operational semantics of the specification language, and generates a sequential program whose execution traces emulate the possible evolutions of the considered system. The question of whether the property of interest holds for the system can then be expressed either as a reachability or as a termination query on the program. This makes it possible to immediately adapt mature verification techniques developed for general-purpose languages to domain-specific languages, and to effortlessly integrate new techniques as soon as they become available. We test our approach on a selection of concurrent systems originating from different contexts, from population protocols to models of flocking behaviour. By combining a comprehensive range of program verification techniques, from traditional symbolic execution to modern inductive-based methods such as property-directed reachability, we are able to draw consistent and correct verification verdicts for the considered systems.
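To make the reduction concrete, here is a minimal Python sketch under invented assumptions (the toy agent system and its transition rule are hypothetical; the actual approach generates the sequential program mechanically from the SOS rules of the specification language): property checking of a small system of agents becomes a plain reachability query on a sequential exploration loop.

```python
# Toy "distributed system": N agents, each holding a bit. An agent may flip
# its bit only if its left neighbour holds 1 (agent 0 may always flip).
# These rules are hypothetical, purely for illustration.
N = 3

def step(state, agent):
    """One local transition; returns the successor state, or None if disabled."""
    if agent == 0 or state[agent - 1] == 1:
        successor = list(state)
        successor[agent] = 1 - successor[agent]
        return tuple(successor)
    return None

def reachable(bad):
    """Sequential emulation loop: its traces enumerate the interleavings of
    agent steps, so the property check becomes a reachability query."""
    frontier, seen = [(0,) * N], {(0,) * N}
    while frontier:
        state = frontier.pop()
        if state == bad:
            return True
        for agent in range(N):  # nondeterministic scheduling, unrolled sequentially
            succ = step(state, agent)
            if succ is not None and succ not in seen:
                seen.add(succ)
                frontier.append(succ)
    return False

print(reachable((1, 1, 1)))  # True: the all-ones configuration is reachable
```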
Citations: 1
L2S: A Framework for Synthesizing the Most Probable Program under a Specification
Pub Date : 2022-03-07 DOI: 10.1145/3487570
Yingfei Xiong, Bo Wang
In many scenarios, we need to find the most likely program that meets a specification under a local context, where the local context can be an incomplete program, a partial specification, a natural language description, and so on. We call such a problem program estimation. In this article, we propose a framework, the LingLong Synthesis Framework (L2S), to address this problem. Compared with existing work, our work is novel in the following aspects. (1) We propose a theory of expansion rules to describe how to decompose a program into choices. (2) We propose an approach based on abstract interpretation to efficiently prune off the program sub-space that does not satisfy the specification. (3) We prove that the probability of a program is the product of the probabilities of choosing expansion rules, regardless of the choosing order. (4) We reduce the program estimation problem to a pathfinding problem, enabling existing pathfinding algorithms to solve it. L2S has been applied to program generation and program repair. In this article, we report our instantiations of this framework for synthesizing conditional expressions (L2S-Cond) and repairing conditional statements (L2S-Hanabi). The experiments on L2S-Cond show that each option enabled by L2S, including the expansion rules, the pruning technique, and the use of different pathfinding algorithms, plays a major role in the performance of the approach. The default configuration of L2S-Cond correctly predicts nearly 60% of the conditional expressions in the top 5 candidates. Moreover, we evaluate L2S-Hanabi on 272 bugs from two real-world Java defect benchmarks, namely Defects4J and Bugs.jar. L2S-Hanabi correctly fixes 32 bugs with a high precision of 84%. In terms of repairing conditional statement bugs, L2S-Hanabi significantly outperforms all existing approaches in both precision and recall.
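The pathfinding reduction in points (3) and (4) can be illustrated directly. Below is a minimal Python sketch using an invented two-rule toy grammar with made-up probabilities (not L2S's learned model): since a program's probability is the product of its expansion-rule probabilities, uniform-cost search over costs of -log p pops the most probable complete program first.

```python
import heapq
import math

# Hypothetical toy grammar: nonterminal -> [(probability, replacement), ...]
RULES = {
    "<cond>": [(0.6, "<var> == null"), (0.4, "<var> > 0")],
    "<var>":  [(0.7, "x"), (0.3, "y")],
}

def most_probable(start="<cond>"):
    """Uniform-cost search over partial programs; edge cost is -log(p),
    so path cost is -log of the product of rule probabilities."""
    queue = [(0.0, start)]
    while queue:
        cost, prog = heapq.heappop(queue)
        nonterminal = next((n for n in RULES if n in prog), None)
        if nonterminal is None:
            return prog, math.exp(-cost)  # first complete program is the most probable
        for p, rhs in RULES[nonterminal]:
            heapq.heappush(queue, (cost - math.log(p), prog.replace(nonterminal, rhs, 1)))

print(most_probable())  # ('x == null', 0.42), i.e., 0.6 * 0.7
```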
Citations: 11
Stateful Serverless Computing with Crucial
Pub Date : 2022-03-07 DOI: 10.1145/3490386
Daniel Barcelona Pons, P. Sutra, Marc Sánchez Artigas, Gerard París, P. López
Serverless computing greatly simplifies the use of cloud resources. In particular, Function-as-a-Service (FaaS) platforms enable programmers to develop applications as individual functions that can run and scale independently. Unfortunately, applications that require fine-grained support for mutable state and synchronization, such as machine learning (ML) and scientific computing, are notoriously hard to build with this new paradigm. In this work, we aim to bridge this gap. We present Crucial, a system for programming highly-parallel stateful serverless applications. Crucial retains the simplicity of serverless computing. It is built upon the key insight that FaaS resembles concurrent programming at the scale of a datacenter. Accordingly, a distributed shared memory layer is the natural answer to the needs for fine-grained state management and synchronization. Crucial allows a multi-threaded code base to be ported effortlessly to serverless, where it can benefit from the scalability and pay-per-use model of FaaS platforms. We validate Crucial with the help of micro-benchmarks and by considering various stateful applications. Beyond classical parallel tasks (e.g., a Monte Carlo simulation), these applications include representative ML algorithms such as k-means and logistic regression. Our evaluation shows that Crucial obtains performance superior or comparable to Apache Spark at similar cost (18%–40% faster). We also use Crucial to port (part of) a state-of-the-art multi-threaded ML library to serverless. The ported application is up to 30% faster than with a dedicated high-end server. Finally, we attest that Crucial can rival in performance a single-machine, multi-threaded implementation of a complex coordination problem. Overall, Crucial delivers all these benefits with less than 6% of changes in the code bases of the evaluated applications.
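To give a flavour of the programming model, the sketch below uses plain Python threads as stand-ins for FaaS workers and a lock-guarded dictionary as a stand-in for the distributed shared memory layer (all names are invented; Crucial itself is a Java system with a different API): otherwise stateless "invocations" coordinate through shared mutable objects.

```python
import threading

class SharedLayer:
    """Toy stand-in for a distributed shared-object layer."""
    def __init__(self):
        self._objects = {}
        self._lock = threading.Lock()

    def increment(self, key):
        with self._lock:  # the fine-grained synchronization FaaS normally lacks
            self._objects[key] = self._objects.get(key, 0) + 1

    def get(self, key):
        with self._lock:
            return self._objects.get(key, 0)

shared = SharedLayer()

def invocation():
    """Stand-in for one serverless function invocation updating mutable state."""
    for _ in range(1000):
        shared.increment("global_counter")

workers = [threading.Thread(target=invocation) for _ in range(8)]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(shared.get("global_counter"))  # 8000: no updates lost despite concurrency
```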
Citations: 16
Industry–Academia Research Collaboration and Knowledge Co-creation: Patterns and Anti-patterns
Pub Date : 2022-03-07 DOI: 10.1145/3494519
D. Marijan, Sagar Sen
Increasing the impact of software engineering research in the software industry and society at large has long been a high-priority concern for the software engineering community. The problem of two cultures, research conducted in a vacuum (disconnected from the real world), and misaligned time horizons are just some of the many complex challenges standing in the way of successful industry–academia collaborations. This article reports on the experience of research collaboration and knowledge co-creation between industry and academia in software engineering as a way to bridge the research–practice collaboration gap. Our experience spans 14 years of collaboration between researchers in software engineering and the European and Norwegian software and IT industry. Using participant observation and interviews, we collected and subsequently analyzed an extensive record of qualitative data. Drawing upon the findings made and the experience gained, we provide a set of 14 patterns and 14 anti-patterns for industry–academia collaborations, aimed at supporting other researchers and practitioners in establishing and running research collaboration projects in software engineering.
Citations: 5
Boosting Compiler Testing via Compiler Optimization Exploration
Pub Date : 2022-03-05 DOI: 10.1145/3508362
Junjie Chen, Chenyao Suo
Compilers are critically important software, and, as with other software, compiler testing is one of the most widely used ways of guaranteeing their quality. Compiler bugs tend to occur in compiler optimizations. Detecting optimization bugs requires considering two main factors: (1) the optimization flags controlling the accessibility of the buggy compiler code should be turned on; and (2) the test program should be able to trigger the buggy code. However, existing compiler testing approaches consider only the latter, generating effective test programs but simply running them under several pre-defined optimization levels (e.g., -O0, -O1, -O2, -O3, -Os in GCC). To better understand the influence of compiler optimizations on compiler testing, we conduct the first empirical study, and find that (1) all the bugs detected under the widely-used optimization levels are also detected under the explored optimization settings (we call a combination of optimization flags turned on for compilation an optimization setting), while 83.54% of bugs are only detected under the latter; and (2) optimization flags exhibit both inhibition and promotion effects on each other in compiler testing, indicating the necessity and challenges of considering compiler optimizations in compiler testing. We then propose the first approach, called COTest, that considers both factors to test compilers. Specifically, COTest first adopts machine learning (the XGBoost algorithm) to model the relationship between test programs and optimization settings, to predict the bug-triggering probability of a test program under an optimization setting. Then, it uses a diversity augmentation strategy to select a set of diverse candidate optimization settings for prediction for a test program. Finally, the Top-K optimization settings are selected for compiler testing according to the predicted bug-triggering probabilities. The experiments on GCC and LLVM demonstrate its effectiveness; in particular, COTest detects 17 previously unknown bugs, 11 of which have been fixed or confirmed by developers.
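The prediction-and-selection step lends itself to a short sketch. The Python below uses synthetic data and an invented feature encoding (the paper's actual features, training corpus, and diversity augmentation strategy differ): an XGBoost classifier scores (test program, optimization setting) pairs, and the Top-K settings by predicted bug-triggering probability are kept for actual compiler runs.

```python
import numpy as np
from xgboost import XGBClassifier

rng = np.random.default_rng(0)
n_flags = 8  # hypothetical on/off optimization flags forming a setting

# Synthetic training corpus: program features + flag vector -> bug triggered?
X_train = rng.integers(0, 2, (500, 4 + n_flags)).astype(float)
y_train = rng.integers(0, 2, 500)

model = XGBClassifier(n_estimators=50, max_depth=4, eval_metric="logloss")
model.fit(X_train, y_train)

# Score one test program against 100 diverse candidate settings.
program = rng.integers(0, 2, 4).astype(float)
candidates = rng.integers(0, 2, (100, n_flags))
X_test = np.hstack([np.tile(program, (100, 1)), candidates.astype(float)])
scores = model.predict_proba(X_test)[:, 1]  # predicted bug-triggering probability

K = 5
top_k_settings = candidates[np.argsort(scores)[::-1][:K]]  # settings to actually run
print(top_k_settings)
```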
Citations: 13
Buddy Stacks: Protecting Return Addresses with Efficient Thread-Local Storage and Runtime Re-Randomization
Pub Date : 2022-03-04 DOI: 10.1145/3494516
Changwei Zou, Xudong Wang, Yaoqing Gao, Jingling Xue
Shadow stacks play an important role in protecting return addresses to mitigate ROP attacks. Parallel shadow stacks, which shadow the call stack of each thread at the same constant offset for all threads, are known not to support multi-threading well. On the other hand, compact shadow stacks must maintain a separate shadow stack pointer in thread-local storage (TLS), which can be implemented in terms of a register or the per-thread Thread-Control-Block (TCB), suffering from poor compatibility in the former case or high performance overhead in the latter. In addition, shadow stacks are vulnerable to information disclosure attacks. In this paper, we propose to mitigate ROP attacks for single- and multi-threaded server programs running on general-purpose computing systems by using a novel stack layout, called a buddy stack (referred to as Bustk), that is highly performant, compatible with existing code, and provides meaningful security. These goals are met thanks to three novel design aspects of Bustk. First, Bustk places a parallel shadow stack just below a thread's call stack (the two are allocated together as each other's buddies), avoiding the need to maintain a separate shadow stack pointer and making parallel shadow stacks well-suited for multi-threading. Second, Bustk uses an efficient stack-based thread-local storage mechanism, denoted STK-TLS, to store thread-specific metadata in two TLS sections just below the shadow stack in dual redundancy (again as each other's buddies), so that both can be accessed and updated in a lightweight manner from the call stack pointer rsp alone. Finally, Bustk continuously re-randomizes (on the order of milliseconds) the return addresses on the shadow stack by using a new microsecond-level runtime re-randomization technique, denoted STK-MSR. This mechanism aims to render leaked information obsolete, making it extremely unlikely for the attacker to hijack return addresses, particularly against a server program that often sits tens of milliseconds away from the attacker. Our evaluation using web servers, Nginx and Apache Httpd, shows that Bustk works well in terms of performance, compatibility, and security, with its parallel shadow stacks incurring acceptable memory overhead for real-world applications and its STK-TLS mechanism costing only two pages per thread. In particular, Bustk can protect the Nginx and Apache servers with an adaptive 1-ms re-randomization policy (without observable overheads when IO is intensive, at about 17,000 requests per second). In addition, we have also evaluated Bustk using other non-server applications, Firefox, Python, LLVM, JDK, and SPEC CPU2006, to further demonstrate the same degree of performance and compatibility, but the protection provided for, say, browsers, is weaker (since network-access delays can no longer be assumed).
Citations: 3
A Common Terminology for Software Risk Management
Pub Date : 2022-02-12 DOI: 10.1145/3498539
J. Masso, F. García, César J. Pardo, F. Pino, M. Piattini
In order to improve and sustain their competitiveness over time, organisations nowadays need to undertake different initiatives to adopt frameworks, models, and standards that will allow them to align and improve their business processes. In spite of these efforts, organisations may still encounter governance and management problems. This is where Risk Management (RM) can play a major role, since its purpose is to contribute to the creation and preservation of value in the context of the organisation's processes. RM is a complex and subjective activity that requires experience and a high level of knowledge about risks, and it is for this reason that standardisation institutions and researchers have made great efforts to define initiatives to overcome these challenges. However, the RM field presents a lack of uniformity in its terms and concepts, due to the different contexts and scopes of application, a situation that can generate ambiguities and misunderstandings. To address these issues, this paper presents an ontology called SRMO (Software Risk Management Ontology), which seeks to unify the terms and concepts associated with RM and provide an integrated and holistic view of risk. The Pipeline framework has been applied in order to assure and verify the quality of the proposed ontology, and it has been implemented in Protégé and validated by means of competency questions. Three application scenarios of this ontology demonstrating its usefulness in the software engineering field are presented in this paper. We believe that this ontology can be useful for organisations that are interested in: (i) establishing an RM strategy from an integrated approach, (ii) defining the elements that help to identify risks and the criteria that support decision-making in risk assessment, and (iii) helping the involved stakeholders during the process of risk management.
Citations: 2
How Do Successful and Failed Projects Differ? A Socio-Technical Analysis
Pub Date : 2022-02-08 DOI: 10.1145/3504003
Mitchell Joblin, S. Apel
Software development is at the intersection of the social realm, involving the people who develop the software, and the technical realm, involving the artifacts (code, docs, etc.) that are being produced. It has been shown that a socio-technical perspective provides rich information about the state of a software project. In particular, we are interested in socio-technical factors that are associated with project success. For this purpose, we frame the task as a network classification problem. We show how a set of heterogeneous networks composed of social and technical entities can be jointly embedded in a single vector space, enabling mathematically sound comparisons between distinct software projects. Our approach is specifically designed around intuitive metrics stemming from network analysis and statistics to ease the interpretation of results in the context of software engineering wisdom. Based on a selection of 32 open source projects, we perform an empirical study to validate our approach, considering three prediction scenarios that test the classification model's ability to generalize to (1) randomly held-out project snapshots, (2) future project states, and (3) entirely new projects. Our results provide evidence that a socio-technical perspective is superior to a pure social or technical perspective when it comes to early indicators of future project success. To our surprise, the methodology proposed here even shows evidence of being able to generalize to entirely novel (project hold-out set) software projects, reaching prediction accuracies of 80%, which is a further testament to the efficacy of our approach beyond what has been possible so far. In addition, we identify key features that are strongly associated with project success. Our results indicate that even relatively simple socio-technical networks capture highly relevant and interpretable information about the early indicators of future project success.
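As an illustration of this setup (not the paper's actual embedding, which jointly embeds heterogeneous socio-technical networks), the Python sketch below turns synthetic project networks into vectors of interpretable network metrics and trains a classifier on them; the graphs, the chosen metrics, and the density-based success proxy are all invented.

```python
import networkx as nx
import numpy as np
from sklearn.linear_model import LogisticRegression

def embed(graph):
    """Map one socio-technical network to a vector of interpretable metrics."""
    return [
        graph.number_of_edges(),
        nx.density(graph),
        nx.average_clustering(graph),
        nx.number_connected_components(graph),
    ]

rng = np.random.default_rng(1)
graphs, labels = [], []
for success in (0, 1):
    for _ in range(20):  # denser collaboration stands in for "success" here
        p = 0.05 + 0.10 * success + rng.uniform(0.0, 0.02)
        seed = int(rng.integers(1_000_000))
        graphs.append(nx.erdos_renyi_graph(40, p, seed=seed))
        labels.append(success)

X = np.array([embed(g) for g in graphs])
clf = LogisticRegression(max_iter=1000).fit(X, labels)
print(clf.score(X, labels))  # training accuracy on the toy data
```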
Citations: 6
NPC: Neuron Path Coverage via Characterizing Decision Logic of Deep Neural Networks
Pub Date : 2022-01-31 DOI: 10.1145/3490489
Xiaofei Xie, Tianlin Li, Jian Wang, L. Ma, Qing Guo, Felix Juefei-Xu, Yang Liu
Deep learning has recently been widely applied to many applications across different domains, e.g., image classification and audio recognition. However, the quality of Deep Neural Networks (DNNs) still raises concerns in practical operational environments, which calls for systematic testing, especially in safety-critical scenarios. Inspired by software testing, a number of structural coverage criteria have been designed and proposed to measure the test adequacy of DNNs. However, due to the blackbox nature of DNNs, the existing structural coverage criteria are difficult to interpret, making it hard to understand their underlying principles. The relationship between structural coverage and the decision logic of DNNs is unknown. Moreover, recent studies have further revealed the non-existence of correlation between structural coverage and DNN defect detection, which raises further concerns about what a suitable DNN testing criterion should be. In this article, we propose interpretable coverage criteria based on constructing the decision structure of a DNN. Mirroring the control flow graph of a traditional program, we first extract a decision graph from a DNN based on its interpretation, where a path of the decision graph represents a decision logic of the DNN. Based on the control flow and data flow of the decision graph, we propose two variants of path coverage to measure the adequacy of test cases in exercising the decision logic. The higher the path coverage, the more diverse the decision logic the test cases are expected to explore. Our large-scale evaluation results demonstrate that the path in the decision graph is effective in characterizing the decision of the DNN, and the proposed coverage criteria are sensitive to errors, including natural errors and adversarial examples, and strongly correlate with output impartiality.
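Once the decision graph is given, computing path coverage itself is simple. The sketch below uses a hypothetical toy graph and random stand-in activations (NPC derives the graph and its critical neurons from the DNN via interpretation, which is not modeled here): a path picks one critical neuron per layer, and coverage is the fraction of all paths exercised by the test suite.

```python
import random
from itertools import product

layers = [2, 3, 2]  # hypothetical critical-neuron candidates per layer
all_paths = set(product(*(range(n) for n in layers)))  # 2 * 3 * 2 = 12 paths

def decision_path(activations):
    """Abstract one input into its decision path: the index of the most
    active critical neuron in each layer."""
    return tuple(max(range(n), key=lambda i: acts[i])
                 for n, acts in zip(layers, activations))

random.seed(0)
# A toy "test suite": random per-layer activation values for 30 inputs.
suite = [[[random.random() for _ in range(n)] for n in layers] for _ in range(30)]
covered = {decision_path(acts) for acts in suite}

print(f"path coverage: {len(covered)}/{len(all_paths)}"
      f" = {len(covered) / len(all_paths):.2f}")
```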
Citations: 17