
Software Testing Verification & Reliability: Latest Publications

Integrating pattern matching and abstract interpretation for verifying cautions of microcontrollers
IF 1.5 · CAS Tier 4 (Computer Science) · Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING · Pub Date: 2021-08-19 · DOI: 10.1002/stvr.1788
Thuy Nguyen, Takashi Tomita, Junpei Endo, Toshiaki Aoki
Handling hardware‐dependent properties at a low level is usually required in developing microcontroller‐based applications. One of these hardware‐dependent properties is cautions, which are described in microcontroller hardware manuals. The process of verifying these cautions is performed manually, as there is currently no single tool that can directly handle this task. This research aims at automating the verification of these cautions. To obtain the typical cautions of microcontrollers, we investigate two sections that contain a considerable number of required cautions in the hardware manual of a popular microcontroller. Subsequently, we analyse these cautions and categorize them into several groups. Based on this analysis, we propose a semi‐automatic approach for verifying the cautions which integrates two static program analysis techniques (i.e., pattern matching and abstract interpretation). To evaluate our approach, we conducted experiments with generated source code, benchmark source code, and industrial source code. The generated source code, which was created automatically based on several aspects of the C program, was used to evaluate the performance of the approach with respect to these aspects. The benchmark and the industrial source code, which were provided by Aisin Software Co., Ltd., were used to assess the feasibility and applicability of the approach. The results show that all expected violations in the benchmark source code were detected. Unexpected but real violations in the benchmark program were also detected. For the industrial source code, the approach successfully handled and detected most of the expected violations. These results show that the approach is promising in verifying the cautions.
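To give a flavour of the pattern-matching half of such an approach, consider a write-ordering caution checked by a simple scan over C source. This is an illustrative sketch, not the authors' tool; the register names `PER0` and `TPS0` and the caution itself are hypothetical, and the matcher is deliberately naive (simple assignments only).

```python
import re

# Hypothetical caution: "enable the peripheral clock (PER0) before writing
# the timer register (TPS0)". Matches lines of the form "<reg> = ...;".
WRITE = re.compile(r'^\s*(?P<reg>\w+)\s*=[^=]')

def check_caution(c_source, prereq_reg, guarded_reg):
    """Return line numbers where guarded_reg is written before prereq_reg."""
    violations, prereq_seen = [], False
    for lineno, line in enumerate(c_source.splitlines(), start=1):
        m = WRITE.match(line)
        if m is None:
            continue
        if m.group('reg') == prereq_reg:
            prereq_seen = True
        elif m.group('reg') == guarded_reg and not prereq_seen:
            violations.append(lineno)
    return violations

good = "PER0 = 0x01;\nTPS0 = 0x05;"
bad = "TPS0 = 0x05;\nPER0 = 0x01;"
print(check_caution(good, "PER0", "TPS0"))  # []
print(check_caution(bad, "PER0", "TPS0"))   # [1]
```

Real cautions also involve value and timing constraints, which is where abstract interpretation (e.g., an interval domain tracking register values) complements purely syntactic matching.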
Citations: 1
Assessing test suites of extended finite state machines against model‐ and code‐based faults
IF 1.5 · CAS Tier 4 (Computer Science) · Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING · Pub Date: 2021-08-18 · DOI: 10.1002/stvr.1789
K. El-Fakih, A. Alzaatreh, Uraz Cengiz Türker
Tests can be derived from extended finite state machine (EFSM) specifications considering the coverage of single‐transfer faults, all transitions using a transition tour, all‐uses, edge‐pair, and prime paths with side trips. We provide novel empirical assessments of the effectiveness of these test suites. The first assessment determines, for each pair of test suites, whether there is a difference between the pair in covering EFSM faults of six EFSM specifications. If the difference is found significant, we determine which test suite outperforms the other. The second assessment is similar to the first; yet, it is carried out against code faults of 12 Java implementations of the specifications. In addition, two assessments are provided to determine whether test suites have better coverage of certain classes of EFSM (or code) faults than others. The evaluation uses proper data transformation of mutation scores and p‐value adjustments for controlling Type I error due to multiple tests. Furthermore, we show that subsuming mutants have an impact on the mutation scores of both EFSM and code faults; accordingly, we use a score that removes them in order not to invalidate the obtained results. The assessments show that all‐uses tests were outperformed by all other tests; transition tours outperformed both edge‐pair and prime paths with side trips; and single‐transfer fault tests outperformed all other test suites. Similar results are obtained over the considered EFSM and code fault domains, and there were no significant differences between the test suites' coverage of different classes of EFSM and code faults.
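As a minimal illustration of one criterion compared here, a transition tour covers every transition of a machine at least once. A greedy tour over a toy, fully specified, deterministic FSM can be sketched as follows (illustrative only; this is not the authors' EFSM tooling, and the greedy walk is neither minimal nor guaranteed to terminate on arbitrary machines, although it does on this one):

```python
# Toy FSM: (state, input) -> next state
fsm = {
    ('s0', 'a'): 's1',
    ('s0', 'b'): 's0',
    ('s1', 'a'): 's0',
    ('s1', 'b'): 's1',
}

def transition_tour(fsm, start='s0'):
    """Greedily build an input sequence that traverses every transition."""
    uncovered, tour, state = set(fsm), [], start
    while uncovered:
        options = [k for k in fsm if k[0] == state]
        # prefer an uncovered transition; otherwise take any step onwards
        pick = next((k for k in options if k in uncovered), options[0])
        tour.append(pick[1])
        uncovered.discard(pick)
        state = fsm[pick]
    return tour

def covers_all_transitions(fsm, tour, start='s0'):
    covered, state = set(), start
    for x in tour:
        covered.add((state, x))
        state = fsm[(state, x)]
    return covered == set(fsm)

print(transition_tour(fsm))  # ['a', 'a', 'b', 'a', 'b']
```

Criteria such as all‐uses or prime paths with side trips impose different (data-flow or path-based) obligations over the same transition graph, which is what the paper's assessments compare.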
Citations: 3
Editorial: Verification, reliability and performance
IF 1.5 · CAS Tier 4 (Computer Science) · Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING · Pub Date: 2021-08-08 · DOI: 10.1002/stvr.1790
R. Hierons, Tao Xie
This issue includes three papers, covering software verification, software reliability modelling and performance assessment, respectively. The first paper, ‘Verification algebra for multi-tenant applications in VaaS architecture’, by Kai Hu, Ji Wan, Kan Luo, Yuzhuang Xu, Zijing Cheng and Wei-Tek Tsai, concerns verification in multi-tenant architectures. Multi-tenant architectures support composition of services and so the rapid development of applications. The issue addressed is the potentially massive number of possible applications formed by composing a given set of services. The authors propose a verification algebra that can determine the results of verification of new combinations of property/application on the basis of different combinations of services already verified and/or the verification of different, but related, properties. The overall approach was evaluated through simulations. (Recommended by Professor Paul Strooper) The second paper, ‘Entropy based enhanced particle swarm optimization on multi-objective software reliability modelling for optimal testing resources allocation’, by Pooja Rani and G. S. Mahapatra, concerns the optimum resource allocation problem to obtain the maximum reliability and minimum total cost under the testing effort constraint. The authors formulate a multi-objective software reliability model of testing resources for a new generalized exponential reliability function to characterize dynamic allocation of total expected cost and testing effort. The authors further propose an enhanced particle swarm optimization (EPSO) to maximize software reliability and minimize allocation cost. The authors conduct experiments to demonstrate the potential of the proposed approach to predict software reliability with greater accuracy. 
(Recommended by Professor Moonzoo Kim) The third paper, ‘Performance assessment based on stochastic differential equation and effort data for edge computing’, by Yoshinobu Tamura and Shigeru Yamada, concerns performance assessment based on the relationship between the cloud and edge services operated by using open-source software. The authors propose a two-dimensional stochastic differential equation model that considers the unique features with uncertainty from big data under the operation of cloud and edge services. The authors analyse actual data to show numerical examples of performance assessments considering the network connectivity as characteristics of cloud and edge services and compare the noise terms of the proposed model for actual data. (Recommended by Professor Min Xie)
Citations: 0
TRANSMUT‐Spark: Transformation mutation for Apache Spark
IF 1.5 · CAS Tier 4 (Computer Science) · Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING · Pub Date: 2021-08-05 · DOI: 10.1002/stvr.1809
J. Neto, A. Moreira, Genoveva Vargas-Solar, M. A. Musicante
This paper proposes TRANSMUT‐Spark for automating mutation testing of big data processing code within Spark programs. Apache Spark is an engine for big data analytics/processing that hides the inherent complexity of parallel big data programming. Nonetheless, programmers must cleverly combine Spark built‐in functions within programs and guide the engine to use the right data management strategies to exploit the computational resources required by big data processing and avoid substantial production losses. Many programming details in Spark data processing code are error‐prone and must be correctly and automatically tested. This paper explores the application of mutation testing to Spark programs; mutation testing is a fault‐based testing technique that relies on fault simulation to evaluate and design test sets. The paper introduces TRANSMUT‐Spark for testing Spark programs, automating the most laborious steps of the mutation testing process and executing it fully. The paper describes how TRANSMUT‐Spark automates the mutant generation, test execution, and adequacy analysis phases of mutation testing. It also discusses the results of experiments to validate the tool and argues its scope and limitations.
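The core mutation-analysis loop is independent of Spark. The sketch below (plain Python, not TRANSMUT‐Spark; mutants are modelled as parameter tweaks that delete one pipeline step, standing in for real transformation replacements) shows how a mutation score is computed: run each mutant against the test inputs and count as killed those whose output diverges from the original.

```python
# Tiny two-step "pipeline": filter out negatives, then square each element.
def pipeline(nums, keep_filter=True, keep_map=True):
    data = [n for n in nums if n >= 0] if keep_filter else list(nums)
    return [n * n for n in data] if keep_map else data

# Each "mutant" deletes one transformation step.
mutants = {
    'delete_filter': dict(keep_filter=False),
    'delete_map': dict(keep_map=False),
}

def mutation_analysis(test_inputs):
    """A mutant is killed when some test input observes a divergent output."""
    original = [pipeline(t) for t in test_inputs]
    killed = {name for name, tweak in mutants.items()
              if [pipeline(t, **tweak) for t in test_inputs] != original}
    return killed, len(killed) / len(mutants)   # killed set, mutation score

killed, score = mutation_analysis([[-1, 2, 3]])
print(killed, score)  # both mutants killed -> score 1.0
```

A single test containing a negative number suffices here to kill both mutants; the adequacy-analysis phase the paper describes is exactly this scoring step at scale.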
Citations: 3
Editorial: Testing, Debugging, and Defect Prediction
IF 1.5 · CAS Tier 4 (Computer Science) · Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING · Pub Date: 2021-08-01 · DOI: 10.1002/stvr.1775
R. Hierons, Tao Xie
This issue includes four papers, covering performance mutation testing, performance regression localization, fault detection and localization, and defect prediction, respectively. The first paper, by Pedro Delgado-Pérez, Ana Belén Sánchez, Sergio Segura and Inmaculada Medina-Bulo, concerns the feasibility of applying performance mutation testing (i.e. applying mutation testing to assess performance tests) at the source-code level in general-purpose languages. To successfully apply performance mutation testing, the authors find it necessary to design specific mutation operators and mechanisms to evaluate the outputs. The authors define and evaluate seven new performance mutation operators to model known bug-inducing patterns. The authors report the results of experimental evaluation on open-source C++ programs. (Recommended by Professor Hyunsook Do) The second paper, by Frolin S. Ocariza Jr. and Boyang Zhao, considers the problem of finding the causes of performance regression in software. Here, a performance regression is an increase in response time as a result of changes to the software. The paper describes a design, called ZAM, that automates the process of comparing execution timelines collected from web applications. Such timelines are used as the basis for finding the causes of performance regression. A number of challenges are introduced by the context, in which, for example, timing information is typically noisy. The authors report the results of experimental evaluation and also experience in using the approach. (Recommended by Professor T. H. Tse) The third paper, by Rawad Abou Assi, Wes Masri and Chadi Trad, concerns coincidental correctness and its impact on fault detection and localization. The authors consider weak coincidental correctness, in which a faulty statement is executed but this does not lead to an infected state. They also consider strong coincidental correctness, in which the execution of a faulty statement leads to an infected state but does not lead to incorrect output. The authors empirically investigated the effect of coincidental correctness on three classes of technique: spectrum-based fault localization (SBFL), test suite reduction (TSR) and test case prioritization (TCP). Interestingly, there was significant variation, with, for example, evidence that coincidental correctness has a greater impact on TSR and TCP than on SBFL. (Recommended by Professor Hyunsook Do) The fourth paper, by Zeinab Eivazpour and Mohammad Reza Keyvanpour, concerns the cost issue when handling the class imbalance problem over the training dataset in software defect prediction. The authors propose the cost-sensitive stacked generalization (CSSG) approach. This approach combines the stacking ensemble learning method with cost-sensitive learning, which aims to reduce misclassification costs. In the CSSG approach, the logistic regression classifier and an extra randomized trees ensemble method, under cost-sensitive and cost-insensitive conditions, are used as the final classifiers of the stacking scheme. The authors report the results of an experimental evaluation. (Recommended by Professor Hyunsook Do)
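For readers unfamiliar with SBFL, suspiciousness metrics rank statements by how strongly their coverage correlates with failing tests; Ochiai is one common formula. A toy computation (illustrative data, unrelated to the paper's experiments):

```python
import math

# statement -> (failing tests covering it, passing tests covering it)
coverage = {
    's1': (2, 2),
    's2': (2, 0),     # executed only by failing tests -> most suspicious
    's3': (0, 2),
}
TOTAL_FAILING = 2

def ochiai(failed_cov, passed_cov):
    """Ochiai suspiciousness: ef / sqrt(totalfailed * (ef + ep))."""
    denom = math.sqrt(TOTAL_FAILING * (failed_cov + passed_cov))
    return failed_cov / denom if denom else 0.0

ranking = sorted(coverage, key=lambda s: ochiai(*coverage[s]), reverse=True)
print(ranking)  # ['s2', 's1', 's3']
```

Coincidental correctness degrades such rankings precisely because a passing test that executes the faulty statement inflates the passed-coverage term.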
Citations: 0
Model checking C++ programs
IF 1.5 · CAS Tier 4 (Computer Science) · Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING · Pub Date: 2021-07-02 · DOI: 10.1002/stvr.1793
Felipe R. Monteiro, M. R. Gadelha, L. Cordeiro
In the last three decades, memory safety issues in system programming languages such as C or C++ have been one of the most significant sources of security vulnerabilities. However, there have been only a few attempts, with limited success, to cope with the complexity of C++ program verification. We describe and evaluate a novel verification approach based on bounded model checking (BMC) and satisfiability modulo theories (SMT) to verify C++ programs. Our verification approach analyses bounded C++ programs by encoding into SMT various sophisticated features that the C++ programming language offers, such as templates, inheritance, polymorphism, exception handling, and the Standard Template Library. We formalize these features within our formal verification framework using a decidable fragment of first‐order logic and then show how state‐of‐the‐art SMT solvers can efficiently handle them. We implemented our verification approach on top of ESBMC. We compare ESBMC to LLBMC and DIVINE, which are state‐of‐the‐art verifiers that check C++ programs directly from the LLVM bitcode. Experimental results show that ESBMC can handle a wide range of C++ programs, presenting a higher number of correct verification results. Additionally, ESBMC has been applied to a commercial C++ application in the telecommunication domain and successfully detected arithmetic‐overflow errors that could potentially lead to security vulnerabilities.
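The essence of BMC is to unroll a transition system to a fixed depth k and search for a property violation within that bound. Real tools such as ESBMC encode the search as an SMT formula; the idea can be sketched with explicit enumeration over a toy counter system (illustrative only):

```python
from itertools import product

# Toy transition system: a counter incremented or decremented by each input.
def step(counter, inc):
    return counter + (1 if inc else -1)

def bmc_violation(k, start=0, limit=3):
    """Return an input sequence driving the counter above `limit` within k
    steps, if one exists; otherwise None (the property holds up to bound k)."""
    for trace in product([True, False], repeat=k):
        state = start
        for i, inc in enumerate(trace, start=1):
            state = step(state, inc)
            if state > limit:
                return list(trace[:i])   # counterexample prefix
    return None

print(bmc_violation(2))  # None: the limit is unreachable in 2 steps
print(bmc_violation(4))  # [True, True, True, True]
```

The returned trace plays the role of the counterexample a model checker reports; absence of a violation only certifies the property up to the chosen bound.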
Cited by: 13
Effective grey‐box testing with partial FSM models
IF 1.5 · CAS Zone 4 (Computer Science) · Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING · Pub Date: 2021-06-27 · DOI: 10.1002/stvr.1806
Robert Sachtleben, J. Peleska
For partial, nondeterministic, finite state machines, a new conformance relation called strong reduction is presented. It complements other existing conformance relations in the sense that the new relation is well suited for model‐based testing of systems whose inputs are enabled or disabled, depending on the actual system state. Examples of such systems are graphical user interfaces and systems with interfaces that can be enabled or disabled in a mechanical way. We present a new test generation algorithm producing complete test suites for strong reduction. The suites are executed according to the grey‐box testing paradigm: it is assumed that the state‐dependent sets of enabled inputs can be identified during test execution, while the implementation states remain hidden, as in black‐box testing. We show that this grey‐box information is exploited by the generation algorithm in such a way that the resulting best‐case test suite size is only linear in the state space size of the reference model. Moreover, examples show that this may lead to significant reductions of test suite size in comparison to true black‐box testing for strong reduction.
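One way to make the enabled-input idea concrete is a toy partial FSM. The sketch below uses our own simplified reading of the relation — a map from (state, input) to a set of (output, next state) pairs — and is not the paper's formal definition: an implementation may drop nondeterministic alternatives of the specification, but in corresponding states it must enable exactly the same inputs, and that enabled set is what the grey-box tester observes.

```python
# Toy partial, nondeterministic FSM: (state, input) -> set of (output, next state).
# Simplified reading of strong reduction, not the paper's formal definition.
SPEC = {
    ("locked", "coin"): {("unlock", "unlocked"), ("reject", "locked")},
    ("locked", "push"): {("alarm", "locked")},
    ("unlocked", "push"): {("relock", "locked")},  # "coin" is disabled here
}

IMPL = {
    ("locked", "coin"): {("unlock", "unlocked")},  # drops one spec alternative
    ("locked", "push"): {("alarm", "locked")},
    ("unlocked", "push"): {("relock", "locked")},
}

def enabled(fsm, state):
    """Inputs the machine accepts in `state` (observable in grey-box testing)."""
    return {i for (s, i) in fsm if s == state}

def strong_reduction(impl, spec):
    # Every implementation behaviour is allowed by the spec, and each state
    # enables exactly the inputs the spec enables there.
    return (all(impl[k] <= spec.get(k, set()) for k in impl)
            and all(enabled(impl, s) == enabled(spec, s)
                    for s in {s for s, _ in spec}))

assert enabled(SPEC, "unlocked") == {"push"}  # partial: "coin" not enabled
assert strong_reduction(IMPL, SPEC)
```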
Cited by: 2
JUGE: An infrastructure for benchmarking Java unit test generators
IF 1.5 · CAS Zone 4 (Computer Science) · Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING · Pub Date: 2021-06-14 · DOI: 10.1002/stvr.1838
Xavier Devroey, Alessio Gambi, Juan P. Galeotti, René Just, Fitsum Meshesha Kifetew, Annibale Panichella, Sebastiano Panichella
Researchers and practitioners have designed and implemented various automated test case generators to support effective software testing. Such generators exist for various languages (e.g., Java, C#, or Python) and various platforms (e.g., desktop, web, or mobile applications). The generators exhibit varying effectiveness and efficiency, depending on the testing goals they aim to satisfy (e.g., unit‐testing of libraries versus system‐testing of entire applications) and the underlying techniques they implement. In this context, practitioners need to be able to compare different generators to identify the most suited one for their requirements, while researchers seek to identify future research directions. This can be achieved by systematically executing large‐scale evaluations of different generators. However, executing such empirical evaluations is not trivial and requires substantial effort to select appropriate benchmarks, set up the evaluation infrastructure, and collect and analyse the results. In this Software Note, we present our JUnit Generation Benchmarking Infrastructure (JUGE) supporting generators (search‐based, random‐based, symbolic execution, etc.) seeking to automate the production of unit tests for various purposes (validation, regression testing, fault localization, etc.). The primary goal is to reduce the overall benchmarking effort, ease the comparison of several generators, and enhance the knowledge transfer between academia and industry by standardizing the evaluation and comparison process. Since 2013, several editions of a unit testing tool competition, co‐located with the Search‐Based Software Testing Workshop, have taken place where JUGE was used and evolved. As a result, a growing number of tools (over 10) from academia and industry have been evaluated on JUGE, matured over the years, and allowed the identification of future research directions. Based on the experience gained from the competitions, we discuss the expected impact of JUGE in improving the knowledge transfer on tools and approaches for test generation between academia and industry. Indeed, the JUGE infrastructure demonstrated an implementation design that is flexible enough to enable the integration of additional unit test generation tools, which is practical for developers and allows researchers to experiment with new and advanced unit testing tools and approaches.
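The evaluation loop such an infrastructure automates can be sketched as follows. Everything here is hypothetical — the function and score names are ours, not JUGE's API: each registered generator is run on each benchmark subject under a fixed time budget, and a comparable score is collected per generator/subject pair.

```python
# Hypothetical harness sketch in the JUGE spirit; names are ours, not JUGE's API.
from typing import Callable, Dict, Iterable

Generator = Callable[[str, int], float]  # (subject, budget_s) -> score

def run_benchmark(generators: Dict[str, Generator],
                  subjects: Iterable[str], budget_s: int = 60):
    """Run every generator on every subject; return {generator: {subject: score}}."""
    subjects = list(subjects)
    return {name: {sub: gen(sub, budget_s) for sub in subjects}
            for name, gen in generators.items()}

# Stand-in "generators" returning a fixed mutation score per subject.
gens = {"random": lambda sub, t: 0.40,
        "search": lambda sub, t: 0.65}
scores = run_benchmark(gens, ["org.example.Stack", "org.example.Queue"])
assert scores["search"]["org.example.Stack"] > scores["random"]["org.example.Stack"]
```

Standardizing this loop is what makes results from different generators directly comparable across competition editions.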
Cited by: 4
An ensemble‐based predictive mutation testing approach that considers impact of unreached mutants
IF 1.5 · CAS Zone 4 (Computer Science) · Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING · Pub Date: 2021-06-02 · DOI: 10.1002/stvr.1784
Alireza Aghamohammadi, S. Mirian-Hosseinabadi
Predictive mutation testing (PMT) is a technique to predict whether a mutant is killed, using machine learning approaches. Researchers have proposed various methods for PMT over the years. However, the impact of unreached mutants on PMT is not fully addressed. A mutant is unreached if the statement on which the mutant is generated is not executed by any test cases. We aim at showing that unreached mutants can inflate PMT results. Moreover, we propose an alternative approach to PMT, suggesting a different interpretation for PMT. To this end, we replicated the previous PMT research. We empirically evaluated the suggested approach on 654 Java projects provided by prior literature. Our results indicate that the performance of PMT drastically decreases in terms of area under a receiver operating characteristic curve (AUC) from 0.833 to 0.517. Furthermore, PMT performs worse than random guesses on 27% of the projects. The proposed approach improves the PMT results, achieving the average AUC value of 0.613. As a result, we recommend researchers to remove unreached mutants when reporting the results.
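The inflation effect the authors describe can be reproduced on toy data. The snippet below is an illustration, not their ensemble classifier: an unreached mutant trivially survives, so a predictor that merely exploits reachability earns free correct predictions on unreached mutants, and its apparent accuracy drops once they are filtered out.

```python
# Toy illustration of the inflation effect, not the paper's classifier.
mutants = [
    # (reached_by_tests, actually_killed)
    (False, False), (False, False), (False, False), (False, False),
    (True,  True),  (True,  False), (True,  True),  (True,  False),
]

def predict(reached: bool) -> bool:
    """Naive predictor: guess 'killed' iff the mutant is reached by some test."""
    return reached

def accuracy(pool):
    return sum(predict(r) == k for r, k in pool) / len(pool)

# Inflated: the four unreached mutants are free correct predictions.
assert accuracy(mutants) == 0.75
# After dropping unreached mutants, only the genuinely hard cases remain.
reached_only = [(r, k) for r, k in mutants if r]
assert accuracy(reached_only) == 0.5
```

This is the same reason the paper recommends removing unreached mutants before reporting PMT results.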
Cited by: 2
The IEEE 12th International Conference on Software Testing, Verification & Validation
IF 1.5 · CAS Zone 4 (Computer Science) · Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING · Pub Date: 2021-06-01 · DOI: 10.1002/stvr.1773
A. Memon, Myra B. Cohen
The IEEE 12th International Conference on Software Testing, Verification & Validation (ICST 2019) was held in Xi’an, China. The aim of the ICST conference is to bring together researchers and practitioners who study the theory, techniques, technologies, and applications that concern all aspects of software testing, verification, and validation of software systems. The program committee rigorously reviewed 110 full papers using a double-blind reviewing policy. Each paper received at least three regular reviews and went through a discussion phase where the reviewers made final decisions on each paper, each discussion being led by a meta-reviewer. Out of this process, the committee selected 31 full-length papers that appeared in the conference. These were presented over nine sessions ranging from classical topics such as test generation and test coverage to emerging topics such as machine learning and security during the main conference track. Based on the original reviewers’ feedback, we selected five papers for consideration for this special issue of STVR. These papers were extended from their conference version by the authors and were reviewed according to the standard STVR reviewing process. We thank all the ICST and STVR reviewers for their hard work. Three papers successfully completed the review process and are contained in this special issue. The rest of this editorial provides a brief overview of these three papers. The first paper, Automated Visual Classification of DOM-based Presentation Failure Reports for Responsive Web Pages, by Ibrahim Althomali, Gregory Kapfhammer, and Phil McMinn, introduces VERVE, a tool that automatically classifies all hard-to-detect responsive layout failures (RLFs) in web applications. An empirical study reveals that VERVE’s classification of all five types of RLFs frequently agrees with classifications produced manually by humans.
The second paper, BugsJS: A Benchmark and Taxonomy of JavaScript Bugs, by Péter Gyimesi, Béla Vancsics, Andrea Stocco, Davood Mazinanian, Árpád Beszédes, Rudolf Ferenc, and Ali Mesbah, presents BugsJS, a benchmark of 453 real, manually validated JavaScript bugs from 10 popular JavaScript server-side programs, comprising 444 k LOC in total. Each bug is accompanied by its bug report, the test cases that expose it, as well as the patch that fixes it. BugsJS can help facilitate reproducible empirical studies and comparisons of JavaScript analysis and testing tools. The third paper, Statically Driven Generation of Concurrent Tests for Thread-Safe Classes, by Valerio Terragni and Mauro Pezzè, presents DEPCON, a novel approach that reduces the search space of concurrent tests by leveraging statically computed dependencies among public methods. DEPCON exploits the intuition that concurrent tests can expose thread-safety violations that manifest exceptions or deadlocks, only if they exercise some specific method dependencies. The results show that DEPCON is more effective than state-of-the-art approaches in exposing concurrency bugs.
Cited by: 0