
2022 IEEE International Conference on Software Testing, Verification and Validation Workshops (ICSTW): Latest Publications

Patterns to Improve Fidelity for Model-Based Testing
Yasuaki Hiruta, Hidetoshi Suhara, Y. Nishi
Model-Based Testing (MBT) is an important testing technology in which test cases are generated automatically by MBT tools. While a single MBT tool supports a single MBT model, multiple models must be implicitly integrated to detect the kinds of failures that are currently found manually in industry, for example through exploratory testing. Single-model MBT lacks fidelity, that is, the ability to generate the required test cases. In this research, we analyse such failures to identify the implicit multiple models involved. Several patterns for integrating multiple models are discovered. This paper introduces these patterns, with examples of test case generation, to improve fidelity.
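As background for the single-model setting described above, here is a minimal, hypothetical sketch of MBT-style test generation from one finite-state model; the model, its states and events, and the depth bound are illustrative assumptions and are not taken from the paper.

```python
from collections import deque

# Hypothetical single MBT model: a finite-state machine of a login dialog.
# States and events are invented for illustration only.
MODEL = {
    "Start":              [("enter_credentials", "CredentialsEntered")],
    "CredentialsEntered": [("submit_valid", "LoggedIn"),
                           ("submit_invalid", "ErrorShown")],
    "ErrorShown":         [("retry", "CredentialsEntered")],
    "LoggedIn":           [],
}

def generate_tests(model, start="Start", max_depth=4):
    """Enumerate event sequences (test cases) up to max_depth transitions."""
    tests, queue = [], deque([(start, [])])
    while queue:
        state, path = queue.popleft()
        if not model[state] or len(path) == max_depth:
            if path:
                tests.append(path)
            continue
        for event, nxt in model[state]:
            queue.append((nxt, path + [event]))
    return tests

for test in generate_tests(MODEL):
    print(" -> ".join(test))
```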
{"title":"Patterns to Improve Fidelity for Model-Based Testing","authors":"Yasuaki Hiruta, Hidetoshi Suhara, Y. Nishi","doi":"10.1109/ICSTW55395.2022.00049","DOIUrl":"https://doi.org/10.1109/ICSTW55395.2022.00049","url":null,"abstract":"Model-based Testing (MBT), is an important testing technology to generate test cases automatically by MBT tools. While a single MBT tool supports a single MBT model, it is necessary to integrate implicitly multiple models to detect failures manually such as exploratory testing in industry. A single-model-MBT lacks fidelity which means the ability to generate the required test cases. In this research, we analyse such failures to identify implicit multiple models. Several patterns on integration of multiple models are discovered. This paper introduces the patterns and examples of test case generation to improve fidelity.","PeriodicalId":147133,"journal":{"name":"2022 IEEE International Conference on Software Testing, Verification and Validation Workshops (ICSTW)","volume":"350 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123328887","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cited by: 0
Win GUI Crawler: A tool prototype for desktop GUI image and metadata collection
Marko Savić, M. Mäntylä, Maëlick Claes
Despite the widespread adoption of test automation, automatic testing of graphical user interfaces (GUIs) remains a challenge. This is partly due to the difficulty of reliably identifying GUI elements across different versions of a given software system. Machine vision techniques could be a potential way of addressing this issue by automatically identifying GUI elements with the help of machine learning. However, developing a GUI testing tool that relies on automatic identification of graphical elements first requires acquiring a large amount of labeled data. In this paper, we present Win GUI Crawler, a tool for automatically gathering such data from Microsoft Windows GUI applications. The tool is based on Microsoft Windows Application Driver and performs actions on the GUI using a depth-first traversal of the GUI element tree. For each action performed by the crawler, screenshots are taken and metadata is extracted for each of the different screens. Bounding boxes of GUI elements are then filtered in order to identify which GUI elements are actually visible on the screen. Win GUI Crawler is then evaluated on several popular Windows applications and its current limitations are discussed.
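The depth-first crawl and bounding-box filtering described above can be sketched independently of the real tool. The in-memory element tree below is a stand-in; the actual crawler obtains elements, screenshots and metadata through Microsoft Windows Application Driver, whose API is not reproduced here.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Element:
    """Stand-in for a GUI element; screenshots are omitted in this sketch."""
    id: str
    bbox: Tuple[int, int, int, int]          # (x, y, width, height)
    children: List["Element"] = field(default_factory=list)

def crawl(element, screen=(1920, 1080), depth=0, visited=None):
    """Depth-first traversal collecting one metadata record per element,
    keeping only elements whose bounding box intersects the screen."""
    if visited is None:
        visited = set()
    if element.id in visited:
        return []
    visited.add(element.id)

    records = []
    x, y, w, h = element.bbox
    if w > 0 and h > 0 and x < screen[0] and y < screen[1] and x + w > 0 and y + h > 0:
        records.append({"id": element.id, "depth": depth, "bbox": element.bbox})
    for child in element.children:
        records.extend(crawl(child, screen, depth + 1, visited))
    return records

window = Element("window", (0, 0, 800, 600), [
    Element("ok_button", (10, 550, 80, 30)),
    Element("offscreen_panel", (2200, 0, 300, 300)),   # filtered out: off screen
])
print(crawl(window))
```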
{"title":"Win GUI Crawler: A tool prototype for desktop GUI image and metadata collection","authors":"Marko Savić, M. Mäntylä, Maëlick Claes","doi":"10.1109/ICSTW55395.2022.00046","DOIUrl":"https://doi.org/10.1109/ICSTW55395.2022.00046","url":null,"abstract":"Despite the widespread of test automation, automatic testing of graphical user interfaces (GUI) remains a challenge. This is partly due to the difficulty of reliably identifying GUI elements over different versions of a given software system. Machine vision techniques could be a potential way of addressing this issue by automatically identifying GUI elements with the help of machine learning. However, developing a GUI testing tool relying on automatic identification of graphical elements first requires to acquire large amount of labeled data. In this paper, we present Win GUI Crawler, a tool for automatically gathering such data from Microsoft Windows GUI applications. The tool is based on Microsoft Windows Application Driver and performs actions on the GUI using a depth-first traversal of the GUI element tree. For each action performed by the crawler, screenshots are taken and metadata is extracted for each of the different screens. Bounding boxes of GUI elements are then filtered in order to identify what GUI elements are actually visible on the screen. Win GUI Crawler is then evaluated on several popular Windows applications and the current limitations are discussed.","PeriodicalId":147133,"journal":{"name":"2022 IEEE International Conference on Software Testing, Verification and Validation Workshops (ICSTW)","volume":"157 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122920165","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cited by: 0
A Combinatorial Approach to Fairness Testing of Machine Learning Models
A. Patel, Jaganmohan Chandrasekaran, Yu Lei, R. Kacker, D. R. Kuhn
Machine Learning (ML) models could exhibit biased behavior, or algorithmic discrimination, resulting in unfair or discriminatory outcomes. The bias in the ML model could emanate from various factors such as the training dataset, the choice of the ML algorithm, or the hyperparameters used to train the ML model. In addition to evaluating the model’s correctness, it is essential to test ML models for fair and unbiased behavior. In this paper, we present a combinatorial testing-based approach to perform fairness testing of ML models. Our approach is model agnostic and evaluates fairness violations of a pre-trained ML model in a two-step process. In the first step, we create an input parameter model from the training data set and then use the model to generate a t-way test set. In the second step, for each test, we modify the value of one or more protected attributes to see if we could find fairness violations. We performed an experimental evaluation of the proposed approach using ML models trained with tabular datasets. The results suggest that the proposed approach can successfully identify fairness violations in pre-trained ML models.
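The second step of the approach, re-predicting each generated test after modifying its protected attributes, can be sketched as follows. The predict_one wrapper, the toy model, the attribute domains, and the reading of a changed prediction as a fairness violation are simplifying assumptions for illustration.

```python
from itertools import product

def fairness_violations(model, tests, protected_values):
    """For each t-way test (a dict of feature values), vary only the
    protected attributes and flag tests whose prediction changes."""
    violations = []
    names = list(protected_values)
    for test in tests:
        baseline = model.predict_one(test)
        for combo in product(*(protected_values[n] for n in names)):
            variant = dict(test, **dict(zip(names, combo)))
            if model.predict_one(variant) != baseline:
                violations.append((test, variant))
                break      # one counterexample per test is enough
    return violations

class ToyModel:
    """Stand-in classifier that (deliberately) depends on a protected attribute."""
    def predict_one(self, row):
        return int(row["income"] > 50 and row["sex"] == "male")

tests = [{"income": 60, "sex": "male", "age": 30},
         {"income": 40, "sex": "female", "age": 45}]
protected = {"sex": ["male", "female"]}
print(fairness_violations(ToyModel(), tests, protected))
```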
{"title":"A Combinatorial Approach to Fairness Testing of Machine Learning Models","authors":"A. Patel, Jaganmohan Chandrasekaran, Yu Lei, R. Kacker, D. R. Kuhn","doi":"10.1109/ICSTW55395.2022.00030","DOIUrl":"https://doi.org/10.1109/ICSTW55395.2022.00030","url":null,"abstract":"Machine Learning (ML) models could exhibit biased behavior, or algorithmic discrimination, resulting in unfair or discriminatory outcomes. The bias in the ML model could emanate from various factors such as the training dataset, the choice of the ML algorithm, or the hyperparameters used to train the ML model. In addition to evaluating the model’s correctness, it is essential to test ML models for fair and unbiased behavior. In this paper, we present a combinatorial testing-based approach to perform fairness testing of ML models. Our approach is model agnostic and evaluates fairness violations of a pre-trained ML model in a two-step process. In the first step, we create an input parameter model from the training data set and then use the model to generate a t-way test set. In the second step, for each test, we modify the value of one or more protected attributes to see if we could find fairness violations. We performed an experimental evaluation of the proposed approach using ML models trained with tabular datasets. The results suggest that the proposed approach can successfully identify fairness violations in pre-trained ML models.","PeriodicalId":147133,"journal":{"name":"2022 IEEE International Conference on Software Testing, Verification and Validation Workshops (ICSTW)","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124704715","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cited by: 7
Software Bug Prediction Model Based on Mathematical Graph Features Metrics
Tomohiro Takeda, Satoshi Masuda, K. Tsuda
Quality assurance is one of the most important activities in software development and maintenance. Software source code is modified via change requests, functional improvements, and refactoring. When software changes, it is difficult to define the scope of the test cases, and software testing costs tend to increase in order to maintain software quality. Therefore, change analysis is a challenge, and static testing is a key solution to it. In this study, based on our hypothesis that graph features correlate with software bugs, we propose new static testing metrics that apply mathematical graph analysis techniques to the control flow graph generated from the three-address code of the implementation. Five graph features are strongly correlated with software bugs. Hence, our bug prediction model exhibits better performance (0.25 FN ratio, 0.04 TN ratio, and 0.08 precision) than a model based on the traditional bug prediction metrics of complexity, lines of code (steps), and CRUD.
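To illustrate what graph-feature metrics over a control flow graph can look like, the sketch below computes a few generic measures with networkx; the five features actually used in the paper are not reproduced here, so the selection shown is an assumption.

```python
import networkx as nx

def cfg_features(edges):
    """Compute simple graph-feature metrics for a control flow graph given
    as a list of (source, target) basic-block edges. Feature selection is
    illustrative, not the paper's."""
    g = nx.DiGraph(edges)
    n, e = g.number_of_nodes(), g.number_of_edges()
    return {
        "nodes": n,
        "edges": e,
        "density": nx.density(g),
        "avg_out_degree": e / n if n else 0.0,
        "cyclomatic": e - n + 2,   # McCabe complexity for a single-entry CFG
    }

# Toy CFG: an if/else that re-joins at block 4.
print(cfg_features([(0, 1), (1, 2), (1, 3), (2, 4), (3, 4)]))
```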
{"title":"Software Bug Prediction Model Based on Mathematical Graph Features Metrics","authors":"Tomohiro Takeda, Satoshi Masuda, K. Tsuda","doi":"10.1109/ICSTW55395.2022.00047","DOIUrl":"https://doi.org/10.1109/ICSTW55395.2022.00047","url":null,"abstract":"Quality assurance is one of the most important activities in software development and maintenance. Software source codes are modified via change requests, functional improvement, and refactoring. When software changes, it is difficult to define the scope of test cases, and software testing costs tend to increase to maintain software quality. Therefore, change analysis is a challenge, and static testing is a key solution to this challenge. In this study, we propose new static testing metrics using mathematical graph analysis techniques for the control flow graph generated from the three-address code of the implementation codes based on our hypothesis of the existing correlation between the graph features and any software bugs. Five graph features are strongly correlated with the software bugs. Hence, our bug prediction model exhibits a better performance of 0.25 FN, 0.04 TN ratio, and 0.08 precision than a model based on the traditional bug prediction metrics, which are complexity, line of code (steps), and CRUD.","PeriodicalId":147133,"journal":{"name":"2022 IEEE International Conference on Software Testing, Verification and Validation Workshops (ICSTW)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115244055","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cited by: 0
Experience of Combinatorial Testing toward Fault Detection, Isolation and Recovery Functionality
Naoko Okubo, Shoma Takatsuki, Yasushi Ueda
The functionality of Fault Detection, Isolation, and Recovery (FDIR) is a key factor in achieving the high reliability of space systems. The test suites for the FDIR functionality in JAXA's space systems are manually designed by expert engineers with decades of experience, to achieve combination coverage as high as possible with a small test suite. However, only a few engineers can perform such ad-hoc test suite design. Therefore, FDIR functionality testing requires a supportive method that generates a test suite with high combination coverage and the smallest size that can be executed within the development timescale. In this paper, we describe our experience in applying popular combinatorial testing techniques to generate FDIR functionality test suites for a real-world earth-observation satellite and comparing them with a conventional human-derived test suite. The purpose of this comparison is to assess the capability of existing combinatorial testing methods for FDIR functionality testing. Here, FDIR functionality testing was treated as combinatorial configuration testing. As a result, we found that the 2-way coverage rates achieved by the human, PICT, ACTS and the HAYST method were 72.7%, 66.3%, 68.8% and 72.2%, with 16, 10, 10 and 14 test cases, respectively.
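The 2-way coverage rates quoted above are the fraction of all parameter-value pairs that appear in at least one test case. A minimal sketch of that computation, using a made-up parameter model rather than the actual FDIR parameters:

```python
from itertools import combinations, product

def two_way_coverage(parameters, tests):
    """parameters: {name: [values]}; tests: list of {name: value} dicts.
    Returns the covered-pair ratio across all parameter pairs."""
    names = list(parameters)
    total = covered = 0
    for a, b in combinations(names, 2):
        all_pairs = set(product(parameters[a], parameters[b]))
        seen = {(t[a], t[b]) for t in tests}
        total += len(all_pairs)
        covered += len(all_pairs & seen)
    return covered / total

# Invented parameter model and tiny suite for illustration.
params = {"sensor": ["ok", "fault"], "mode": ["nominal", "safe", "recovery"]}
suite = [{"sensor": "ok", "mode": "nominal"},
         {"sensor": "fault", "mode": "safe"}]
print(f"2-way coverage: {two_way_coverage(params, suite):.1%}")
```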
{"title":"Experience of Combinatorial Testing toward Fault Detection, Isolation and Recovery Functionality","authors":"Naoko Okubo, Shoma Takatsuki, Yasushi Ueda","doi":"10.1109/ICSTW55395.2022.00025","DOIUrl":"https://doi.org/10.1109/ICSTW55395.2022.00025","url":null,"abstract":"The functionality of Fault Detection, Isolation, and Recovery (FDIR) is a key factor in achieving the high reliability of space systems. The test suites for the FDIR functionality in JAXA’s space systems are manually designed by expert engineers with decades of experience to achieve as high combination coverage with a small test suite as possible. However, there are only a few engineers who can perform such ad-hoc test suite design. Therefore, FDIR functionality testing requires a supportive method to generate a test suite with the high combination coverage with the smallest size that can be executed in the development timescale. In this paper, we describe our experience in applying popular combinatorial testing techniques to generate the real-world earth-observation satellite’s FDIR functionality test suites and comparing them with conventional human-derived test suite. The purpose of this comparison is to check the capability of the existing combinatorial testing methods toward FDIR functionality testing. Here, the FDIR functionality testing were treated as combinatorial configuration testing. As a result, we found that the 2-way coverage rate by the human, PICT, ACTS and the HAYST method were 72.7%, 66.3%, 68.8% and 72.2% with 16, 10, 10 and 14 test cases, respectively.","PeriodicalId":147133,"journal":{"name":"2022 IEEE International Conference on Software Testing, Verification and Validation Workshops (ICSTW)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126481602","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cited by: 0
Combination Frequency Differencing for Identifying Design Weaknesses in Physical Unclonable Functions
D. Kuhn, M. Raunak, Charles B. Prado, Vinay C. Patil, R. Kacker
Combinatorial coverage measures have been defined and applied to a wide range of problems. These methods have been developed using measures that depend on the inclusion or absence of t-tuples of values in inputs and test cases. We extend these coverage measures to include the frequency of occurrence of combinations, in an approach that we refer to as combination frequency differencing (CFD). This method is particularly suited to artificial intelligence and machine learning (AI/ML) applications, where training data sets used in learning systems are dependent on the prevalence of various attributes of elements of class and non-class sets. We illustrate the use of this method by applying it to analyzing the susceptibility of physical unclonable functions (PUFs) to machine learning attacks. Preliminary results suggest that the method may be useful for identifying bit combinations that have a disproportionately strong influence on PUF response bit values.
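One simple reading of combination frequency differencing is to count how often each t-way value combination occurs in the class set versus the non-class set and to rank combinations by the difference in relative frequency. The toy data layout and the choice of t = 2 below are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations

def combo_frequencies(rows, t=2):
    """Count t-way (position, value) combinations across a list of rows."""
    counts = Counter()
    for row in rows:
        for combo in combinations(list(enumerate(row)), t):
            counts[combo] += 1
    return counts

def frequency_differences(class_rows, nonclass_rows, t=2):
    """Difference in relative frequency of each combination between the sets."""
    fc = combo_frequencies(class_rows, t)
    fn = combo_frequencies(nonclass_rows, t)
    keys = set(fc) | set(fn)
    return sorted(
        ((k, fc[k] / len(class_rows) - fn[k] / len(nonclass_rows)) for k in keys),
        key=lambda kv: abs(kv[1]), reverse=True)

# Toy example: PUF challenge bits labelled by predicted response (1 vs 0).
ones  = [(1, 0, 1), (1, 1, 1), (1, 0, 0)]
zeros = [(0, 0, 1), (0, 1, 0), (1, 1, 0)]
for combo, diff in frequency_differences(ones, zeros)[:3]:
    print(combo, round(diff, 2))
```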
{"title":"Combination Frequency Differencing for Identifying Design Weaknesses in Physical Unclonable Functions","authors":"D. Kuhn, M. Raunak, Charles B. Prado, Vinay C. Patil, R. Kacker","doi":"10.1109/ICSTW55395.2022.00032","DOIUrl":"https://doi.org/10.1109/ICSTW55395.2022.00032","url":null,"abstract":"Combinatorial coverage measures have been defined and applied to a wide range of problems. These methods have been developed using measures that depend on the inclusion or absence of t-tuples of values in inputs and test cases. We extend these coverage measures to include the frequency of occurrence of combinations, in an approach that we refer to as combination frequency differencing (CFD). This method is particularly suited to artificial intelligence and machine learning (AI/ML) applications, where training data sets used in learning systems are dependent on the prevalence of various attributes of elements of class and non-class sets. We illustrate the use of this method by applying it to analyzing the susceptibility of physical unclonable functions (PUFs) to machine learning attacks. Preliminary results suggest that the method may be useful for identifying bit combinations that have a disproportionately strong influence on PUF response bit values.","PeriodicalId":147133,"journal":{"name":"2022 IEEE International Conference on Software Testing, Verification and Validation Workshops (ICSTW)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124453473","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cited by: 2
RQCODE – Towards Object-Oriented Requirements in the Software Security Domain
Ildar Nigmatullin, A. Sadovykh, Nan Messe, S. Ebersold, J. Bruel
According to NIST statistics, the number of vulnerabilities has increased nearly 20-fold over the last 20 years. Vulnerabilities expose companies to risks that may seriously threaten their operations. Therefore, it has long been suggested to apply security engineering – the process of accumulating multiple techniques and practices to ensure a sufficient level of security and to prevent vulnerabilities in the early stages of software development, including establishing security requirements and proper security testing. The informal nature of security requirements makes it difficult to maintain system security, eliminate redundancy, and trace requirements down to verification artifacts such as test cases. To deal with this problem, Seamless Object-Oriented Requirements (SOORs) promote incorporating formal requirements representations and verification means together into requirements classes. This article is a position paper that discusses opportunities to implement the Requirements as Code (RQCODE) concept, SOORs in Java, applied to the software security domain. We argue that this concept is elegant and has the potential to attract the attention of developers, since it combines a lightweight formalization of requirements through security tests with seamless integration into off-the-shelf development environments, including modern Continuous Integration/Delivery platforms. The benefits of this approach are yet to be demonstrated in further studies in the VeriDevOps project.
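RQCODE realises Seamless Object-Oriented Requirements in Java. As a language-neutral illustration of the underlying idea, a requirement class that carries both its statement and its verification, here is a hypothetical Python analogue; the class and attribute names are invented and are not the RQCODE API.

```python
class SecurityRequirement:
    """Illustrative analogue of a requirement-as-code class: the requirement
    text and its verification live together. Not the actual RQCODE API."""
    statement = "abstract requirement"

    def verify(self) -> bool:
        raise NotImplementedError

class PasswordMinLength(SecurityRequirement):
    statement = "Configured minimum password length must be at least 12."

    def __init__(self, config):
        self.config = config          # hypothetical configuration dict

    def verify(self) -> bool:
        return self.config.get("password_min_length", 0) >= 12

req = PasswordMinLength({"password_min_length": 14})
print(req.statement, "->", "PASS" if req.verify() else "FAIL")
```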
{"title":"RQCODE – Towards Object-Oriented Requirements in the Software Security Domain","authors":"Ildar Nigmatullin, A. Sadovykh, Nan Messe, S. Ebersold, J. Bruel","doi":"10.1109/ICSTW55395.2022.00015","DOIUrl":"https://doi.org/10.1109/ICSTW55395.2022.00015","url":null,"abstract":"For the last 20 years, the number of vulnerabilities has increased near 20 times, according to NIST statistics. Vulnerabilities expose companies to risks that may seriously threaten their operations. Therefore, for a long time, it has been suggested to apply security engineering – the process of accumulating multiple techniques and practices to ensure a sufficient level of security and to prevent vulnerabilities in the early stages of software development, including establishing security requirements and proper security testing. The informal nature of security requirements makes it uneasy to maintain system security, eliminate redundancy and trace requirements down to verification artifacts such as test cases. To deal with this problem, Seamless Object-Oriented Requirements (SOORs) promote incorporating formal requirements representations and verification means together into requirements classes.This article is a position paper that discusses opportunities to implement the Requirements as Code (RQCODE) concepts, SOORs in Java, applied to the Software Security domain. We argue that this concept has an elegance and the potential to raise the attention of developers since it combines a lightweight formalization of requirements through security tests with seamless integration with off-the-shelf development environments, including modern Continuous Integration/Delivery platforms. The benefits of this approach are yet to be demonstrated in further studies in the VeriDevOps project.","PeriodicalId":147133,"journal":{"name":"2022 IEEE International Conference on Software Testing, Verification and Validation Workshops (ICSTW)","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126520747","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cited by: 1
Identifying Randomness related Flaky Tests through Divergence and Execution Tracing
Azeem Ahmad, Erik Norrestam Held, O. Leifler, K. Sandahl
Developers often spend time determining whether test case failures are real failures or flaky. Flaky tests, also known as non-deterministic tests, change their outcomes without any changes in the codebase, thus reducing developers' trust during a software release as well as in the quality of the product. While rerunning test cases is a common approach, it is resource intensive, unreliable, and does not uncover the actual cause of test flakiness. Our paper evaluates an approach to identify randomness-related flaky tests. We used a divergence algorithm and execution tracing techniques to identify flaky tests, resulting in the FlakyPy prototype. In addition, this paper discusses the cases where FlakyPy successfully identified the flaky test as well as the cases where FlakyPy failed. The paper discusses how the reporting mechanism of FlakyPy can help developers identify the root cause of randomness-related test flakiness. Thirty-two open-source projects were used in this study. We conclude that FlakyPy can detect most randomness-related test flakiness. In addition, the reporting mechanism of FlakyPy reveals sufficient information about possible root causes of test flakiness.
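Execution tracing and divergence checking can be illustrated with the standard sys.settrace hook: run the same test several times, record the sequence of executed lines, and flag tests whose traces diverge across runs. This is a generic sketch, not FlakyPy's implementation.

```python
import random
import sys

def trace_lines(func):
    """Run `func` once and return the sequence of (filename, lineno) executed."""
    trace = []
    def tracer(frame, event, arg):
        if event == "line":
            trace.append((frame.f_code.co_filename, frame.f_lineno))
        return tracer
    sys.settrace(tracer)
    try:
        func()
    finally:
        sys.settrace(None)
    return trace

def diverges(func, runs=5):
    """True if repeated executions of `func` follow different line sequences,
    a hint that its outcome may depend on randomness or other nondeterminism."""
    traces = [trace_lines(func) for _ in range(runs)]
    return any(t != traces[0] for t in traces[1:])

def test_example():
    if random.random() < 0.5:   # randomness-dependent branch
        x = 1
    else:
        x = 2

print("divergent:", diverges(test_example))
```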
{"title":"Identifying Randomness related Flaky Tests through Divergence and Execution Tracing","authors":"Azeem Ahmad, Erik Norrestam Held, O. Leifler, K. Sandahl","doi":"10.1109/ICSTW55395.2022.00057","DOIUrl":"https://doi.org/10.1109/ICSTW55395.2022.00057","url":null,"abstract":"Developers often spend time to determine whether test case failures are real failures or flaky. The flaky tests, known as non-deterministic tests, change their outcomes without any changes in the codebase, thus reducing the trust of developers during a software release as well as in the quality of a product. While rerunning test cases is a common approach, it is resource intensive, unreliable, and does not uncover the actual cause of test flakiness. Our paper evaluates an approach to identify randomness-related flaky. This paper used a divergence algorithm and execution tracing techniques to identify flaky tests, which resulted in the FlakyPy prototype. In addition, this paper discusses the cases where FlakyPy successfully identified the flaky test as well as those cases where FlakyPy failed. The papers discuss how the reporting mechanism of FlakyPy can help developers in identifying the root cause of randomness-related test flakiness. Thirty-two open-source projects were used in this. We concluded that FlakyPy can detect most of the randomness-related test flakiness. In addition, the reporting mechanism of FlakyPy reveals sufficient information about possible root causes of test flakiness.","PeriodicalId":147133,"journal":{"name":"2022 IEEE International Conference on Software Testing, Verification and Validation Workshops (ICSTW)","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131890704","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cited by: 4
Re-visiting the coupling between mutants and real faults with Defects4J 2.0
Thomas Laurent, Stephen Gaffney, Anthony Ventresque
Mutation analysis is a well-known testing criterion that involves seeding changes into the system under test, i.e. creating mutants, to simulate faults, and measuring the capacity of the test suite to detect these changes. The question of whether real faults are coupled with the mutants is central, as it determines whether tests that detect the mutants will also detect faults that actually occur in code, making the mutants reasonable test requirements. Prior work has explored this question, notably using the Defects4J dataset in Java. As the dataset and the mutation tools used in these prior works have evolved, this work re-visits the question using the newest available versions in order to strengthen and extend prior results. We use 337 real faults from 15 different projects in the Defects4J 2.0.0 dataset, 2,828 test suites, and two well-known Java mutation testing tools (Major and Pitest) to explore (i) to what extent real faults are coupled with mutants, (ii) how the two tools compare in terms of producing mutants coupled with faults, (iii) the characteristics of the mutants that are coupled with real faults, and (iv) the characteristics of faults not coupled with the mutants. Most (80.7%) of the faults used were coupled with at least one mutant created by Pitest or Major, most often with mutants created by both tools. All operators used produced a low (<4%) proportion of coupled mutants, although some operators are exclusively coupled to more faults, i.e. coupled to faults for which no other operator produces coupled mutants. Finally, faults not coupled with any mutants usually had small fix patches, and although the code related to these faults was mostly affected by the mutation operators used, the mutants produced were still not coupled. The results confirm previous findings showing that the coupling effect mostly holds, but that additional mutation operators are needed to capture all faults.
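One simplified way to operationalise coupling, assuming mutant-kill and fault-detection results are already available: treat a fault as coupled with a mutant when at least one test kills the mutant and every killing test also detects the fault. The data layout below is illustrative, and this criterion is a simplification of the ones used in the literature.

```python
def coupled_faults(kills, detects):
    """kills:   {mutant_id: set of tests that kill the mutant}
    detects: {fault_id: set of tests that detect the fault}
    Returns {fault_id: [coupled mutant ids]} under the simplified criterion
    that the mutant is killed and every killing test also detects the fault."""
    result = {}
    for fault, detecting in detects.items():
        result[fault] = [m for m, killing in kills.items()
                         if killing and killing <= detecting]
    return result

# Invented kill/detection matrices for illustration.
kills = {"m1": {"t1", "t2"}, "m2": {"t3"}, "m3": set()}
detects = {"f1": {"t1", "t2", "t4"}, "f2": {"t3"}}
print(coupled_faults(kills, detects))
# -> {'f1': ['m1'], 'f2': ['m2']}
```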
{"title":"Re-visiting the coupling between mutants and real faults with Defects4J 2.0","authors":"Thomas Laurent, Stephen Gaffney, Anthony Ventresque","doi":"10.1109/ICSTW55395.2022.00042","DOIUrl":"https://doi.org/10.1109/ICSTW55395.2022.00042","url":null,"abstract":"Mutation analysis is a well-known testing criterion that involves seeding changes in the system under test, i.e. creating mutants, to simulate faults, and measuring the capacity of the test suite to detect these changes. The question of whether real faults are coupled with the mutants is central, as it determines whether tests that detect the mutants will also detect faults that actually occur in code, making the mutants reasonable test requirements. Prior work has explored this question, notably using the Defects4J dataset in Java. As the dataset and the mutation tools used in these prior works have evolved, this work re-visits this question using the newest available versions in order to strengthen and extend prior results. In this work we use 337 real faults from 15 different projects in the Defects4J 2.0.0 dataset, 2,828 test suites, and two well-known Java mutation testing tools (Major and Pitest) to explore (i) to what extent real faults are coupled with mutants, (ii) how both tools compare in terms of producing mutants coupled with faults, (iii) the characteristics of the mutants that are coupled with real faults, and (iv) the characteristics of faults not coupled with the mutants. Most (80.7%) of the faults used were coupled with at least one mutant created by Pitest or Major, most often with mutants created by both tools. All operators used produced a low (<4%) proportion of coupled mutants, although some operators are exclusively coupled to more faults, i.e. coupled to faults where no other operator produces coupled mutants. Finally, faults not coupled with any mutants usually had small fix patches, and although the code related to these faults was mostly affected by the mutation operators used the mutants produces were still not coupled. Results confirm previous findings showing that the coupling effect mostly holds but that additional mutation operators are needed to capture all faults.","PeriodicalId":147133,"journal":{"name":"2022 IEEE International Conference on Software Testing, Verification and Validation Workshops (ICSTW)","volume":"71 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133305643","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cited by: 1
Security Testing as part of Software Quality Assurance: Principles and Challenges
Wissam Mallouli
Software quality assurance (SQA) is a means and practice of monitoring the software engineering processes and methods used in a project to ensure proper quality of the software. It encompasses the entire software development life-cycle, including requirements engineering, software design, coding, source code reviews, software configuration management, testing, release management, software deployment and software integration. It is organized into goals, commitments, abilities, activities, measurements, verification and validation. In this talk, we will mainly focus on the testing activity of the software development life-cycle. Its main objective is checking that software satisfies a set of quality properties identified by the "ISO/IEC 25010:2011 System and Software Quality Model" standard [1].
{"title":"Security Testing as part of Software Quality Assurance: Principles and Challenges","authors":"Wissam Mallouli","doi":"10.1109/ICSTW55395.2022.00019","DOIUrl":"https://doi.org/10.1109/ICSTW55395.2022.00019","url":null,"abstract":"Software quality assurance (SQA) is a means and practice of monitoring the software engineering processes and methods used in a project to ensure proper quality of the software. It encompasses the entire software development life-cycle, including requirements engineering, software design, coding, source code reviews, software configuration management, testing , release management, software deployment and software integration. It is organized into goals, commitments, abilities, activities, measurements, verification and validation. In this talk, we will mainly focus on the testing activity part of the software development life-cycle. Its main objective is checking that software is satisfying a set of quality properties that are identified by the \"ISO/IEC 25010:2011 System and Software Quality Model\" standard [1] .","PeriodicalId":147133,"journal":{"name":"2022 IEEE International Conference on Software Testing, Verification and Validation Workshops (ICSTW)","volume":"71 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114357653","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cited by: 0