
Proceedings of the 28th ACM SIGSOFT International Symposium on Software Testing and Analysis: Latest Publications

Search-based test and improvement of machine-learning-based anomaly detection systems
Maxime Cordy, S. Muller, Mike Papadakis, Yves Le Traon
Machine-learning-based anomaly detection systems can be vulnerable to new kinds of deceptions, known as training attacks, which exploit the live learning mechanism of these systems by progressively injecting small portions of abnormal data. The injected data seamlessly shift the learned states to a point where harmful data can pass unnoticed. We focus on the systematic testing of these attacks in the context of intrusion detection systems (IDS). We propose a search-based approach to test IDS by generating training attacks. Going a step further, we also propose searching for countermeasures, learning from the successful attacks and thereby increasing the resilience of the tested IDS. We evaluate our approach on a denial-of-service attack detection scenario and a dataset recording the network traffic of a real-world system. Our experiments show that our search-based attack scheme generates successful attacks bypassing the current state-of-the-art defences. We also show that our approach is capable of generating attack patterns for all configuration states of the studied IDS and that it is capable of providing appropriate countermeasures. By co-evolving our attack and defence mechanisms, we succeeded in improving the defence of the IDS under test, making it resilient to 49 out of 50 independently generated attacks.
{"title":"Search-based test and improvement of machine-learning-based anomaly detection systems","authors":"Maxime Cordy, S. Muller, Mike Papadakis, Yves Le Traon","doi":"10.1145/3293882.3330580","DOIUrl":"https://doi.org/10.1145/3293882.3330580","url":null,"abstract":"Machine-learning-based anomaly detection systems can be vulnerable to new kinds of deceptions, known as training attacks, which exploit the live learning mechanism of these systems by progressively injecting small portions of abnormal data. The injected data seamlessly swift the learned states to a point where harmful data can pass unnoticed. We focus on the systematic testing of these attacks in the context of intrusion detection systems (IDS). We propose a search-based approach to test IDS by making training attacks. Going a step further, we also propose searching for countermeasures, learning from the successful attacks and thereby increasing the resilience of the tested IDS. We evaluate our approach on a denial-of-service attack detection scenario and a dataset recording the network traffic of a real-world system. Our experiments show that our search-based attack scheme generates successful attacks bypassing the current state-of-the-art defences. We also show that our approach is capable of generating attack patterns for all configuration states of the studied IDS and that it is capable of providing appropriate countermeasures. By co-evolving our attack and defence mechanisms we succeeded at improving the defence of the IDS under test by making it resilient to 49 out of 50 independently generated attacks.","PeriodicalId":20624,"journal":{"name":"Proceedings of the 28th ACM SIGSOFT International Symposium on Software Testing and Analysis","volume":"38 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2019-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89792955","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7
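The co-evolution described above is, at its core, a mutate-and-select loop over injection schedules. Below is a minimal (1+1) evolutionary sketch of the attack side only; the toy `ids_accepts` model, the drift factor, and the schedule encoding are all hypothetical stand-ins, not the authors' actual fitness function:

```python
import random

# Toy stand-in for the IDS under test: the real fitness would come from
# replaying injected traffic against the live detector.
def ids_accepts(batch_size, drift):
    return batch_size <= 5 + drift  # small batches slip through

def fitness(schedule):
    """Fraction of injection steps the IDS accepts; higher = better attack."""
    drift, accepted = 0.0, 0
    for batch in schedule:
        if ids_accepts(batch, drift):
            accepted += 1
            drift += 0.1 * batch  # each accepted batch shifts the learned state
    return accepted / len(schedule)

def mutate(schedule):
    s = schedule[:]
    i = random.randrange(len(s))
    s[i] = max(1, s[i] + random.choice([-2, -1, 1, 2]))
    return s

random.seed(0)
best = [random.randint(1, 20) for _ in range(10)]  # injection sizes per step
for _ in range(500):
    cand = mutate(best)
    if fitness(cand) >= fitness(best):
        best = cand
print("evolved schedule:", best, "acceptance rate:", fitness(best))
```

The defence side would run a symmetric search over detector parameters, scoring candidates by how many evolved attacks they reject.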
On the correctness of GPU programs
Chao Peng
Testing is an important and challenging part of software development, and its effectiveness depends on the quality of test cases. However, there exists no means of measuring the quality of tests developed for GPU programs and, as a result, no test case generation techniques for GPU programs that aim at high test effectiveness. Existing criteria for sequential and multithreaded CPU programs cannot be directly applied to GPU programs, as GPUs follow a completely different memory and execution model. We surveyed existing work on GPU program verification and bug fixes of open-source GPU programs. Based on our findings, we define barrier, branch and loop coverage criteria and propose a set of mutation operators to measure the fault-finding capabilities of test cases. CLTestCheck, a framework for measuring the quality of tests developed for GPU programs by code coverage analysis, fault seeding and work-group schedule amplification, has been developed and evaluated using industry-standard benchmarks. Experiments show that the framework is able to automatically measure test effectiveness and reveal unusual behaviours. Our planned work includes data-flow coverage adapted for GPU programs to probe the underlying cause of unusual kernel behaviours, as well as a more comprehensive work-group scheduler. We also plan to design and develop an automatic test case generator aimed at generating high-quality test suites for GPU programs.
{"title":"On the correctness of GPU programs","authors":"Chao Peng","doi":"10.1145/3293882.3338989","DOIUrl":"https://doi.org/10.1145/3293882.3338989","url":null,"abstract":"Testing is an important and challenging part of software development and its effectiveness depends on the quality of test cases. However, there exists no means of measuring quality of tests developed for GPU programs and as a result, no test case generation techniques for GPU programs aiming at high test effectiveness. Existing criteria for sequential and multithreaded CPU programs cannot be directly applied to GPU programs as GPU follows a completely different memory and execution model. We surveyed existing work on GPU program verification and bug fixes of open source GPU programs. Based on our findings, we define barrier, branch and loop coverage criteria and propose a set of mutation operators to measure fault finding capabilities of test cases. CLTestCheck, a framework for measuring quality of tests developed for GPU programs by code coverage analysis, fault seeding and work-group schedule amplification has been developed and evaluated using industry standard benchmarks. Experiments show that the framework is able to automatically measure test effectiveness and reveal unusual behaviours. Our planned work includes data flow coverage adopted for GPU programs to probe the underlying cause of unusual kernel behaviours and a more comprehensive work-group scheduler. We also plan to design and develop an automatic test case generator aiming at generating high quality test suites for GPU programs.","PeriodicalId":20624,"journal":{"name":"Proceedings of the 28th ACM SIGSOFT International Symposium on Software Testing and Analysis","volume":"71 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2019-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76776811","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
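As a concrete illustration of the fault-seeding idea, one mutation operator suited to GPU kernels is barrier removal. The sketch below generates such mutants by plain text manipulation over a made-up OpenCL kernel; it is not CLTestCheck's implementation:

```python
import re

# Hypothetical OpenCL kernel; the operator below is a text-level toy.
KERNEL = """
__kernel void scan(__global int *data) {
    int i = get_global_id(0);
    data[i] += 1;
    barrier(CLK_GLOBAL_MEM_FENCE);  /* synchronisation point */
    data[i] += data[(i + 1) % get_global_size(0)];
}
"""

def barrier_removal_mutants(src):
    """Yield one mutant per barrier() call, with that call deleted.
    A strong test suite should kill such mutants, since removing a
    barrier can expose data races or barrier divergence."""
    for m in re.finditer(r'^\s*barrier\([^)]*\);.*$', src, re.M):
        yield src[:m.start()] + src[m.end():]

for i, mutant in enumerate(barrier_removal_mutants(KERNEL)):
    print(f"--- mutant {i} ---{mutant}")
```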
CTRAS: a tool for aggregating and summarizing crowdsourced test reports
Yuying Li, Rui Hao, Yang Feng, James A. Jones, Xiaofang Zhang, Zhenyu Chen
In this paper, we present CTRAS, a tool for automatically aggregating and summarizing duplicate crowdsourced test reports on the fly. CTRAS can automatically detect duplicates based on both textual information and screenshots, and it further aggregates and summarizes the duplicate test reports. CTRAS provides end users with a comprehensive and comprehensible understanding of all duplicates by identifying the main topics across the group of aggregated test reports and highlighting supplementary topics that are mentioned in subgroups of test reports. It also provides the classic tools of issue tracking systems, such as the project-report dashboard and keyword search, and automates their classic functionalities, such as bug triaging and best-fixer recommendation, to assist end users in managing and diagnosing test reports. Video: https://youtu.be/PNP10gKIPFs
{"title":"CTRAS: a tool for aggregating and summarizing crowdsourced test reports","authors":"Yuying Li, Rui Hao, Yang Feng, James A. Jones, Xiaofang Zhang, Zhenyu Chen","doi":"10.1145/3293882.3339004","DOIUrl":"https://doi.org/10.1145/3293882.3339004","url":null,"abstract":"In this paper, we present CTRAS, a tool for automatically aggregating and summarizing duplicate crowdsourced test reports on the fly. CTRAS can automatically detect duplicates based on both textual information and the screenshots, and further aggregates and summarizes the duplicate test reports. CTRAS provides end users with a comprehensive and comprehensible understanding of all duplicates by identifying the main topics across the group of aggregated test reports and highlighting supplementary topics that are mentioned in subgroups of test reports. Also, it provides the classic tool of issue tracking systems, such as the project-report dashboard and keyword searching, and automates their classic functionalities, such as bug triaging and best fixer recommendation, to assist end users in managing and diagnosing test reports. Video: https://youtu.be/PNP10gKIPFs","PeriodicalId":20624,"journal":{"name":"Proceedings of the 28th ACM SIGSOFT International Symposium on Software Testing and Analysis","volume":"43 8 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2019-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77721027","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
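The duplicate-detection step can be pictured as a weighted combination of textual and screenshot similarity. The sketch below uses Jaccard word overlap plus a toy histogram intersection; the field names, weights, and threshold are assumptions for illustration, not CTRAS's actual model:

```python
from collections import Counter

def text_sim(a, b):
    """Jaccard similarity over the word sets of two report descriptions."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

def image_sim(px_a, px_b, bins=8):
    """Histogram intersection over (toy) grayscale pixel lists, 0..255."""
    ha = Counter(p * bins // 256 for p in px_a)
    hb = Counter(p * bins // 256 for p in px_b)
    inter = sum(min(ha[k], hb[k]) for k in range(bins))
    return inter / max(len(px_a), len(px_b))

def is_duplicate(r1, r2, w_text=0.6, w_img=0.4, threshold=0.7):
    score = w_text * text_sim(r1["desc"], r2["desc"]) + \
            w_img * image_sim(r1["shot"], r2["shot"])
    return score >= threshold, score

a = {"desc": "app crashes when tapping login button", "shot": [10, 10, 200, 200]}
b = {"desc": "crashes after tapping the login button", "shot": [10, 12, 198, 201]}
print(is_duplicate(a, b))
```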
Mining Android crash fixes in the absence of issue- and change-tracking systems
Pingfan Kong, Li Li, Jun Gao, Tegawendé F. Bissyandé, Jacques Klein
Android apps are prone to crashes. These often arise from the misuse of Android framework APIs, and they are hard to debug since the official Android documentation does not thoroughly discuss potential exceptions. Recently, the program repair community has also started to investigate the possibility of fixing crashes automatically. Current results, however, apply to limited example cases. In both scenarios of repair, the main issue is the need for more example data to drive the fix process, due to the high cost in time and effort needed to collect and identify fix examples. We propose in this work a scalable approach, CraftDroid, to mine crash fixes by leveraging a set of 28 thousand carefully reconstructed app lineages from app markets, without the need for the app source code or issue reports. We developed a replicative testing approach that locates fixes among app versions that output different runtime logs with the exact same test inputs. Overall, we have mined 104 relevant crash fixes and further abstracted 17 fine-grained fix templates that are demonstrated to be effective for patching crashed APKs. Finally, we release ReCBench, a benchmark consisting of 200 crashed APKs and the crash replication scripts, which the community can explore for evaluating generated crash-inducing bug patches.
{"title":"Mining Android crash fixes in the absence of issue- and change-tracking systems","authors":"Pingfan Kong, Li Li, Jun Gao, Tegawendé F. Bissyandé, Jacques Klein","doi":"10.1145/3293882.3330572","DOIUrl":"https://doi.org/10.1145/3293882.3330572","url":null,"abstract":"Android apps are prone to crash. This often arises from the misuse of Android framework APIs, making it harder to debug since official Android documentation does not discuss thoroughly potential exceptions.Recently, the program repair community has also started to investigate the possibility to fix crashes automatically. Current results, however, apply to limited example cases. In both scenarios of repair, the main issue is the need for more example data to drive the fix processes due to the high cost in time and effort needed to collect and identify fix examples. We propose in this work a scalable approach, CraftDroid, to mine crash fixes by leveraging a set of 28 thousand carefully reconstructed app lineages from app markets, without the need for the app source code or issue reports. We developed a replicative testing approach that locates fixes among app versions which output different runtime logs with the exact same test inputs. Overall, we have mined 104 relevant crash fixes, further abstracted 17 fine-grained fix templates that are demonstrated to be effective for patching crashed apks. Finally, we release ReCBench, a benchmark consisting of 200 crashed apks and the crash replication scripts, which the community can explore for evaluating generated crash-inducing bug patches.","PeriodicalId":20624,"journal":{"name":"Proceedings of the 28th ACM SIGSOFT International Symposium on Software Testing and Analysis","volume":"7 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2019-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82323767","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 14
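The replicative-testing idea, stripped to its essence: replay identical inputs on consecutive versions of an app and flag the version pair where a crash signature disappears. In this sketch the runtime logs are hard-coded toy data standing in for real instrumented replays:

```python
# Toy stand-ins: in the real setting these logs come from replaying the
# exact same test inputs on consecutive versions of one app lineage.
LOGS = {
    "app-v1.apk": ["start", "java.lang.NullPointerException at X.onCreate"],
    "app-v2.apk": ["start", "java.lang.NullPointerException at X.onCreate"],
    "app-v3.apk": ["start", "done"],
}

def crash_signature(log):
    """First exception line in the runtime log, or None for a clean run."""
    return next((line for line in log if "Exception" in line), None)

def locate_fixes(lineage):
    """Report (crashing_version, fixed_version, signature) candidates."""
    fixes = []
    for prev, curr in zip(lineage, lineage[1:]):
        sig = crash_signature(LOGS[prev])
        if sig and crash_signature(LOGS[curr]) is None:
            fixes.append((prev, curr, sig))
    return fixes

print(locate_fixes(["app-v1.apk", "app-v2.apk", "app-v3.apk"]))
```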
Improving random GUI testing with image-based widget detection
Thomas D. White, G. Fraser, Guy J. Brown
Graphical User Interfaces (GUIs) are amongst the most common user interfaces, enabling interactions with applications through mouse movements and key presses. Tools for automated testing of programs through their GUI exist; however, they usually rely on operating-system- or framework-specific knowledge to interact with an application. Due to frequent operating system updates, which can remove required information, and the large variety of GUI frameworks using unique underlying data structures, such tools rapidly become obsolete. Consequently, for an automated GUI test generation tool, supporting many frameworks and operating systems is impractical. We propose a technique for improving GUI testing by automatically identifying GUI widgets in screenshots using machine learning techniques. As training data, we generate randomized GUIs from which we automatically extract widget information. The resulting model provides guidance to GUI testing tools in environments that are not currently supported, by deriving GUI widget information from screenshots only. In our experiments, we found that identifying GUI widgets in screenshots and using this information to guide random testing achieved significantly higher branch coverage in 18 of 20 applications, with an average increase of 42.5% compared to conventional random testing.
{"title":"Improving random GUI testing with image-based widget detection","authors":"Thomas D. White, G. Fraser, Guy J. Brown","doi":"10.1145/3293882.3330551","DOIUrl":"https://doi.org/10.1145/3293882.3330551","url":null,"abstract":"Graphical User Interfaces (GUIs) are amongst the most common user interfaces, enabling interactions with applications through mouse movements and key presses. Tools for automated testing of programs through their GUI exist, however they usually rely on operating system or framework specific knowledge to interact with an application. Due to frequent operating system updates, which can remove required information, and a large variety of different GUI frameworks using unique underlying data structures, such tools rapidly become obsolete, Consequently, for an automated GUI test generation tool, supporting many frameworks and operating systems is impractical. We propose a technique for improving GUI testing by automatically identifying GUI widgets in screen shots using machine learning techniques. As training data, we generate randomized GUIs to automatically extract widget information. The resulting model provides guidance to GUI testing tools in environments not currently supported by deriving GUI widget information from screen shots only. In our experiments, we found that identifying GUI widgets in screen shots and using this information to guide random testing achieved a significantly higher branch coverage in 18 of 20 applications, with an average increase of 42.5% when compared to conventional random testing.","PeriodicalId":20624,"journal":{"name":"Proceedings of the 28th ACM SIGSOFT International Symposium on Software Testing and Analysis","volume":"66 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2019-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85862991","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 62
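A sketch of how a widget detector could guide random testing: bias clicks toward predicted bounding boxes instead of sampling the whole screen uniformly. The detector here is a stub returning fixed boxes, and the bias probability is an assumed parameter, not a value from the paper:

```python
import random

def predicted_widgets(screenshot):
    """Stand-in for the trained detector: returns bounding boxes
    (x, y, w, h) of likely widgets found in the screenshot."""
    return [(40, 100, 120, 30), (40, 160, 120, 30), (500, 20, 60, 60)]

def next_click(screenshot, width, height, p_widget=0.9):
    """Click inside a detected widget with probability p_widget,
    otherwise fall back to a uniformly random screen position."""
    boxes = predicted_widgets(screenshot)
    if boxes and random.random() < p_widget:
        x, y, w, h = random.choice(boxes)
        return random.randint(x, x + w), random.randint(y, y + h)
    return random.randint(0, width), random.randint(0, height)

random.seed(1)
print([next_click(None, 1024, 768) for _ in range(3)])
```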
Go-clone: graph-embedding based clone detector for Golang
Cong Wang, Jian Gao, Yu Jiang, Zhenchang Xing, Huafeng Zhang, Weiliang Yin, M. Gu, Jiaguang Sun
Golang (short for the Go programming language) is a fast, compiled language that has been increasingly used in industry due to its excellent performance on concurrent programming. Golang redefines concurrent programming grammar, making it a challenge for traditional clone detection tools and techniques. However, few tools exist for detecting duplicates or copy-paste-related bugs in Golang. Therefore, an effective and efficient code clone detector for Golang is especially needed. In this paper, we present Go-Clone, a learning-based clone detector for Golang. Go-Clone contains two modules: the training module and the user interaction module. In the training module, we first parse Golang source code into LLVM IR (intermediate representation). Second, we automatically calculate the LSFG (labeled semantic flow graph) for each program function. Go-Clone trains a deep neural network model to encode LSFGs for similarity classification. In the user interaction module, users can choose one or more Golang projects. Go-Clone identifies and presents a list of function pairs that are most likely clone code for user inspection. To evaluate Go-Clone's performance, we collect 6,110 commit versions from 48 GitHub projects to construct a Golang clone detection data set. Go-Clone reaches an AUC (area under curve) of 89.61% and an accuracy of 83.80% in clone detection. By testing several groups of unfamiliar data, we also demonstrate the generality of Go-Clone. Demo video: https://youtu.be/o5DogtYGbeo
{"title":"Go-clone: graph-embedding based clone detector for Golang","authors":"Cong Wang, Jian Gao, Yu Jiang, Zhenchang Xing, Huafeng Zhang, Weiliang Yin, M. Gu, Jiaguang Sun","doi":"10.1145/3293882.3338996","DOIUrl":"https://doi.org/10.1145/3293882.3338996","url":null,"abstract":"Golang (short for Go programming language) is a fast and compiled language, which has been increasingly used in industry due to its excellent performance on concurrent programming. Golang redefines concurrent programming grammar, making it a challenge for traditional clone detection tools and techniques. However, there exist few tools for detecting duplicates or copy-paste related bugs in Golang. Therefore, an effective and efficient code clone detector on Golang is especially needed. In this paper, we present Go-Clone, a learning-based clone detector for Golang. Go-Clone contains two modules -- the training module and the user interaction module. In the training module, firstly we parse Golang source code into llvm IR (Intermediate Representation). Secondly, we calculate LSFG (labeled semantic flow graph) for each program function automatically. Go-Clone trains a deep neural network model to encode LSFGs for similarity classification. In the user interaction module, users can choose one or more Golang projects. Go-Clone identifies and presents a list of function pairs, which are most likely clone code for user inspection. To evaluate Go-Clone's performance, we collect 6,110 commit versions from 48 Github projects to construct a Golang clone detection data set. Go-Clone can reach the value of AUC (Area Under Curve) and ACC (Accuracy) for 89.61% and 83.80% in clone detection. By testing several groups of unfamiliar data, we also demonstrates the generility of Go-Clone. The address of the abstract demo video: https://youtu.be/o5DogtYGbeo","PeriodicalId":20624,"journal":{"name":"Proceedings of the 28th ACM SIGSOFT International Symposium on Software Testing and Analysis","volume":"6 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2019-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88308318","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
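Once functions are embedded, clone classification can reduce to a vector-similarity test. The sketch below compares precomputed toy vectors with cosine similarity; in Go-Clone the vectors would come from the trained network over LSFGs, and the threshold here is invented:

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

# Hypothetical: in Go-Clone these vectors would be produced by the
# trained network from each function's labeled semantic flow graph.
EMBEDDINGS = {
    "pkg/a.Sum":  [0.91, 0.10, 0.40],
    "pkg/b.Sum2": [0.89, 0.12, 0.38],  # near-duplicate of pkg/a.Sum
    "pkg/c.Walk": [0.05, 0.95, 0.20],
}

THRESHOLD = 0.98  # assumed cut-off for reporting a pair
funcs = list(EMBEDDINGS)
for i, f in enumerate(funcs):
    for g in funcs[i + 1:]:
        s = cosine(EMBEDDINGS[f], EMBEDDINGS[g])
        if s >= THRESHOLD:
            print(f"likely clone pair: {f} ~ {g} (cos={s:.3f})")
```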
A new dimension of test quality: assessing and generating higher quality unit test cases
Giovanni Grano
Unit tests form the first defensive line against the introduction of bugs in software systems. Therefore, their quality is of paramount importance for producing robust and reliable software. To assess test quality, many organizations rely on metrics like code and mutation coverage. However, these metrics are not always optimal for such a purpose. In my research, I want to make mutation testing scalable by devising a lightweight approach to estimate test effectiveness. Moreover, I plan to introduce a new metric measuring test focus, a proxy for the effort needed by developers to understand and maintain a test, that both complements code coverage in assessing test quality and can be used to drive automated generation of higher-quality test cases.
{"title":"A new dimension of test quality: assessing and generating higher quality unit test cases","authors":"Giovanni Grano","doi":"10.1145/3293882.3338984","DOIUrl":"https://doi.org/10.1145/3293882.3338984","url":null,"abstract":"Unit tests form the first defensive line against the introduction of bugs in software systems. Therefore, their quality is of a paramount importance to produce robust and reliable software. To assess test quality, many organizations relies on metrics like code and mutation coverage. However, they are not always optimal to fulfill such a purpose. In my research, I want to make mutation testing scalable by devising a lightweight approach to estimate test effectiveness. Moreover, I plan to introduce a new metric measuring test focus—as a proxy for the effort needed by developers to understand and maintain a test— that both complements code coverage to assess test quality and can be used to drive automated test case generation of higher quality tests.","PeriodicalId":20624,"journal":{"name":"Proceedings of the 28th ACM SIGSOFT International Symposium on Software Testing and Analysis","volume":"26 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2019-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86913511","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8
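The abstract does not define the test-focus metric, so the following is purely one plausible shape for such a proxy: the inverse of the number of distinct production classes a test exercises, computed from a hypothetical coverage trace:

```python
# Hypothetical coverage trace: production methods each test executed,
# e.g. collected with a coverage tool. The metric below is invented for
# illustration only, not the metric proposed in the paper.
TRACE = {
    "testAddItem":      ["Cart.add", "Cart.size"],
    "testCheckoutFlow": ["Cart.add", "Order.create", "Payment.charge",
                         "Mailer.send", "Inventory.reserve"],
}

def focus(methods):
    """1 / number of distinct production classes exercised: a test that
    touches a single class scores 1.0; a sprawling scenario test scores low."""
    classes = {m.split(".")[0] for m in methods}
    return 1.0 / len(classes)

for test, methods in TRACE.items():
    print(f"{test}: focus = {focus(methods):.2f}")
```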
Root causing flaky tests in a large-scale industrial setting
Wing Lam, Patrice Godefroid, Suman Nath, Anirudh Santhiar, Suresh Thummalapenta
In today's agile world, developers often rely on continuous integration pipelines to help build and validate their changes by executing tests in an efficient manner. One of the significant factors that hinder developers' productivity is flaky tests: tests that may pass and fail with the same version of code. Since flaky test failures are not deterministically reproducible, developers often have to spend hours only to discover that the occasional failures have nothing to do with their changes. However, ignoring failures of flaky tests can be dangerous, since those failures may represent real faults in the production code. Furthermore, identifying the root cause of flakiness is tedious and cumbersome, since flaky failures are often a consequence of unexpected and non-deterministic behavior due to various factors, such as concurrency and external dependencies. As developers in a large-scale industrial setting, we first describe our experience with flaky tests by conducting a study on them. Our results show that although the number of distinct flaky tests may be low, the percentage of failing builds due to flaky tests can be substantial. To reduce the burden of flaky tests on developers, we describe our end-to-end framework that helps identify flaky tests and understand their root causes. Our framework instruments flaky tests and all relevant code to log various runtime properties, and then uses a preliminary tool, called RootFinder, to find differences in the logs of passing and failing runs. Using our framework, we collect and publicize a dataset of real-world, anonymized execution logs of flaky tests. By sharing the findings from our study, our framework and tool, and a dataset of logs, we hope to encourage more research on this important problem.
{"title":"Root causing flaky tests in a large-scale industrial setting","authors":"Wing Lam, Patrice Godefroid, Suman Nath, Anirudh Santhiar, Suresh Thummalapenta","doi":"10.1145/3293882.3330570","DOIUrl":"https://doi.org/10.1145/3293882.3330570","url":null,"abstract":"In today’s agile world, developers often rely on continuous integration pipelines to help build and validate their changes by executing tests in an efficient manner. One of the significant factors that hinder developers’ productivity is flaky tests—tests that may pass and fail with the same version of code. Since flaky test failures are not deterministically reproducible, developers often have to spend hours only to discover that the occasional failures have nothing to do with their changes. However, ignoring failures of flaky tests can be dangerous, since those failures may represent real faults in the production code. Furthermore, identifying the root cause of flakiness is tedious and cumbersome, since they are often a consequence of unexpected and non-deterministic behavior due to various factors, such as concurrency and external dependencies. As developers in a large-scale industrial setting, we first describe our experience with flaky tests by conducting a study on them. Our results show that although the number of distinct flaky tests may be low, the percentage of failing builds due to flaky tests can be substantial. To reduce the burden of flaky tests on developers, we describe our end-to-end framework that helps identify flaky tests and understand their root causes. Our framework instruments flaky tests and all relevant code to log various runtime properties, and then uses a preliminary tool, called RootFinder, to find differences in the logs of passing and failing runs. Using our framework, we collect and publicize a dataset of real-world, anonymized execution logs of flaky tests. By sharing the findings from our study, our framework and tool, and a dataset of logs, we hope to encourage more research on this important problem.","PeriodicalId":20624,"journal":{"name":"Proceedings of the 28th ACM SIGSOFT International Symposium on Software Testing and Analysis","volume":"80 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2019-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90366187","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 92
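RootFinder's core comparison can be pictured as a diff over logged runtime properties: values that are stable across passing runs but change in failing runs are flagged as suspects. The log shape below (one dict of property values per run) is an assumption for illustration, not the tool's actual format:

```python
def property_diff(passing_runs, failing_runs):
    """Report logged properties whose values are constant within passing
    runs but differ in failing runs; these are candidate root causes of
    flakiness (timing, ordering, environment)."""
    suspects = {}
    keys = set().union(*passing_runs, *failing_runs)
    for key in keys:
        pass_vals = {run.get(key) for run in passing_runs}
        fail_vals = {run.get(key) for run in failing_runs}
        if len(pass_vals) == 1 and fail_vals != pass_vals:
            suspects[key] = (pass_vals.pop(), fail_vals)
    return suspects

passing = [{"thread.order": "A,B", "net.timeout": False},
           {"thread.order": "A,B", "net.timeout": False}]
failing = [{"thread.order": "B,A", "net.timeout": False}]
print(property_diff(passing, failing))  # flags thread.order only
```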
LibID: reliable identification of obfuscated third-party Android libraries
Jiexin Zhang, A. Beresford, Stephan A. Kollmann
Third-party libraries are vital components of Android apps, yet they can also introduce serious security threats and impede the accuracy and reliability of app analysis tasks, such as app clone detection. Several library detection approaches have been proposed to address these problems. However, we show that these techniques are not robust against popular code obfuscators, such as ProGuard, which is now used in nearly half of all apps. We then present LibID, a library detection tool that is more resilient to code shrinking and package modification than state-of-the-art tools. We show that the library identification problem can be formulated using binary integer programming models. LibID is able to identify specific versions of third-party libraries in candidate apps through static analysis of app binaries coupled with a database of third-party libraries. We propose a novel approach to generate synthetic apps for tuning the detection thresholds. Then, we use F-Droid apps as the ground truth to evaluate LibID under different obfuscation settings, which shows that LibID is more robust against code obfuscators than state-of-the-art tools. Finally, we demonstrate the utility of LibID by detecting the use of a vulnerable version of the OkHttp library in nearly 10% of the 3,958 most popular apps on the Google Play Store.
{"title":"LibID: reliable identification of obfuscated third-party Android libraries","authors":"Jiexin Zhang, A. Beresford, Stephan A. Kollmann","doi":"10.1145/3293882.3330563","DOIUrl":"https://doi.org/10.1145/3293882.3330563","url":null,"abstract":"Third-party libraries are vital components of Android apps, yet they can also introduce serious security threats and impede the accuracy and reliability of app analysis tasks, such as app clone detection. Several library detection approaches have been proposed to address these problems. However, we show these techniques are not robust against popular code obfuscators, such as ProGuard, which is now used in nearly half of all apps. We then present LibID, a library detection tool that is more resilient to code shrinking and package modification than state-of-the-art tools. We show that the library identification problem can be formulated using binary integer programming models. LibID is able to identify specific versions of third-party libraries in candidate apps through static analysis of app binaries coupled with a database of third-party libraries. We propose a novel approach to generate synthetic apps to tune the detection thresholds. Then, we use F-Droid apps as the ground truth to evaluate LibID under different obfuscation settings, which shows that LibID is more robust to code obfuscators than state-of-the-art tools. Finally, we demonstrate the utility of LibID by detecting the use of a vulnerable version of the OkHttp library in nearly 10% of 3,958 most popular apps on the Google Play Store.","PeriodicalId":20624,"journal":{"name":"Proceedings of the 28th ACM SIGSOFT International Symposium on Software Testing and Analysis","volume":"6 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2019-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74906896","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 28
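To see why the matching becomes a binary integer program: each app class may be assigned to at most one library class, and the assignment should maximise total signature similarity. For a 3x3 toy we can enumerate permutations instead of calling an ILP solver; the similarity matrix and the acceptance threshold are invented for illustration:

```python
from itertools import permutations

# Toy similarity matrix between obfuscated app classes (rows) and the
# classes of one candidate library version (columns).
SIM = [
    [0.9, 0.1, 0.2],  # app class a vs lib classes L0, L1, L2
    [0.2, 0.8, 0.1],  # app class b
    [0.1, 0.3, 0.7],  # app class c
]

def best_assignment(sim):
    """One-to-one binary matching maximising total similarity. LibID
    states this as binary integer programming; brute force suffices here."""
    n = len(sim)
    best, best_score = None, -1.0
    for perm in permutations(range(n)):
        score = sum(sim[i][perm[i]] for i in range(n))
        if score > best_score:
            best, best_score = perm, score
    return best, best_score

match, score = best_assignment(SIM)
print("matching:", match, "score:", round(score, 2))
print("library version detected" if score / len(SIM) >= 0.6 else "no match")
```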
CoCoTest: collaborative crowdsourced testing for Android applications
Haoyu Li, Chunrong Fang, Zhibin Wei, Zhenyu Chen
Testing Android applications is becoming more and more challenging due to the notorious fragmentation issues and the complexity of usage scenarios in different environments. Crowdsourced testing has grown into a trend, especially in mobile application testing. However, due to a lack of professionalism and communication, crowd workers tend to submit low-quality and duplicate bug reports, leading to a waste of test resources on inspecting and aggregating such reports. To solve these problems, we developed a platform, CoCoTest, embracing the idea of collective intelligence. With the help of the CoCoTest Android SDK, workers can efficiently capture a screenshot, write a short description, and create a bug report. A series of bug reports is aggregated online and then recommended to the other workers in real time. The crowdsourced workers can (1) help review, verify, and enrich each other's bug reports; (2) avoid filing duplicate bug reports; and (3) be guided to conduct more professional testing with the help of collective intelligence. CoCoTest can improve the quality of the final report and reduce test costs. The demo video can be found at https://youtu.be/PuVuPbNP4tY.
{"title":"CoCoTest: collaborative crowdsourced testing for Android applications","authors":"Haoyu Li, Chunrong Fang, Zhibin Wei, Zhenyu Chen","doi":"10.1145/3293882.3339000","DOIUrl":"https://doi.org/10.1145/3293882.3339000","url":null,"abstract":"Testing Android applications is becoming more and more challenging due to the notorious fragmentation issues and the complexity of usage scenarios in different environments. Crowdsourced testing has grown as a trend, especially in mobile application testing. However, due to the lack of professionalism and communication, the crowd workers tend to submit low-quality and duplicate bug reports, leading to a waste of test resources on inspecting and aggregating such reports. To solve these problems, we developed a platform, CoCoTest, embracing the idea of collective intelligence. With the help of CoCoTest Android SDK, workers can efficiently capture a screenshot, write a short description and create a bug report. A series of bug reports are aggregated online and then recommended to the other workers in real time. The crowdsourced workers can (1) help review, verify and enrich each others' bug reports; (2) escape duplicate bug reports; (3) be guided to conduct more professional testing with the help of collective intelligence. CoCoTest can improve the quality of the final report and reduce test costs. The demo video can be found at https://youtu.be/PuVuPbNP4tY.","PeriodicalId":20624,"journal":{"name":"Proceedings of the 28th ACM SIGSOFT International Symposium on Software Testing and Analysis","volume":"17 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2019-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74171081","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
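One way the real-time recommendation could work: when a worker drafts a report, rank the open reports by textual overlap and surface the closest ones, so the worker enriches an existing report instead of filing a duplicate. The scoring below is a plain Jaccard sketch, not CoCoTest's actual pipeline:

```python
def similar_reports(new_desc, open_reports, k=3):
    """Rank open reports by word overlap with the incoming description."""
    new_words = set(new_desc.lower().split())
    scored = []
    for rid, desc in open_reports.items():
        words = set(desc.lower().split())
        union = new_words | words
        if union:
            scored.append((len(new_words & words) / len(union), rid))
    # Top-k report ids, highest overlap first, zero-overlap ones dropped.
    return [rid for s, rid in sorted(scored, reverse=True)[:k] if s > 0]

open_reports = {
    101: "login button crashes the app on rotate",
    102: "profile picture upload stalls on slow network",
}
print(similar_reports("app crashes when pressing login button", open_reports))
```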