
Proceedings of the 28th ACM SIGSOFT International Symposium on Software Testing and Analysis: Latest Publications

Search-based test and improvement of machine-learning-based anomaly detection systems
Maxime Cordy, S. Muller, Mike Papadakis, Yves Le Traon
Machine-learning-based anomaly detection systems can be vulnerable to new kinds of deceptions, known as training attacks, which exploit the live learning mechanism of these systems by progressively injecting small portions of abnormal data. The injected data seamlessly shift the learned states to a point where harmful data can pass unnoticed. We focus on the systematic testing of these attacks in the context of intrusion detection systems (IDS). We propose a search-based approach to test IDS by generating training attacks. Going a step further, we also propose searching for countermeasures, learning from the successful attacks and thereby increasing the resilience of the tested IDS. We evaluate our approach on a denial-of-service attack detection scenario and a dataset recording the network traffic of a real-world system. Our experiments show that our search-based attack scheme generates successful attacks bypassing the current state-of-the-art defences. We also show that our approach is capable of generating attack patterns for all configuration states of the studied IDS and that it is capable of providing appropriate countermeasures. By co-evolving our attack and defence mechanisms, we succeeded in improving the defence of the IDS under test, making it resilient to 49 out of 50 independently generated attacks.
{"title":"Search-based test and improvement of machine-learning-based anomaly detection systems","authors":"Maxime Cordy, S. Muller, Mike Papadakis, Yves Le Traon","doi":"10.1145/3293882.3330580","DOIUrl":"https://doi.org/10.1145/3293882.3330580","url":null,"abstract":"Machine-learning-based anomaly detection systems can be vulnerable to new kinds of deceptions, known as training attacks, which exploit the live learning mechanism of these systems by progressively injecting small portions of abnormal data. The injected data seamlessly swift the learned states to a point where harmful data can pass unnoticed. We focus on the systematic testing of these attacks in the context of intrusion detection systems (IDS). We propose a search-based approach to test IDS by making training attacks. Going a step further, we also propose searching for countermeasures, learning from the successful attacks and thereby increasing the resilience of the tested IDS. We evaluate our approach on a denial-of-service attack detection scenario and a dataset recording the network traffic of a real-world system. Our experiments show that our search-based attack scheme generates successful attacks bypassing the current state-of-the-art defences. We also show that our approach is capable of generating attack patterns for all configuration states of the studied IDS and that it is capable of providing appropriate countermeasures. By co-evolving our attack and defence mechanisms we succeeded at improving the defence of the IDS under test by making it resilient to 49 out of 50 independently generated attacks.","PeriodicalId":20624,"journal":{"name":"Proceedings of the 28th ACM SIGSOFT International Symposium on Software Testing and Analysis","volume":"38 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2019-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89792955","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7
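The co-evolution described above is, at its core, a mutate-and-select loop over injection schedules. Below is a minimal (1+1) evolutionary sketch of the attack side only; the toy `ids_accepts` model, the drift factor, and the schedule encoding are all hypothetical stand-ins, not the authors' actual fitness function:

```python
import random

# Toy stand-in for the IDS under test: the real fitness would come from
# replaying injected traffic against the live detector.
def ids_accepts(batch_size, drift):
    return batch_size <= 5 + drift  # small batches slip through

def fitness(schedule):
    """Fraction of injection steps the IDS accepts; higher = better attack."""
    drift, accepted = 0.0, 0
    for batch in schedule:
        if ids_accepts(batch, drift):
            accepted += 1
            drift += 0.1 * batch  # each accepted batch shifts the learned state
    return accepted / len(schedule)

def mutate(schedule):
    s = schedule[:]
    i = random.randrange(len(s))
    s[i] = max(1, s[i] + random.choice([-2, -1, 1, 2]))
    return s

random.seed(0)
best = [random.randint(1, 20) for _ in range(10)]  # injection sizes per step
for _ in range(500):
    cand = mutate(best)
    if fitness(cand) >= fitness(best):
        best = cand
print("evolved schedule:", best, "acceptance rate:", fitness(best))
```

The defence side would run a symmetric search over detector parameters, scoring candidates by how many evolved attacks they reject.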
On the correctness of GPU programs
Chao Peng
Testing is an important and challenging part of software development, and its effectiveness depends on the quality of test cases. However, there exists no means of measuring the quality of tests developed for GPU programs and, as a result, no test case generation techniques for GPU programs that aim at high test effectiveness. Existing criteria for sequential and multithreaded CPU programs cannot be directly applied to GPU programs, as GPUs follow a completely different memory and execution model. We surveyed existing work on GPU program verification and bug fixes of open-source GPU programs. Based on our findings, we define barrier, branch and loop coverage criteria and propose a set of mutation operators to measure the fault-finding capabilities of test cases. CLTestCheck, a framework for measuring the quality of tests developed for GPU programs by code coverage analysis, fault seeding and work-group schedule amplification, has been developed and evaluated using industry-standard benchmarks. Experiments show that the framework is able to automatically measure test effectiveness and reveal unusual behaviours. Our planned work includes data-flow coverage adapted for GPU programs to probe the underlying cause of unusual kernel behaviours, as well as a more comprehensive work-group scheduler. We also plan to design and develop an automatic test case generator aimed at generating high-quality test suites for GPU programs.
{"title":"On the correctness of GPU programs","authors":"Chao Peng","doi":"10.1145/3293882.3338989","DOIUrl":"https://doi.org/10.1145/3293882.3338989","url":null,"abstract":"Testing is an important and challenging part of software development and its effectiveness depends on the quality of test cases. However, there exists no means of measuring quality of tests developed for GPU programs and as a result, no test case generation techniques for GPU programs aiming at high test effectiveness. Existing criteria for sequential and multithreaded CPU programs cannot be directly applied to GPU programs as GPU follows a completely different memory and execution model. We surveyed existing work on GPU program verification and bug fixes of open source GPU programs. Based on our findings, we define barrier, branch and loop coverage criteria and propose a set of mutation operators to measure fault finding capabilities of test cases. CLTestCheck, a framework for measuring quality of tests developed for GPU programs by code coverage analysis, fault seeding and work-group schedule amplification has been developed and evaluated using industry standard benchmarks. Experiments show that the framework is able to automatically measure test effectiveness and reveal unusual behaviours. Our planned work includes data flow coverage adopted for GPU programs to probe the underlying cause of unusual kernel behaviours and a more comprehensive work-group scheduler. We also plan to design and develop an automatic test case generator aiming at generating high quality test suites for GPU programs.","PeriodicalId":20624,"journal":{"name":"Proceedings of the 28th ACM SIGSOFT International Symposium on Software Testing and Analysis","volume":"71 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2019-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76776811","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
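As a concrete illustration of the fault-seeding idea, one mutation operator suited to GPU kernels is barrier removal. The sketch below generates such mutants by plain text manipulation over a made-up OpenCL kernel; it is not CLTestCheck's implementation:

```python
import re

# Hypothetical OpenCL kernel; the operator below is a text-level toy.
KERNEL = """
__kernel void scan(__global int *data) {
    int i = get_global_id(0);
    data[i] += 1;
    barrier(CLK_GLOBAL_MEM_FENCE);  /* synchronisation point */
    data[i] += data[(i + 1) % get_global_size(0)];
}
"""

def barrier_removal_mutants(src):
    """Yield one mutant per barrier() call, with that call deleted.
    A strong test suite should kill such mutants, since removing a
    barrier can expose data races or barrier divergence."""
    for m in re.finditer(r'^\s*barrier\([^)]*\);.*$', src, re.M):
        yield src[:m.start()] + src[m.end():]

for i, mutant in enumerate(barrier_removal_mutants(KERNEL)):
    print(f"--- mutant {i} ---{mutant}")
```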
CTRAS: a tool for aggregating and summarizing crowdsourced test reports
Yuying Li, Rui Hao, Yang Feng, James A. Jones, Xiaofang Zhang, Zhenyu Chen
In this paper, we present CTRAS, a tool for automatically aggregating and summarizing duplicate crowdsourced test reports on the fly. CTRAS can automatically detect duplicates based on both textual information and screenshots, and it further aggregates and summarizes the duplicate test reports. CTRAS provides end users with a comprehensive and comprehensible understanding of all duplicates by identifying the main topics across the group of aggregated test reports and highlighting supplementary topics that are mentioned in subgroups of test reports. It also provides the classic tools of issue tracking systems, such as the project-report dashboard and keyword search, and automates their classic functionalities, such as bug triaging and best-fixer recommendation, to assist end users in managing and diagnosing test reports. Video: https://youtu.be/PNP10gKIPFs
{"title":"CTRAS: a tool for aggregating and summarizing crowdsourced test reports","authors":"Yuying Li, Rui Hao, Yang Feng, James A. Jones, Xiaofang Zhang, Zhenyu Chen","doi":"10.1145/3293882.3339004","DOIUrl":"https://doi.org/10.1145/3293882.3339004","url":null,"abstract":"In this paper, we present CTRAS, a tool for automatically aggregating and summarizing duplicate crowdsourced test reports on the fly. CTRAS can automatically detect duplicates based on both textual information and the screenshots, and further aggregates and summarizes the duplicate test reports. CTRAS provides end users with a comprehensive and comprehensible understanding of all duplicates by identifying the main topics across the group of aggregated test reports and highlighting supplementary topics that are mentioned in subgroups of test reports. Also, it provides the classic tool of issue tracking systems, such as the project-report dashboard and keyword searching, and automates their classic functionalities, such as bug triaging and best fixer recommendation, to assist end users in managing and diagnosing test reports. Video: https://youtu.be/PNP10gKIPFs","PeriodicalId":20624,"journal":{"name":"Proceedings of the 28th ACM SIGSOFT International Symposium on Software Testing and Analysis","volume":"43 8 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2019-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77721027","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
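The duplicate-detection step can be pictured as a weighted combination of textual and screenshot similarity. The sketch below uses Jaccard word overlap plus a toy histogram intersection; the field names, weights, and threshold are assumptions for illustration, not CTRAS's actual model:

```python
from collections import Counter

def text_sim(a, b):
    """Jaccard similarity over the word sets of two report descriptions."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

def image_sim(px_a, px_b, bins=8):
    """Histogram intersection over (toy) grayscale pixel lists, 0..255."""
    ha = Counter(p * bins // 256 for p in px_a)
    hb = Counter(p * bins // 256 for p in px_b)
    inter = sum(min(ha[k], hb[k]) for k in range(bins))
    return inter / max(len(px_a), len(px_b))

def is_duplicate(r1, r2, w_text=0.6, w_img=0.4, threshold=0.7):
    score = w_text * text_sim(r1["desc"], r2["desc"]) + \
            w_img * image_sim(r1["shot"], r2["shot"])
    return score >= threshold, score

a = {"desc": "app crashes when tapping login button", "shot": [10, 10, 200, 200]}
b = {"desc": "crashes after tapping the login button", "shot": [10, 12, 198, 201]}
print(is_duplicate(a, b))
```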
Mining Android crash fixes in the absence of issue- and change-tracking systems
Pingfan Kong, Li Li, Jun Gao, Tegawendé F. Bissyandé, Jacques Klein
Android apps are prone to crashes. These often arise from the misuse of Android framework APIs, and they are hard to debug since the official Android documentation does not thoroughly discuss potential exceptions. Recently, the program repair community has also started to investigate the possibility of fixing crashes automatically. Current results, however, apply to limited example cases. In both scenarios of repair, the main issue is the need for more example data to drive the fix process, due to the high cost in time and effort needed to collect and identify fix examples. We propose in this work a scalable approach, CraftDroid, to mine crash fixes by leveraging a set of 28 thousand carefully reconstructed app lineages from app markets, without the need for the app source code or issue reports. We developed a replicative testing approach that locates fixes among app versions that output different runtime logs with the exact same test inputs. Overall, we have mined 104 relevant crash fixes and further abstracted 17 fine-grained fix templates that are demonstrated to be effective for patching crashed APKs. Finally, we release ReCBench, a benchmark consisting of 200 crashed APKs and the crash replication scripts, which the community can explore for evaluating generated crash-inducing bug patches.
{"title":"Mining Android crash fixes in the absence of issue- and change-tracking systems","authors":"Pingfan Kong, Li Li, Jun Gao, Tegawendé F. Bissyandé, Jacques Klein","doi":"10.1145/3293882.3330572","DOIUrl":"https://doi.org/10.1145/3293882.3330572","url":null,"abstract":"Android apps are prone to crash. This often arises from the misuse of Android framework APIs, making it harder to debug since official Android documentation does not discuss thoroughly potential exceptions.Recently, the program repair community has also started to investigate the possibility to fix crashes automatically. Current results, however, apply to limited example cases. In both scenarios of repair, the main issue is the need for more example data to drive the fix processes due to the high cost in time and effort needed to collect and identify fix examples. We propose in this work a scalable approach, CraftDroid, to mine crash fixes by leveraging a set of 28 thousand carefully reconstructed app lineages from app markets, without the need for the app source code or issue reports. We developed a replicative testing approach that locates fixes among app versions which output different runtime logs with the exact same test inputs. Overall, we have mined 104 relevant crash fixes, further abstracted 17 fine-grained fix templates that are demonstrated to be effective for patching crashed apks. Finally, we release ReCBench, a benchmark consisting of 200 crashed apks and the crash replication scripts, which the community can explore for evaluating generated crash-inducing bug patches.","PeriodicalId":20624,"journal":{"name":"Proceedings of the 28th ACM SIGSOFT International Symposium on Software Testing and Analysis","volume":"7 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2019-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82323767","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 14
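The replicative-testing idea, stripped to its essence: replay identical inputs on consecutive versions of an app and flag the version pair where a crash signature disappears. In this sketch the runtime logs are hard-coded toy data standing in for real instrumented replays:

```python
# Toy stand-ins: in the real setting these logs come from replaying the
# exact same test inputs on consecutive versions of one app lineage.
LOGS = {
    "app-v1.apk": ["start", "java.lang.NullPointerException at X.onCreate"],
    "app-v2.apk": ["start", "java.lang.NullPointerException at X.onCreate"],
    "app-v3.apk": ["start", "done"],
}

def crash_signature(log):
    """First exception line in the runtime log, or None for a clean run."""
    return next((line for line in log if "Exception" in line), None)

def locate_fixes(lineage):
    """Report (crashing_version, fixed_version, signature) candidates."""
    fixes = []
    for prev, curr in zip(lineage, lineage[1:]):
        sig = crash_signature(LOGS[prev])
        if sig and crash_signature(LOGS[curr]) is None:
            fixes.append((prev, curr, sig))
    return fixes

print(locate_fixes(["app-v1.apk", "app-v2.apk", "app-v3.apk"]))
```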
Improving random GUI testing with image-based widget detection
Thomas D. White, G. Fraser, Guy J. Brown
Graphical User Interfaces (GUIs) are amongst the most common user interfaces, enabling interactions with applications through mouse movements and key presses. Tools for automated testing of programs through their GUI exist; however, they usually rely on operating-system- or framework-specific knowledge to interact with an application. Due to frequent operating system updates, which can remove required information, and the large variety of GUI frameworks using unique underlying data structures, such tools rapidly become obsolete. Consequently, for an automated GUI test generation tool, supporting many frameworks and operating systems is impractical. We propose a technique for improving GUI testing by automatically identifying GUI widgets in screenshots using machine learning techniques. As training data, we generate randomized GUIs from which we automatically extract widget information. The resulting model provides guidance to GUI testing tools in environments that are not currently supported, by deriving GUI widget information from screenshots only. In our experiments, we found that identifying GUI widgets in screenshots and using this information to guide random testing achieved significantly higher branch coverage in 18 of 20 applications, with an average increase of 42.5% compared to conventional random testing.
{"title":"Improving random GUI testing with image-based widget detection","authors":"Thomas D. White, G. Fraser, Guy J. Brown","doi":"10.1145/3293882.3330551","DOIUrl":"https://doi.org/10.1145/3293882.3330551","url":null,"abstract":"Graphical User Interfaces (GUIs) are amongst the most common user interfaces, enabling interactions with applications through mouse movements and key presses. Tools for automated testing of programs through their GUI exist, however they usually rely on operating system or framework specific knowledge to interact with an application. Due to frequent operating system updates, which can remove required information, and a large variety of different GUI frameworks using unique underlying data structures, such tools rapidly become obsolete, Consequently, for an automated GUI test generation tool, supporting many frameworks and operating systems is impractical. We propose a technique for improving GUI testing by automatically identifying GUI widgets in screen shots using machine learning techniques. As training data, we generate randomized GUIs to automatically extract widget information. The resulting model provides guidance to GUI testing tools in environments not currently supported by deriving GUI widget information from screen shots only. In our experiments, we found that identifying GUI widgets in screen shots and using this information to guide random testing achieved a significantly higher branch coverage in 18 of 20 applications, with an average increase of 42.5% when compared to conventional random testing.","PeriodicalId":20624,"journal":{"name":"Proceedings of the 28th ACM SIGSOFT International Symposium on Software Testing and Analysis","volume":"66 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2019-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85862991","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 62
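A sketch of how a widget detector could guide random testing: bias clicks toward predicted bounding boxes instead of sampling the whole screen uniformly. The detector here is a stub returning fixed boxes, and the bias probability is an assumed parameter, not a value from the paper:

```python
import random

def predicted_widgets(screenshot):
    """Stand-in for the trained detector: returns bounding boxes
    (x, y, w, h) of likely widgets found in the screenshot."""
    return [(40, 100, 120, 30), (40, 160, 120, 30), (500, 20, 60, 60)]

def next_click(screenshot, width, height, p_widget=0.9):
    """Click inside a detected widget with probability p_widget,
    otherwise fall back to a uniformly random screen position."""
    boxes = predicted_widgets(screenshot)
    if boxes and random.random() < p_widget:
        x, y, w, h = random.choice(boxes)
        return random.randint(x, x + w), random.randint(y, y + h)
    return random.randint(0, width), random.randint(0, height)

random.seed(1)
print([next_click(None, 1024, 768) for _ in range(3)])
```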
Go-clone: graph-embedding based clone detector for Golang
Cong Wang, Jian Gao, Yu Jiang, Zhenchang Xing, Huafeng Zhang, Weiliang Yin, M. Gu, Jiaguang Sun
Golang (short for the Go programming language) is a fast, compiled language that has been increasingly used in industry due to its excellent performance on concurrent programming. Golang redefines concurrent programming grammar, making it a challenge for traditional clone detection tools and techniques. However, few tools exist for detecting duplicates or copy-paste-related bugs in Golang. Therefore, an effective and efficient code clone detector for Golang is especially needed. In this paper, we present Go-Clone, a learning-based clone detector for Golang. Go-Clone contains two modules: the training module and the user interaction module. In the training module, we first parse Golang source code into LLVM IR (intermediate representation). Second, we automatically calculate the LSFG (labeled semantic flow graph) for each program function. Go-Clone trains a deep neural network model to encode LSFGs for similarity classification. In the user interaction module, users can choose one or more Golang projects. Go-Clone identifies and presents a list of function pairs that are most likely clone code for user inspection. To evaluate Go-Clone's performance, we collect 6,110 commit versions from 48 GitHub projects to construct a Golang clone detection data set. Go-Clone reaches an AUC (area under curve) of 89.61% and an accuracy of 83.80% in clone detection. By testing several groups of unfamiliar data, we also demonstrate the generality of Go-Clone. Demo video: https://youtu.be/o5DogtYGbeo
{"title":"Go-clone: graph-embedding based clone detector for Golang","authors":"Cong Wang, Jian Gao, Yu Jiang, Zhenchang Xing, Huafeng Zhang, Weiliang Yin, M. Gu, Jiaguang Sun","doi":"10.1145/3293882.3338996","DOIUrl":"https://doi.org/10.1145/3293882.3338996","url":null,"abstract":"Golang (short for Go programming language) is a fast and compiled language, which has been increasingly used in industry due to its excellent performance on concurrent programming. Golang redefines concurrent programming grammar, making it a challenge for traditional clone detection tools and techniques. However, there exist few tools for detecting duplicates or copy-paste related bugs in Golang. Therefore, an effective and efficient code clone detector on Golang is especially needed. In this paper, we present Go-Clone, a learning-based clone detector for Golang. Go-Clone contains two modules -- the training module and the user interaction module. In the training module, firstly we parse Golang source code into llvm IR (Intermediate Representation). Secondly, we calculate LSFG (labeled semantic flow graph) for each program function automatically. Go-Clone trains a deep neural network model to encode LSFGs for similarity classification. In the user interaction module, users can choose one or more Golang projects. Go-Clone identifies and presents a list of function pairs, which are most likely clone code for user inspection. To evaluate Go-Clone's performance, we collect 6,110 commit versions from 48 Github projects to construct a Golang clone detection data set. Go-Clone can reach the value of AUC (Area Under Curve) and ACC (Accuracy) for 89.61% and 83.80% in clone detection. By testing several groups of unfamiliar data, we also demonstrates the generility of Go-Clone. The address of the abstract demo video: https://youtu.be/o5DogtYGbeo","PeriodicalId":20624,"journal":{"name":"Proceedings of the 28th ACM SIGSOFT International Symposium on Software Testing and Analysis","volume":"6 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2019-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88308318","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
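Once functions are embedded, clone classification can reduce to a vector-similarity test. The sketch below compares precomputed toy vectors with cosine similarity; in Go-Clone the vectors would come from the trained network over LSFGs, and the threshold here is invented:

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

# Hypothetical: in Go-Clone these vectors would be produced by the
# trained network from each function's labeled semantic flow graph.
EMBEDDINGS = {
    "pkg/a.Sum":  [0.91, 0.10, 0.40],
    "pkg/b.Sum2": [0.89, 0.12, 0.38],  # near-duplicate of pkg/a.Sum
    "pkg/c.Walk": [0.05, 0.95, 0.20],
}

THRESHOLD = 0.98  # assumed cut-off for reporting a pair
funcs = list(EMBEDDINGS)
for i, f in enumerate(funcs):
    for g in funcs[i + 1:]:
        s = cosine(EMBEDDINGS[f], EMBEDDINGS[g])
        if s >= THRESHOLD:
            print(f"likely clone pair: {f} ~ {g} (cos={s:.3f})")
```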
A new dimension of test quality: assessing and generating higher quality unit test cases
Giovanni Grano
Unit tests form the first defensive line against the introduction of bugs in software systems. Therefore, their quality is of paramount importance for producing robust and reliable software. To assess test quality, many organizations rely on metrics like code and mutation coverage. However, these metrics are not always optimal for such a purpose. In my research, I want to make mutation testing scalable by devising a lightweight approach to estimate test effectiveness. Moreover, I plan to introduce a new metric measuring test focus, a proxy for the effort needed by developers to understand and maintain a test, that both complements code coverage in assessing test quality and can be used to drive automated generation of higher-quality test cases.
{"title":"A new dimension of test quality: assessing and generating higher quality unit test cases","authors":"Giovanni Grano","doi":"10.1145/3293882.3338984","DOIUrl":"https://doi.org/10.1145/3293882.3338984","url":null,"abstract":"Unit tests form the first defensive line against the introduction of bugs in software systems. Therefore, their quality is of a paramount importance to produce robust and reliable software. To assess test quality, many organizations relies on metrics like code and mutation coverage. However, they are not always optimal to fulfill such a purpose. In my research, I want to make mutation testing scalable by devising a lightweight approach to estimate test effectiveness. Moreover, I plan to introduce a new metric measuring test focus—as a proxy for the effort needed by developers to understand and maintain a test— that both complements code coverage to assess test quality and can be used to drive automated test case generation of higher quality tests.","PeriodicalId":20624,"journal":{"name":"Proceedings of the 28th ACM SIGSOFT International Symposium on Software Testing and Analysis","volume":"26 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2019-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86913511","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8
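The abstract does not define the test-focus metric, so the following is purely one plausible shape for such a proxy: the inverse of the number of distinct production classes a test exercises, computed from a hypothetical coverage trace:

```python
# Hypothetical coverage trace: production methods each test executed,
# e.g. collected with a coverage tool. The metric below is invented for
# illustration only, not the metric proposed in the paper.
TRACE = {
    "testAddItem":      ["Cart.add", "Cart.size"],
    "testCheckoutFlow": ["Cart.add", "Order.create", "Payment.charge",
                         "Mailer.send", "Inventory.reserve"],
}

def focus(methods):
    """1 / number of distinct production classes exercised: a test that
    touches a single class scores 1.0; a sprawling scenario test scores low."""
    classes = {m.split(".")[0] for m in methods}
    return 1.0 / len(classes)

for test, methods in TRACE.items():
    print(f"{test}: focus = {focus(methods):.2f}")
```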
Root causing flaky tests in a large-scale industrial setting
Wing Lam, Patrice Godefroid, Suman Nath, Anirudh Santhiar, Suresh Thummalapenta
In today's agile world, developers often rely on continuous integration pipelines to help build and validate their changes by executing tests in an efficient manner. One of the significant factors that hinder developers' productivity is flaky tests: tests that may pass and fail with the same version of code. Since flaky test failures are not deterministically reproducible, developers often have to spend hours only to discover that the occasional failures have nothing to do with their changes. However, ignoring failures of flaky tests can be dangerous, since those failures may represent real faults in the production code. Furthermore, identifying the root cause of flakiness is tedious and cumbersome, since flaky failures are often a consequence of unexpected and non-deterministic behavior due to various factors, such as concurrency and external dependencies. As developers in a large-scale industrial setting, we first describe our experience with flaky tests by conducting a study on them. Our results show that although the number of distinct flaky tests may be low, the percentage of failing builds due to flaky tests can be substantial. To reduce the burden of flaky tests on developers, we describe our end-to-end framework that helps identify flaky tests and understand their root causes. Our framework instruments flaky tests and all relevant code to log various runtime properties, and then uses a preliminary tool, called RootFinder, to find differences in the logs of passing and failing runs. Using our framework, we collect and publicize a dataset of real-world, anonymized execution logs of flaky tests. By sharing the findings from our study, our framework and tool, and a dataset of logs, we hope to encourage more research on this important problem.
{"title":"Root causing flaky tests in a large-scale industrial setting","authors":"Wing Lam, Patrice Godefroid, Suman Nath, Anirudh Santhiar, Suresh Thummalapenta","doi":"10.1145/3293882.3330570","DOIUrl":"https://doi.org/10.1145/3293882.3330570","url":null,"abstract":"In today’s agile world, developers often rely on continuous integration pipelines to help build and validate their changes by executing tests in an efficient manner. One of the significant factors that hinder developers’ productivity is flaky tests—tests that may pass and fail with the same version of code. Since flaky test failures are not deterministically reproducible, developers often have to spend hours only to discover that the occasional failures have nothing to do with their changes. However, ignoring failures of flaky tests can be dangerous, since those failures may represent real faults in the production code. Furthermore, identifying the root cause of flakiness is tedious and cumbersome, since they are often a consequence of unexpected and non-deterministic behavior due to various factors, such as concurrency and external dependencies. As developers in a large-scale industrial setting, we first describe our experience with flaky tests by conducting a study on them. Our results show that although the number of distinct flaky tests may be low, the percentage of failing builds due to flaky tests can be substantial. To reduce the burden of flaky tests on developers, we describe our end-to-end framework that helps identify flaky tests and understand their root causes. Our framework instruments flaky tests and all relevant code to log various runtime properties, and then uses a preliminary tool, called RootFinder, to find differences in the logs of passing and failing runs. Using our framework, we collect and publicize a dataset of real-world, anonymized execution logs of flaky tests. By sharing the findings from our study, our framework and tool, and a dataset of logs, we hope to encourage more research on this important problem.","PeriodicalId":20624,"journal":{"name":"Proceedings of the 28th ACM SIGSOFT International Symposium on Software Testing and Analysis","volume":"80 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2019-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90366187","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 92
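RootFinder's core comparison can be pictured as a diff over logged runtime properties: values that are stable across passing runs but change in failing runs are flagged as suspects. The log shape below (one dict of property values per run) is an assumption for illustration, not the tool's actual format:

```python
def property_diff(passing_runs, failing_runs):
    """Report logged properties whose values are constant within passing
    runs but differ in failing runs; these are candidate root causes of
    flakiness (timing, ordering, environment)."""
    suspects = {}
    keys = set().union(*passing_runs, *failing_runs)
    for key in keys:
        pass_vals = {run.get(key) for run in passing_runs}
        fail_vals = {run.get(key) for run in failing_runs}
        if len(pass_vals) == 1 and fail_vals != pass_vals:
            suspects[key] = (pass_vals.pop(), fail_vals)
    return suspects

passing = [{"thread.order": "A,B", "net.timeout": False},
           {"thread.order": "A,B", "net.timeout": False}]
failing = [{"thread.order": "B,A", "net.timeout": False}]
print(property_diff(passing, failing))  # flags thread.order only
```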
LibID: reliable identification of obfuscated third-party Android libraries
Jiexin Zhang, A. Beresford, Stephan A. Kollmann
Third-party libraries are vital components of Android apps, yet they can also introduce serious security threats and impede the accuracy and reliability of app analysis tasks, such as app clone detection. Several library detection approaches have been proposed to address these problems. However, we show that these techniques are not robust against popular code obfuscators, such as ProGuard, which is now used in nearly half of all apps. We then present LibID, a library detection tool that is more resilient to code shrinking and package modification than state-of-the-art tools. We show that the library identification problem can be formulated using binary integer programming models. LibID is able to identify specific versions of third-party libraries in candidate apps through static analysis of app binaries coupled with a database of third-party libraries. We propose a novel approach to generate synthetic apps for tuning the detection thresholds. Then, we use F-Droid apps as the ground truth to evaluate LibID under different obfuscation settings, which shows that LibID is more robust against code obfuscators than state-of-the-art tools. Finally, we demonstrate the utility of LibID by detecting the use of a vulnerable version of the OkHttp library in nearly 10% of the 3,958 most popular apps on the Google Play Store.
{"title":"LibID: reliable identification of obfuscated third-party Android libraries","authors":"Jiexin Zhang, A. Beresford, Stephan A. Kollmann","doi":"10.1145/3293882.3330563","DOIUrl":"https://doi.org/10.1145/3293882.3330563","url":null,"abstract":"Third-party libraries are vital components of Android apps, yet they can also introduce serious security threats and impede the accuracy and reliability of app analysis tasks, such as app clone detection. Several library detection approaches have been proposed to address these problems. However, we show these techniques are not robust against popular code obfuscators, such as ProGuard, which is now used in nearly half of all apps. We then present LibID, a library detection tool that is more resilient to code shrinking and package modification than state-of-the-art tools. We show that the library identification problem can be formulated using binary integer programming models. LibID is able to identify specific versions of third-party libraries in candidate apps through static analysis of app binaries coupled with a database of third-party libraries. We propose a novel approach to generate synthetic apps to tune the detection thresholds. Then, we use F-Droid apps as the ground truth to evaluate LibID under different obfuscation settings, which shows that LibID is more robust to code obfuscators than state-of-the-art tools. Finally, we demonstrate the utility of LibID by detecting the use of a vulnerable version of the OkHttp library in nearly 10% of 3,958 most popular apps on the Google Play Store.","PeriodicalId":20624,"journal":{"name":"Proceedings of the 28th ACM SIGSOFT International Symposium on Software Testing and Analysis","volume":"6 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2019-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74906896","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 28
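To see why the matching becomes a binary integer program: each app class may be assigned to at most one library class, and the assignment should maximise total signature similarity. For a 3x3 toy we can enumerate permutations instead of calling an ILP solver; the similarity matrix and the acceptance threshold are invented for illustration:

```python
from itertools import permutations

# Toy similarity matrix between obfuscated app classes (rows) and the
# classes of one candidate library version (columns).
SIM = [
    [0.9, 0.1, 0.2],  # app class a vs lib classes L0, L1, L2
    [0.2, 0.8, 0.1],  # app class b
    [0.1, 0.3, 0.7],  # app class c
]

def best_assignment(sim):
    """One-to-one binary matching maximising total similarity. LibID
    states this as binary integer programming; brute force suffices here."""
    n = len(sim)
    best, best_score = None, -1.0
    for perm in permutations(range(n)):
        score = sum(sim[i][perm[i]] for i in range(n))
        if score > best_score:
            best, best_score = perm, score
    return best, best_score

match, score = best_assignment(SIM)
print("matching:", match, "score:", round(score, 2))
print("library version detected" if score / len(SIM) >= 0.6 else "no match")
```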
CoCoTest: collaborative crowdsourced testing for Android applications
Haoyu Li, Chunrong Fang, Zhibin Wei, Zhenyu Chen
Testing Android applications is becoming more and more challenging due to the notorious fragmentation issues and the complexity of usage scenarios in different environments. Crowdsourced testing has grown into a trend, especially in mobile application testing. However, due to a lack of professionalism and communication, crowd workers tend to submit low-quality and duplicate bug reports, leading to a waste of test resources on inspecting and aggregating such reports. To solve these problems, we developed a platform, CoCoTest, embracing the idea of collective intelligence. With the help of the CoCoTest Android SDK, workers can efficiently capture a screenshot, write a short description, and create a bug report. A series of bug reports is aggregated online and then recommended to the other workers in real time. The crowdsourced workers can (1) help review, verify, and enrich each other's bug reports; (2) avoid filing duplicate bug reports; and (3) be guided to conduct more professional testing with the help of collective intelligence. CoCoTest can improve the quality of the final report and reduce test costs. The demo video can be found at https://youtu.be/PuVuPbNP4tY.
{"title":"CoCoTest: collaborative crowdsourced testing for Android applications","authors":"Haoyu Li, Chunrong Fang, Zhibin Wei, Zhenyu Chen","doi":"10.1145/3293882.3339000","DOIUrl":"https://doi.org/10.1145/3293882.3339000","url":null,"abstract":"Testing Android applications is becoming more and more challenging due to the notorious fragmentation issues and the complexity of usage scenarios in different environments. Crowdsourced testing has grown as a trend, especially in mobile application testing. However, due to the lack of professionalism and communication, the crowd workers tend to submit low-quality and duplicate bug reports, leading to a waste of test resources on inspecting and aggregating such reports. To solve these problems, we developed a platform, CoCoTest, embracing the idea of collective intelligence. With the help of CoCoTest Android SDK, workers can efficiently capture a screenshot, write a short description and create a bug report. A series of bug reports are aggregated online and then recommended to the other workers in real time. The crowdsourced workers can (1) help review, verify and enrich each others' bug reports; (2) escape duplicate bug reports; (3) be guided to conduct more professional testing with the help of collective intelligence. CoCoTest can improve the quality of the final report and reduce test costs. The demo video can be found at https://youtu.be/PuVuPbNP4tY.","PeriodicalId":20624,"journal":{"name":"Proceedings of the 28th ACM SIGSOFT International Symposium on Software Testing and Analysis","volume":"17 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2019-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74171081","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
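One way the real-time recommendation could work: when a worker drafts a report, rank the open reports by textual overlap and surface the closest ones, so the worker enriches an existing report instead of filing a duplicate. The scoring below is a plain Jaccard sketch, not CoCoTest's actual pipeline:

```python
def similar_reports(new_desc, open_reports, k=3):
    """Rank open reports by word overlap with the incoming description."""
    new_words = set(new_desc.lower().split())
    scored = []
    for rid, desc in open_reports.items():
        words = set(desc.lower().split())
        union = new_words | words
        if union:
            scored.append((len(new_words & words) / len(union), rid))
    # Top-k report ids, highest overlap first, zero-overlap ones dropped.
    return [rid for s, rid in sorted(scored, reverse=True)[:k] if s > 0]

open_reports = {
    101: "login button crashes the app on rotate",
    102: "profile picture upload stalls on slow network",
}
print(similar_reports("app crashes when pressing login button", open_reports))
```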