Proceedings of the 27th ACM SIGSOFT International Symposium on Software Testing and Analysis: Latest Articles

Compiler fuzzing through deep learning

Proceedings of the 27th ACM SIGSOFT International Symposium on Software Testing and Analysis

Pub Date : 2018-07-12 DOI: 10.1145/3213846.3213848

Chris Cummins, Pavlos Petoumenos, A. Murray, Hugh Leather

Random program generation (fuzzing) is an effective technique for discovering bugs in compilers, but successful fuzzers require extensive development effort for every language supported by the compiler, and often leave parts of the language space untested. We introduce DeepSmith, a novel machine learning approach to accelerating compiler validation through the inference of generative models for compiler inputs. Our approach infers a learned model of the structure of real-world code from a large corpus of open source code. Then, it uses the model to automatically generate tens of thousands of realistic programs. Finally, we apply established differential testing methodologies to these programs to expose bugs in compilers. We apply our approach to the OpenCL programming language, automatically exposing bugs with little effort on our side. In 1,000 hours of automated testing of commercial and open source compilers, we discover bugs in all of them, submitting 67 bug reports. Our test cases are on average two orders of magnitude smaller than those of the state-of-the-art, require 3.03× less time to generate and evaluate, and expose bugs which the state-of-the-art cannot. Our random program generator, comprising only 500 lines of code, took 12 hours to train for OpenCL, whereas the state-of-the-art took 9 man-months to port from a generator for C and comprises 50,000 lines of code. With 18 lines of code we extended our program generator to a second language, uncovering crashes in Solidity compilers in 12 hours of automated testing.
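
Differential testing, the final step described above, runs the same generated program on several compilers (testbeds) and treats disagreement as a potential bug. Below is a minimal Python sketch of a majority-vote comparison, assuming each testbed's observed output is a plain string; the testbed names are hypothetical. It illustrates the general idea only and is not DeepSmith's implementation.

```python
from collections import Counter


def differential_test(outputs):
    """Compare the outputs of one test program across several testbeds.

    outputs: dict mapping a (hypothetical) testbed name to its observed output.
    Returns (majority_output, outlier_testbeds). If no output is shared by a
    strict majority of testbeds, returns (None, []) because there is no reference.
    """
    if not outputs:
        return None, []
    counts = Counter(outputs.values())
    majority, votes = counts.most_common(1)[0]
    if votes <= len(outputs) // 2:
        return None, []
    # Every testbed that disagrees with the majority is a bug candidate.
    outliers = sorted(name for name, out in outputs.items() if out != majority)
    return majority, outliers


observed = {
    "testbed_a": "3 5 8",
    "testbed_b": "3 5 8",
    "testbed_c": "3 5 9",
}
print(differential_test(observed))  # ('3 5 8', ['testbed_c'])
```

A majority vote avoids needing a reference "correct" compiler; the trade-off is that a bug shared by most testbeds goes unnoticed.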

Citations: 108

Evaluating test-suite reduction in real software evolution

Proceedings of the 27th ACM SIGSOFT International Symposium on Software Testing and Analysis

Pub Date : 2018-07-12 DOI: 10.1145/3213846.3213875

A. Shi, A. Gyori, Suleman Mahmood, Peiyuan Zhao, D. Marinov

Test-suite reduction (TSR) speeds up regression testing by removing redundant tests from the test suite, thus running fewer tests in future builds. To decide whether to use TSR, a developer needs some way to predict how well the reduced test suite will detect real faults in the future compared to the original test suite. Prior research evaluated the cost of TSR using only program versions with seeded faults, but such evaluations do not explicitly predict the effectiveness of the reduced test suite in future builds. We perform the first extensive study of TSR using real test failures in (failed) builds that occurred for real code changes. We analyze 1478 failed builds from 32 GitHub projects that run their tests on Travis. Each failed build can have multiple faults, so we propose a family of mappings from test failures to faults. We use these mappings to compute Failed-Build Detection Loss (FBDL), the percentage of failed builds where the reduced test suite fails to detect all the faults detected by the original test suite. We find that FBDL can be up to 52.2%, which is higher than suggested by traditional TSR metrics. Moreover, traditional TSR metrics are not good predictors of FBDL, making it difficult for developers to decide whether to use reduced test suites.
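
The FBDL metric defined above can be computed once a mapping from test failures to faults is chosen. The following Python sketch is a minimal illustration under stated assumptions: a build counts as a loss when the reduced suite misses at least one fault that the original suite detected, tests are deterministic (a kept test fails exactly when it failed originally), and the default mapping (each failed test is its own fault) is only one hypothetical member of the family of mappings the paper proposes.

```python
def each_test_is_a_fault(test):
    """Hypothetical default mapping: every failed test is a distinct fault."""
    return test


def fbdl(failed_builds, reduced_suite, test_to_fault=each_test_is_a_fault):
    """Failed-Build Detection Loss, as a percentage.

    failed_builds: list of sets; each set holds the tests that failed in one
        failed build when the ORIGINAL suite was run.
    reduced_suite: set of test names kept after test-suite reduction.
    """
    if not failed_builds:
        return 0.0
    lost = 0
    for failed_tests in failed_builds:
        original_faults = {test_to_fault(t) for t in failed_tests}
        # Assumption: only kept tests can fail under the reduced suite.
        reduced_faults = {test_to_fault(t) for t in failed_tests if t in reduced_suite}
        if reduced_faults != original_faults:
            lost += 1
    return 100.0 * lost / len(failed_builds)


def t2_t3_same_fault(test):
    """Hypothetical alternative mapping: t2 and t3 expose one shared fault."""
    return "F" if test in {"t2", "t3"} else test


builds = [{"t1"}, {"t2", "t3"}, {"t4"}]
reduced = {"t1", "t2"}
print(round(fbdl(builds, reduced), 2))                    # 66.67
print(round(fbdl(builds, reduced, t2_t3_same_fault), 2))  # 33.33
```

The two calls show why the mapping matters: the same reduced suite loses two of three builds under one mapping but only one under the other.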

Citations: 28

Shaping program repair space with existing patches and similar code

Proceedings of the 27th ACM SIGSOFT International Symposium on Software Testing and Analysis

Pub Date : 2018-07-12 DOI: 10.1145/3213846.3213871

Jiajun Jiang, Yingfei Xiong, Hongyu Zhang, Qing Gao, Xiangqun Chen

Automated program repair (APR) has great potential to reduce bug-fixing effort, and many approaches have been proposed in recent years. APR is often treated as a search problem, where the search space consists of all possible patches and the goal is to identify the correct patch in that space. Many techniques take a data-driven approach and analyze data sources such as existing patches and similar source code to help identify the correct patch. However, while existing patches and similar code provide complementary information, existing techniques analyze only a single source and cannot be easily extended to analyze both. In this paper, we propose a novel automatic program repair approach that utilizes both existing patches and similar code. Our approach mines an abstract search space from existing patches and obtains a concrete search space by differencing with similar code snippets. Then we search within the intersection of the two search spaces. We have implemented our approach as a tool called SimFix, and evaluated it on the Defects4J benchmark. Our tool successfully fixed 34 bugs. To the best of our knowledge, this is the largest number of bugs fixed by a single technique on the Defects4J benchmark. Furthermore, as far as we know, 13 of the bugs fixed by our approach have never been fixed by existing approaches.
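
The intersection of the two search spaces described above can be pictured with a toy Python sketch. Here the abstract space is simply a frequency count of modification kinds seen in past patches, and each concrete candidate from differencing with similar code is a (kind, edit) pair. The kind labels and edits are hypothetical; SimFix itself works on AST-level modifications and ranks candidates with its own criteria, so this only illustrates the filtering idea.

```python
from collections import Counter


def mine_abstract_space(past_patch_kinds):
    """Abstract space: how often each modification kind occurs in past patches."""
    return Counter(past_patch_kinds)


def intersect_spaces(concrete_candidates, abstract_space):
    """Keep concrete (kind, edit) candidates whose kind appears in the abstract
    space, most frequent kinds first (ties keep their original order)."""
    kept = [c for c in concrete_candidates if abstract_space[c[0]] > 0]
    return sorted(kept, key=lambda c: abstract_space[c[0]], reverse=True)


history = ["insert_null_check", "insert_null_check", "change_operator", "replace_method_call"]
candidates = [
    ("replace_variable", "len -> end"),
    ("insert_null_check", "if (x == null) return;"),
    ("change_operator", "< -> <="),
]
shaped = intersect_spaces(candidates, mine_abstract_space(history))
print([kind for kind, _ in shaped])  # ['insert_null_check', 'change_operator']
```

The `replace_variable` candidate is dropped because no past patch used that kind of modification, which is how the abstract space prunes the concrete one.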

Citations: 242

Mutode: generic JavaScript and Node.js mutation testing tool

Proceedings of the 27th ACM SIGSOFT International Symposium on Software Testing and Analysis

Pub Date : 2018-07-12 DOI: 10.1145/3213846.3229504

Diego Rodríguez-Baquero, M. Vásquez

Mutation testing is a technique in which faults (mutants) are injected into a program or application to assess the effectiveness of its test suite. It works by inserting mutants and running the application's test suite to determine whether each mutant is detected (killed) or not (survives). Although computationally expensive, it has proven to be an effective method for assessing application test suites. Several mutation testing frameworks and tools have been built for various programming languages; however, very few have been built for JavaScript, and, more specifically, there is a lack of mutation testing tools for the Node.js runtime and npm-based applications. The npm Registry is a public collection of modules of open-source code for Node.js, front-end web applications, mobile applications, robots, routers, and countless other needs of the JavaScript community. The more than 700,000 packages hosted in npm are downloaded more than 5 billion times per week. More and more software is published in npm every day, representing a huge opportunity to share code and solutions, but also to share bugs and faulty software. In this paper, we briefly describe prior work on mutation operators for JavaScript and Node.js, and propose Mutode, an open source tool which leverages the npm package ecosystem to perform mutation testing for JavaScript and Node.js applications. We empirically evaluated Mutode's effectiveness by running it on 12 of the top 20 npm modules that have automated test suites.
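
The kill/survive loop described above can be shown in a few lines. Mutode targets JavaScript, so the following is only the same general idea expressed in Python with the standard `ast` module: a single hypothetical mutation operator (binary `+` becomes `-`) is applied at each site in turn, the test suite is run against each mutant, and the result is reported as (killed, total). It is not Mutode's implementation or operator set.

```python
import ast

SOURCE = """
def inc(x):
    return x + 1

def offset(x):
    return x + 0
"""


class AddToSub(ast.NodeTransformer):
    """Mutation operator: replace the target_index-th binary '+' with '-'."""

    def __init__(self, target_index):
        self.target_index = target_index
        self.seen = 0

    def visit_BinOp(self, node):
        self.generic_visit(node)
        if isinstance(node.op, ast.Add):
            if self.seen == self.target_index:
                node.op = ast.Sub()
            self.seen += 1
        return node


def count_add_sites(tree):
    return sum(isinstance(n, ast.BinOp) and isinstance(n.op, ast.Add) for n in ast.walk(tree))


def run_suite(tree, tests):
    """Return True if every test passes against the code in tree."""
    namespace = {}
    exec(compile(tree, "<mutant>", "exec"), namespace)
    try:
        return all(test(namespace) for test in tests)
    except Exception:
        return False  # a crash also means the mutant was detected


def mutation_testing(source, tests):
    """Return (killed, total) over all '+' -> '-' mutants of source."""
    total = count_add_sites(ast.parse(source))
    killed = 0
    for i in range(total):
        # Parse afresh so each mutant contains exactly one mutation.
        mutant = ast.fix_missing_locations(AddToSub(i).visit(ast.parse(source)))
        if not run_suite(mutant, tests):
            killed += 1
    return killed, total


tests = [
    lambda ns: ns["inc"](1) == 2,
    lambda ns: ns["offset"](5) == 5,
]
print(mutation_testing(SOURCE, tests))  # (1, 2)
```

The surviving mutant (`x + 0` becoming `x - 0`) is an equivalent mutant: no test can kill it, which is one reason mutation scores rarely reach 100%.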

Citations: 13

Deep specification mining

Proceedings of the 27th ACM SIGSOFT International Symposium on Software Testing and Analysis

Pub Date : 2018-07-12 DOI: 10.1145/3213846.3213876

Tien-Duy B. Le, D. Lo

Formal specifications are essential but usually unavailable in software systems. Furthermore, writing these specifications is costly and requires skills from developers. Recently, many automated techniques have been proposed to mine specifications in various formats, including finite-state automata (FSA). However, more work in specification mining is needed to further improve the accuracy of the inferred specifications. In this work, we propose Deep Specification Miner (DSM), a new approach that performs deep learning for mining FSA-based specifications. Our proposed approach uses test case generation to generate a richer set of execution traces for training a Recurrent Neural Network Based Language Model (RNNLM). From these execution traces, we construct a Prefix Tree Acceptor (PTA) and use the learned RNNLM to extract many features. These features are subsequently used by clustering algorithms to merge similar automaton states in the PTA to construct a number of FSAs. Then, our approach applies a model selection heuristic to estimate the F-measure of each FSA and returns the one with the highest estimated F-measure. We run DSM to mine specifications of 11 target library classes. Our empirical analysis shows that DSM achieves an average F-measure of 71.97%, outperforming the best-performing baseline by 28.22%. We also demonstrate the value of DSM in sandboxing Android apps.
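
The Prefix Tree Acceptor mentioned above is the starting automaton that DSM later compacts by merging states. A minimal Python sketch of building one from traces follows; the event names are hypothetical (a file-like API), and marking only the states where a trace ends as accepting is an assumption of this sketch. The RNNLM features and clustering-based merging are not shown.

```python
def build_pta(traces):
    """Build a prefix tree acceptor from execution traces.

    Returns (transitions, accepting): transitions maps (state, event) -> state,
    with state 0 as the root; accepting holds the states where a trace ended.
    """
    transitions = {}
    accepting = set()
    next_state = 1
    for trace in traces:
        state = 0
        for event in trace:
            # Shared prefixes reuse existing states; new suffixes create new ones.
            if (state, event) not in transitions:
                transitions[(state, event)] = next_state
                next_state += 1
            state = transitions[(state, event)]
        accepting.add(state)
    return transitions, accepting


def accepts(pta, trace):
    """True if trace follows existing transitions and ends in an accepting state."""
    transitions, accepting = pta
    state = 0
    for event in trace:
        if (state, event) not in transitions:
            return False
        state = transitions[(state, event)]
    return state in accepting


traces = [
    ["open", "read", "close"],
    ["open", "write", "close"],
    ["open", "close"],
]
pta = build_pta(traces)
print(len(pta[0]))                              # 6
print(accepts(pta, ["open", "read", "close"]))  # True
print(accepts(pta, ["read", "close"]))          # False
```

A PTA accepts exactly the observed traces, so it does not generalize; merging similar states, as DSM does, is what turns it into a more compact and more general FSA.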

Citations: 44

Translating code comments to procedure specifications

Proceedings of the 27th ACM SIGSOFT International Symposium on Software Testing and Analysis

Pub Date : 2018-07-12 DOI: 10.1145/3213846.3213872

Arianna Blasi, A. Goffi, Konstantin Kuznetsov, Alessandra Gorla, Michael D. Ernst, M. Pezzè, S. D. Castellanos

Procedure specifications are useful in many software development tasks. As one example, in automatic test case generation they can guide testing, act as test oracles able to reveal bugs, and identify illegal inputs. Whereas formal specifications are seldom available in practice, it is standard practice for developers to document their code with semi-structured comments. These comments express the procedure specification with a mix of predefined tags and natural language. This paper presents Jdoctor, an approach that combines pattern, lexical, and semantic matching to translate Javadoc comments into executable procedure specifications written as Java expressions. In an empirical evaluation, Jdoctor achieved precision of 92% and recall of 83% in translating Javadoc into procedure specifications. We also supplied the Jdoctor-derived specifications to an automated test case generation tool, Randoop. The specifications enabled Randoop to generate test cases of higher quality.
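
Jdoctor combines pattern, lexical, and semantic matching; the toy Python sketch below shows only the flavor of the pattern-matching step, translating a single `@throws` Javadoc line into an exception name plus a Java guard expression. The pattern table is hypothetical and far smaller than anything a real tool would use; unmatched comments simply yield `None`.

```python
import re

# Hypothetical pattern table: natural-language condition -> Java expression template.
CONDITION_PATTERNS = [
    (re.compile(r"\b(\w+) is null\b"), "{0} == null"),
    (re.compile(r"\b(\w+) is negative\b"), "{0} < 0"),
    (re.compile(r"\b(\w+) is empty\b"), "{0}.isEmpty()"),
]

THROWS_TAG = re.compile(r"@throws\s+(\w+)\s+(?:if|when)\s+(.+)")


def translate_throws(comment_line):
    """Translate one '@throws X if <condition>' Javadoc line into
    (exception, java_guard_expression), or return None if nothing matches."""
    match = THROWS_TAG.search(comment_line)
    if match is None:
        return None
    exception, condition = match.group(1), match.group(2).rstrip(". ")
    for pattern, template in CONDITION_PATTERNS:
        found = pattern.search(condition)
        if found:
            return exception, template.format(found.group(1))
    return None


print(translate_throws("@throws NullPointerException if key is null"))
# ('NullPointerException', 'key == null')
print(translate_throws("@throws IllegalArgumentException if size is negative."))
# ('IllegalArgumentException', 'size < 0')
print(translate_throws("@throws IOException if the disk is full"))
# None
```

A test generator could use such a guard to recognize illegal inputs (skip calls where `key == null` holds) or as an oracle (expect `NullPointerException` exactly when it holds).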

Citations: 80

Identifying implementation bugs in machine learning based image classifiers using metamorphic testing

Proceedings of the 27th ACM SIGSOFT International Symposium on Software Testing and Analysis

Pub Date : 2018-07-12 DOI: 10.1145/3213846.3213858

Anurag Dwarakanath, Manish Ahuja, Samarth Sikand, Raghotham M. Rao, Jagadeesh Chandra J. C. Bose, Neville Dubash, Sanjay Podder

We have recently witnessed tremendous success of Machine Learning (ML) in practical applications. Computer vision, speech recognition, and language translation have all seen near-human-level performance. We expect that, in the near future, most business applications will have some form of ML. However, testing such applications is extremely challenging and would be very expensive if we followed today's methodologies. In this work, we articulate the challenges in testing ML-based applications. We then present our solution approach, based on the concept of Metamorphic Testing, which aims to identify implementation bugs in ML-based image classifiers. We have developed metamorphic relations for an application based on a Support Vector Machine and for a Deep Learning based application. Empirical validation showed that our approach was able to catch 71% of the implementation bugs in the ML applications.
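
Metamorphic testing sidesteps the missing oracle: instead of knowing the correct label of any input, it checks that a transformation of the inputs changes the outputs in a predictable way. The paper develops relations for SVM and deep-learning classifiers; the Python sketch below substitutes a much simpler 1-nearest-neighbour classifier and one assumed relation (reordering the training data must not change predictions) to show how a relation can expose an injected off-by-one bug.

```python
import math


def nn_predict(train, x):
    """1-nearest-neighbour classifier: scans every (features, label) pair."""
    best_label, best_dist = None, math.inf
    for features, label in train:
        d = math.dist(features, x)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label


def buggy_nn_predict(train, x):
    """Same classifier with an injected off-by-one bug: the last example is skipped."""
    best_label, best_dist = None, math.inf
    for features, label in train[:-1]:
        d = math.dist(features, x)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label


def permutation_relation_holds(predict, train, test_points):
    """Metamorphic relation: reordering the training data must not change predictions.
    Valid here only because the chosen data contains no distance ties."""
    before = [predict(train, x) for x in test_points]
    after = [predict(list(reversed(train)), x) for x in test_points]
    return before == after


train = [((0.0, 0.0), "a"), ((10.0, 10.0), "b")]
test_points = [(1.0, 1.0), (9.0, 9.0)]
print(permutation_relation_holds(nn_predict, train, test_points))        # True
print(permutation_relation_holds(buggy_nn_predict, train, test_points))  # False
```

No expected labels appear anywhere in the check; the violation alone reveals that the buggy implementation depends on data order when it should not.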

Citations: 144

Repositioning of static analysis alarms

Proceedings of the 27th ACM SIGSOFT International Symposium on Software Testing and Analysis

Pub Date : 2018-07-12 DOI: 10.1145/3213846.3213850

Tukaram Muske, Rohith Talluri, Alexander Serebrenik

The large number of alarms reported by static analysis tools is often recognized as one of the major obstacles to industrial adoption of such tools. We present repositioning of alarms, a novel automatic postprocessing technique intended to reduce the number of reported alarms without affecting the errors uncovered by them. The reduction in the number of alarms is achieved by moving groups of related alarms along the control flow to a program point where they can be replaced by a single alarm. In the repositioning technique, as the locations of repositioned alarms are different from the locations of the errors uncovered by them, we also maintain traceability links between a repositioned alarm and its corresponding original alarm(s). The presented technique is tool-agnostic and orthogonal to many other techniques available for postprocessing alarms. To evaluate the technique, we applied it as a postprocessing step to alarms generated for 4 verification properties on 16 open source and 4 industry applications. The results indicate that the alarm repositioning technique reduces the alarm count by up to 20% compared with state-of-the-art alarm grouping techniques, with a median reduction of 7.25%.
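
To make the idea of moving related alarms along the control flow concrete, here is a toy Python sketch on a tiny acyclic control-flow graph. The node names and the safety condition it checks (every path from the new point passes through one of the original alarms, and all of them are reachable from it) are assumptions of this sketch, not the paper's algorithm; it also assumes, without checking, that all alarms in the group test the same condition and that nothing in that condition is modified in between. The returned record keeps traceability links to the original alarms.

```python
def reachable(cfg, start):
    """All nodes reachable from start (including start)."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(cfg[node])
    return seen


def every_path_hits(cfg, node, alarm_points):
    """True if every path from node to the exit passes through an alarm point.
    Assumes an acyclic CFG."""
    if node in alarm_points:
        return True
    successors = cfg[node]
    if not successors:  # reached the exit without meeting an alarm
        return False
    return all(every_path_hits(cfg, s, alarm_points) for s in successors)


def reposition(cfg, point, alarm_points):
    """Replace a group of related alarms by one alarm at point, if allowed.
    Returns the repositioned alarm with traceability links, or None."""
    if not alarm_points <= reachable(cfg, point):
        return None
    if not every_path_hits(cfg, point, alarm_points):
        return None
    return {"location": point, "original_alarms": sorted(alarm_points)}


cfg = {
    "entry": ["branch"],
    "branch": ["then", "else"],
    "then": ["join"],  # hypothetical alarm: possible out-of-bounds a[i]
    "else": ["join"],  # hypothetical alarm: the same check on a[i]
    "join": [],
}
print(reposition(cfg, "branch", {"then", "else"}))
# {'location': 'branch', 'original_alarms': ['else', 'then']}
print(reposition(cfg, "branch", {"then"}))
# None: the else path never reaches this alarm
```

Two alarms become one at the branch point, while `original_alarms` preserves the traceability links back to the locations where the errors would actually occur.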

Citations: 20

MalViz: an interactive visualization tool for tracing malware

Proceedings of the 27th ACM SIGSOFT International Symposium on Software Testing and Analysis

Pub Date : 2018-07-12 DOI: 10.1145/3213846.3229501

V. Nguyen, A. Namin, Tommy Dang

This demonstration paper introduces MalViz, a visual analytics tool for analyzing malware behavioral patterns through process monitoring events. The goals of this tool are: 1) to investigate the relationships and dependencies among processes that interact with running malware over a certain period of time; 2) to support professional security experts in detecting and recognizing unusual signature-based patterns exhibited by running malware; and 3) to help users identify infected system and user libraries that the malware has reached and possibly tampered with. A case study is conducted in a virtual machine environment with a sample of four malware programs. The results of the case study show that the visualization tool offers strong support for experts in software and system analysis and digital forensics to profile and observe malicious behavior and further identify the traces of affected software artifacts.

Citations: 13

Test input generation with Java PathFinder: then and now (invited talk abstract)

Proceedings of the 27th ACM SIGSOFT International Symposium on Software Testing and Analysis

Pub Date : 2018-07-12 DOI: 10.1145/3213846.3234687

S. Khurshid, C. Pasareanu, W. Visser

The paper "Test Input Generation With Java PathFinder" was published in the Proceedings of the International Symposium on Software Testing and Analysis (ISSTA) 2004 and has now been selected to receive the ISSTA 2018 Retrospective Impact Paper Award. The paper described black-box and white-box techniques for the automated testing of software systems. These techniques were based on model checking and symbolic execution and were incorporated into the Java PathFinder analysis tool. The main contribution of the paper was to describe how to perform efficient test input generation for code that manipulates complex data, taking into account complex method preconditions, and to evaluate the techniques for generating high-coverage tests. We review the original paper and discuss the research that preceded it, as well as the research carried out between then (2004) and now (2018), in the context of the Java PathFinder tool, its symbolic execution component (now called Symbolic PathFinder), and closely related approaches that target testing of software that manipulates complex data structures. We close with directions for future work.

Citations: 3
