Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering: latest papers, page 8

Fairness testing: testing software for discrimination

Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering

Pub Date : 2017-08-21 DOI: 10.1145/3106237.3106277

Sainyam Galhotra, Yuriy Brun, A. Meliou

This paper defines software fairness and discrimination and develops a testing-based method for measuring if and how much software discriminates, focusing on causality in discriminatory behavior. Evidence of software discrimination has been found in modern software systems that recommend criminal sentences, grant access to financial products, and determine who is allowed to participate in promotions. Our approach, Themis, generates efficient test suites to measure discrimination. Given a schema describing valid system inputs, Themis generates discrimination tests automatically and does not require an oracle. We evaluate Themis on 20 software systems, 12 of which come from prior work with explicit focus on avoiding discrimination. We find that (1) Themis is effective at discovering software discrimination, (2) state-of-the-art techniques for removing discrimination from algorithms fail in many situations, at times discriminating against as much as 98% of an input subdomain, (3) Themis optimizations are effective at producing efficient test suites for measuring discrimination, and (4) Themis is more efficient on systems that exhibit more discrimination. We thus demonstrate that fairness testing is a critical aspect of the software development cycle in domains with possible discrimination and provide initial tools for measuring software discrimination.
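The causal idea in the abstract can be illustrated with a minimal Python sketch. It is not Themis itself: the schema, the two decision functions, and the name `causal_discrimination` are hypothetical, and uniform random sampling stands in for whatever test generation and optimizations the tool actually uses. The sketch changes only the protected attribute of each sampled input and counts how often the output changes, so outputs are compared with each other and no oracle is needed.

```python
import random

def causal_discrimination(system, schema, protected, samples=1000, seed=0):
    """Estimate the fraction of inputs whose output changes when only `protected` changes."""
    rng = random.Random(seed)
    others = [name for name in schema if name != protected]
    flagged = 0
    for _ in range(samples):
        # Draw one valid input for every attribute except the protected one.
        base = {name: rng.choice(schema[name]) for name in others}
        # Run the system once per protected value and collect the distinct outputs.
        outputs = {system({**base, protected: value}) for value in schema[protected]}
        if len(outputs) > 1:
            flagged += 1
    return flagged / samples

# Hypothetical input schema: attribute name -> list of valid values.
SCHEMA = {
    "age": list(range(18, 80)),
    "income": list(range(0, 200, 10)),
    "gender": ["f", "m"],
}

def loan_biased(x):
    # Hypothetical system under test: the income threshold depends on gender.
    threshold = 50 if x["gender"] == "m" else 80
    return x["income"] >= threshold

def loan_fair(x):
    # Hypothetical system under test: gender is ignored.
    return x["income"] >= 50

biased_score = causal_discrimination(loan_biased, SCHEMA, "gender")
fair_score = causal_discrimination(loan_fair, SCHEMA, "gender")
```

In this toy setup `fair_score` is exactly zero because `loan_fair` never reads the protected attribute, while `biased_score` estimates the share of sampled inputs on which gender alone flips the decision.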

Citations: 292

Automated control of multiple software goals using multiple actuators

Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering

Pub Date : 2017-08-21 DOI: 10.1145/3106237.3106247

M. Maggio, A. Papadopoulos, A. Filieri, H. Hoffmann

Modern software should satisfy multiple goals simultaneously: it should provide predictable performance, be robust to failures, handle peak loads and deal seamlessly with unexpected conditions and changes in the execution environment. For this to happen, software designs should account for the possibility of runtime changes and provide formal guarantees of the software's behavior. Control theory is one of the possible design drivers for runtime adaptation, but adopting control theoretic principles often requires additional, specialized knowledge. To overcome this limitation, automated methodologies have been proposed to extract the necessary information from experimental data and design a control system for runtime adaptation. These proposals, however, only process one goal at a time, creating a chain of controllers. In this paper, we propose and evaluate the first automated strategy that takes into account multiple goals without separating them into multiple control strategies. Avoiding the separation allows us to tackle a larger class of problems and provide stronger guarantees. We test our methodology's generality with three case studies that demonstrate its broad applicability in meeting performance, reliability, quality, security, and energy goals despite environmental or requirements changes.

Citations: 42

From scenario modeling to scenario programming for reactive systems with dynamic topology

Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering

Pub Date : 2017-08-21 DOI: 10.1145/3106237.3122827

Joel Greenyer, D. Gritzner, F. König, Jannik Dahlke, Jianwei Shi, Eric Wete

Software-intensive systems often consist of cooperating reactive components. In mobile and reconfigurable systems, their topology changes at run-time, which influences how the components must cooperate. The Scenario Modeling Language (SML) offers a formal approach for specifying the reactive behavior of such systems that aligns with how humans conceive and communicate behavioral requirements. Simulation and formal checks can find specification flaws early. We present a framework for Scenario-based Programming (SBP) that reflects the concepts of SML in Java and makes the scenario modeling approach available for programming. SBP code can also be generated from SML and extended with platform-specific code, thus streamlining the transition from design to implementation. A car-to-x communication system serves as an example. Demo video and artifact: http://scenariotools.org/esecfse-2017-tool-demo/

Citations: 4

PATDroid: permission-aware GUI testing of Android

Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering

Pub Date : 2017-08-21 DOI: 10.1145/3106237.3106250

Alireza Sadeghi, Reyhaneh Jabbarvand, S. Malek

The recent introduction of a dynamic permission system in Android, which allows users to grant and revoke permissions after an app is installed, has made it harder to test apps properly. Since an app's behavior may change depending on the granted permissions, it needs to be tested under a wide range of permission combinations. In the current state of the art, in the absence of any automated tool support, a developer needs to either manually determine the interaction of tests and app permissions, or exhaustively re-execute tests for all possible permission combinations, thereby increasing the time and resources required to test apps. This paper presents an automated approach, called PATDroid, for efficiently testing an Android app while taking the impact of permissions on its behavior into account. PATDroid performs a hybrid program analysis on both an app under test and its test suite to determine which tests should be executed on what permission combinations. Our experimental results show that PATDroid significantly reduces the testing effort, yet achieves code coverage and fault detection capability comparable to exhaustively testing an app under all permission combinations.
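The saving described in the abstract can be made concrete with a small Python sketch. Everything named here is hypothetical: the permission labels, the test names, and the hand-written `RELEVANT` map, which stands in for the result of the hybrid program analysis that the abstract mentions but does not detail. The sketch varies only the permissions that can affect a given test and keeps all others granted.

```python
from itertools import product

def combinations_for(relevant, all_permissions):
    """Yield grant/revoke assignments that vary only the permissions relevant to one test."""
    # Permissions that cannot affect the test are simply left granted.
    fixed = {p: True for p in all_permissions if p not in relevant}
    ordered = sorted(relevant)
    for states in product([True, False], repeat=len(ordered)):
        yield {**fixed, **dict(zip(ordered, states))}

# Hypothetical app permissions and per-test relevant permissions.
APP_PERMISSIONS = ["CAMERA", "LOCATION", "CONTACTS", "STORAGE"]
RELEVANT = {
    "test_take_photo": {"CAMERA", "STORAGE"},
    "test_show_map": {"LOCATION"},
    "test_about_page": set(),
}

# Test name -> list of permission assignments to run it under.
plan = {
    test: list(combinations_for(relevant, APP_PERMISSIONS))
    for test, relevant in RELEVANT.items()
}

reduced = sum(len(combos) for combos in plan.values())
exhaustive = len(RELEVANT) * 2 ** len(APP_PERMISSIONS)
print(f"{reduced} executions instead of {exhaustive}")
```

With four permissions and three tests, exhaustive re-execution needs 48 runs, while this plan needs 7, because a test that touches no permission-guarded behavior runs only once.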

Citations: 40

The MONDO collaboration framework: secure collaborative modeling over existing version control systems

Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering

Pub Date : 2017-08-21 DOI: 10.1145/3106237.3122829

Csaba Debreceni, Gábor Bergmann, Márton Búr, I. Ráth, Dániel Varró

Model-based systems engineering of critical cyber-physical systems necessitates effective collaboration between different stakeholders while still providing secure protection of the intellectual property of all involved parties. While engineering artifacts are frequently stored in version control repositories, secure access control is limited to file-level strategies in most existing frameworks, where models are split into multiple fragments with all-or-nothing permissions, which becomes a scalability and usability bottleneck for complex industrial models. In this paper, we introduce the MONDO Collaboration Framework, which provides rule-based fine-grained model-level secure access control, property-based locking and automated model merge integrated over existing version control systems such as Subversion (SVN) for storage and version control. Our framework simultaneously supports offline collaboration (asynchronous checkout-modify-commit) on top of off-the-shelf modeling tools and online scenarios (GoogleDocs-style short transactions) by offering a web-based modeling frontend. Screencast Demo: https://youtu.be/Ix3CgmsYIU0

Citations: 12

Regression test selection across JVM boundaries

Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering

Pub Date : 2017-08-21 DOI: 10.1145/3106237.3106297

Ahmet Çelik, Marko Vasic, Aleksandar Milicevic, Miloš Gligorić

Modern software development processes recommend that changes be integrated into the main development line of a project multiple times a day. Before a new revision may be integrated, developers practice regression testing to ensure that the latest changes do not break any previously established functionality. The cost of regression testing is high, due to an increase in the number of revisions that are introduced per day, as well as the number of tests developers write per revision. Regression test selection (RTS) optimizes regression testing by skipping tests that are not affected by recent project changes. Existing dynamic RTS techniques support only projects written in a single programming language, which is unfortunate given that the average open-source project is written in several programming languages. We present the first dynamic RTS technique that does not stop at predefined language boundaries. Our technique dynamically detects, at the operating system level, all file artifacts a test depends on. Our technique is, hence, oblivious to the specific means the test uses to actually access the files: be it through spawning a new process, invoking a system call, invoking a library written in a different language, invoking a library that spawns a process which makes a system call, etc. We also provide a set of extension points which allow for a smooth integration with testing frameworks and build systems. We implemented our technique in a tool called RTSLinux as a loadable Linux kernel module and evaluated it on 21 Java projects that escape the JVM by spawning new processes or invoking native code, totaling 2,050,791 lines of code. Our results show that RTSLinux, on average, skips 74.17% of tests and saves 52.83% of test execution time compared to executing all tests.
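The selection step of file-level RTS can be sketched in a few lines of Python. This is an illustration under stated assumptions, not RTSLinux: the per-test dependency lists are written by hand here, whereas the abstract describes collecting them at the operating system level through a kernel module, and the file and test names are hypothetical. A test is re-run only if the checksum of at least one file it depends on has changed since the previous run.

```python
import hashlib
import os
import tempfile

def checksum(path):
    """Return the SHA-256 digest of a file's contents."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

def select_tests(dependencies, previous):
    """Return the tests that have at least one dependency whose checksum changed."""
    selected = []
    for test, files in dependencies.items():
        if any(previous.get(path) != checksum(path) for path in files):
            selected.append(test)
    return selected

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical artifacts: bytecode, a native library, and a config file.
    paths = {}
    for name in ("Parser.class", "libnative.so", "config.json"):
        paths[name] = os.path.join(tmp, name)
        with open(paths[name], "w") as f:
            f.write("v1")

    # Hand-written stand-in for the dependencies recorded during the last run.
    deps = {
        "ParserTest": [paths["Parser.class"], paths["config.json"]],
        "NativeTest": [paths["libnative.so"]],
    }
    recorded = {path: checksum(path) for path in paths.values()}

    # A new revision touches only the native library.
    with open(paths["libnative.so"], "w") as f:
        f.write("v2")

    selected = select_tests(deps, recorded)
    print(selected)
```

Because the comparison works on files rather than on language-level constructs, a change to the native library selects `NativeTest` even though no Java class changed.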

Citations: 34

S3: syntax- and semantic-guided repair synthesis via programming by examples

Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering

Pub Date : 2017-08-21 DOI: 10.1145/3106237.3106309

X. Le, D. Chu, D. Lo, Claire Le Goues, W. Visser

A notable class of techniques for automatic program repair is known as semantics-based. Such techniques, e.g., Angelix, infer semantic specifications via symbolic execution, and then use program synthesis to construct new code that satisfies those inferred specifications. However, the obtained specifications are naturally incomplete, leaving the synthesis engine with the difficult task of synthesizing a general solution from a sparse space of many possible solutions that are consistent with the provided specifications but that do not necessarily generalize. We present S3, a new repair synthesis engine that leverages programming-by-examples methodology to synthesize high-quality bug repairs. The novelty in S3 that allows it to tackle the sparse search space to create more general repairs is three-fold: (1) a systematic way to customize and constrain the syntactic search space via a domain-specific language, (2) an efficient enumeration-based search strategy over the constrained search space, and (3) a number of ranking features based on measures of the syntactic and semantic distances between candidate solutions and the original buggy program. We compare S3’s repair effectiveness with state-of-the-art synthesis engines Angelix, Enumerative, and CVC4. S3 can successfully and correctly fix at least three times more bugs than the best baseline on datasets of 52 bugs in small programs, and 100 bugs in real-world large programs.
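The three ingredients listed in the abstract (a constrained grammar, enumeration, and ranking) can be illustrated with a toy Python sketch. It is not S3: the grammar of comparisons over `x`, `y`, `0`, and `1`, the input-output examples, and the token-level `distance` function are all hypothetical, and only a syntactic distance is used, whereas the abstract mentions both syntactic and semantic measures.

```python
import operator
from itertools import product

# Hypothetical, deliberately tiny search space: <operand> <comparison> <operand>.
OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt,
       ">=": operator.ge, "==": operator.eq, "!=": operator.ne}
OPERANDS = ["x", "y", "0", "1"]

def holds(cond, env):
    """Evaluate a (left, op, right) condition under a variable assignment."""
    left, op, right = cond
    value = lambda tok: env[tok] if tok in env else int(tok)
    return OPS[op](value(left), value(right))

def distance(cond, original):
    """Syntactic distance: the number of token positions that differ."""
    return sum(a != b for a, b in zip(cond, original))

def repair(original, examples):
    """Enumerate all conditions, keep those matching every example, rank by distance."""
    consistent = [
        cond for cond in product(OPERANDS, OPS, OPERANDS)
        if all(holds(cond, env) == expected for env, expected in examples)
    ]
    return sorted(consistent, key=lambda cond: distance(cond, original))

# Buggy condition `x < y`; the examples say the branch must also be taken when x == y.
examples = [
    ({"x": 1, "y": 2}, True),
    ({"x": 2, "y": 2}, True),
    ({"x": 3, "y": 2}, False),
]
ranked = repair(("x", "<", "y"), examples)
best = ranked[0]
print(best)
```

Two candidates are consistent with the examples, `x <= y` and `y >= x`; ranking by distance to the original prefers `x <= y`, which changes a single token.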

Citations: 210

Guided, stochastic model-based GUI testing of Android apps

Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering

Pub Date : 2017-08-21 DOI: 10.1145/3106237.3106298

Ting Su, Guozhu Meng, Yuting Chen, Ke Wu, W. Yang, Yao Yao, G. Pu, Yang Liu, Z. Su

Mobile apps are ubiquitous, operate in complex environments and are developed under time-to-market pressure. Ensuring their correctness and reliability thus becomes an important challenge. This paper introduces Stoat, a novel guided approach to perform stochastic model-based testing on Android apps. Stoat operates in two phases: (1) Given an app as input, it uses dynamic analysis enhanced by a weighted UI exploration strategy and static analysis to reverse engineer a stochastic model of the app's GUI interactions; and (2) it adapts Gibbs sampling to iteratively mutate/refine the stochastic model and guides test generation from the mutated models toward achieving high code and model coverage and exhibiting diverse sequences. During testing, system-level events are randomly injected to further enhance the testing effectiveness. Stoat was evaluated on 93 open-source apps. The results show (1) the models produced by Stoat cover 17 to 31% more code than those by existing modeling tools; (2) Stoat detects 3X more unique crashes than two state-of-the-art testing tools, Monkey and Sapienz. Furthermore, Stoat tested the 1661 most popular Google Play apps, and detected 2110 previously unknown and unique crashes. So far, 43 developers have responded that they are investigating our reports. 20 of the reported crashes have been confirmed, and 8 have already been fixed.
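The second phase can be pictured with a minimal Python sketch of test generation from a stochastic GUI model. All names here are hypothetical: the four-screen `MODEL`, its events, and the helper functions. The search loop is a simple keep-if-not-worse rule used as a stand-in; it is not the Gibbs sampling that the abstract says Stoat adapts, and coverage is measured over model transitions only, not over app code.

```python
import random

# Hypothetical GUI model: screen -> {event: (next screen, probability)}.
MODEL = {
    "Main": {"open_settings": ("Settings", 0.5), "open_list": ("List", 0.5)},
    "Settings": {"back": ("Main", 1.0)},
    "List": {"select_item": ("Detail", 0.7), "back": ("Main", 0.3)},
    "Detail": {"back": ("List", 1.0)},
}

def generate_test(model, rng, length=6, start="Main"):
    """Sample one event sequence by walking the model according to its probabilities."""
    state, events = start, []
    for _ in range(length):
        names = list(model[state])
        weights = [model[state][name][1] for name in names]
        event = rng.choices(names, weights=weights)[0]
        events.append((state, event))
        state = model[state][event][0]
    return events

def mutate(model, rng):
    """Return a copy in which one screen's event probabilities are re-drawn and normalised."""
    new = {state: dict(events) for state, events in model.items()}
    state = rng.choice(sorted(new))
    raw = {event: rng.random() + 0.05 for event in new[state]}
    total = sum(raw.values())
    for event, (target, _) in list(new[state].items()):
        new[state][event] = (target, raw[event] / total)
    return new

def coverage(model, rng, tests=5):
    """Count the distinct (screen, event) transitions exercised by a small test suite."""
    covered = set()
    for _ in range(tests):
        covered.update(generate_test(model, rng))
    return len(covered)

rng = random.Random(1)
best, best_cov = MODEL, coverage(MODEL, rng)
for _ in range(20):
    candidate = mutate(best, rng)
    cov = coverage(candidate, rng)
    if cov >= best_cov:  # stand-in acceptance rule, not Gibbs sampling
        best, best_cov = candidate, cov
```

The point of mutating probabilities rather than the model's structure is that the same set of screens and events can yield very different test suites, so the search can steer generation toward transitions that the initial probabilities rarely reach.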

Citations: 268

Model-level, platform-independent debugging in the context of the model-driven development of real-time systems

Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering

Pub Date : 2017-08-21 DOI: 10.1145/3106237.3106278

M. Bagherzadeh, N. Hili, J. Dingel

Insufficient support for debugging models at the model level is one of the main barriers to broader adoption of Model-Driven Development (MDD). In this paper, we focus on the use of MDD for the development of real-time embedded (RTE) systems. We introduce a new platform-independent approach to implementing model-level debuggers. We describe how to realize support for model-level debugging entirely in terms of the modeling language and show how to implement this support as a model-to-model transformation. Key advantages of the approach over existing work are that (1) it does not require a program debugger for the code generated from the model, and (2) changes to, e.g., the code generator, the target language, or the hardware platform leave the debugger completely unaffected. We also describe an implementation of the approach in the context of Papyrus-RT, an open-source MDD tool based on the modeling language UML-RT. We summarize the results of applying our model-based debugger to several use cases to determine its overhead in terms of size and performance. Despite being a prototype, its performance overhead is on the order of microseconds, while its size overhead is comparable to that of GDB, the GNU Debugger.

Citations: 31

LAMP: data provenance for graph based machine learning algorithms through derivative computation

Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering

Pub Date : 2017-08-21 DOI: 10.1145/3106237.3106291

Shiqing Ma, Yousra Aafer, Zhaogui Xu, Wen-Chuan Lee, Juan Zhai, Yingqi Liu, X. Zhang

Data provenance tracking determines the set of inputs related to a given output. It enables quality control and problem diagnosis in data engineering. Most existing techniques work by tracking program dependencies. They cannot quantitatively assess the importance of related inputs, which is critical for machine learning algorithms, where an output tends to depend on a huge set of inputs, only some of which are important. In this paper, we propose LAMP, a provenance computation system for machine learning algorithms. Inspired by automatic differentiation (AD), LAMP quantifies the importance of an input for an output by computing the partial derivative. LAMP separates the original data processing and the more expensive derivative computation into different processes to achieve cost-effectiveness. In addition, it allows quantifying importance for inputs related to discrete behavior, such as control flow selection. The evaluation on a set of real-world programs and data sets shows that LAMP produces more precise and succinct provenance than program-dependence-based techniques, with much less overhead. Our case studies demonstrate the potential of LAMP in problem diagnosis in data engineering.
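
The idea of ranking inputs by the partial derivative of an output can be illustrated with a minimal sketch. This is not LAMP's implementation: it uses a tiny forward-mode dual-number class, and the names `Dual`, `importance`, and `toy_score` are hypothetical, chosen only for illustration. It assumes the output is built from additions and multiplications alone.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """A value paired with its derivative with respect to one chosen input."""
    val: float
    der: float = 0.0

    def __add__(self, other):
        other = _lift(other)
        return Dual(self.val + other.val, self.der + other.der)

    __radd__ = __add__

    def __mul__(self, other):
        other = _lift(other)
        # Product rule: (u * v)' = u' * v + u * v'
        return Dual(self.val * other.val,
                    self.der * other.val + self.val * other.der)

    __rmul__ = __mul__


def _lift(x):
    """Wrap a plain number as a constant (derivative 0)."""
    return x if isinstance(x, Dual) else Dual(float(x))


def importance(f, inputs):
    """Return |df/dx_i| for every input, using one forward pass per input."""
    scores = []
    for i in range(len(inputs)):
        # Seed derivative 1.0 on input i and 0.0 on all other inputs.
        seeded = [Dual(x, 1.0 if j == i else 0.0)
                  for j, x in enumerate(inputs)]
        scores.append(abs(f(seeded).der))
    return scores


def toy_score(x):
    # Hypothetical output: depends strongly on x[0], weakly on x[1],
    # and not at all on x[2].
    return 3.0 * x[0] * x[0] + 0.1 * x[1] + 0.0 * x[2]


print(importance(toy_score, [2.0, 5.0, 7.0]))
```

A dependency tracker would report all three inputs as related to the output, whereas the derivative magnitudes separate the dominant input from the negligible ones. That is the kind of quantitative distinction the abstract describes.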

Citations: 21