
Latest Publications: 2017 IEEE International Conference on Software Testing, Verification and Validation (ICST)

Automata Language Equivalence vs. Simulations for Model-Based Mutant Equivalence: An Empirical Evaluation
Xavier Devroey, Gilles Perrouin, Mike Papadakis, Axel Legay, Pierre-Yves Schobbens, P. Heymans
Mutation analysis is a popular test assessment method. It relies on the mutation score, which indicates how many mutants are revealed by a test suite. Yet, there are mutants whose behaviour is equivalent to the original system, wasting analysis resources and preventing the satisfaction of the full (100%) mutation score. For finite behavioural models, the Equivalent Mutant Problem (EMP) can be addressed through language equivalence of non-deterministic finite automata, which is a well-studied, yet computationally expensive, problem in automata theory. In this paper, we report on our preliminary assessment of a state-of-the-art exact language equivalence tool to handle the EMP against 3 models of size up to 15,000 states on 1170 mutants. We introduce random and mutation-biased simulation heuristics as baselines for comparison. Results show that the exact approach is often more than ten times faster in the weak mutation scenario. For strong mutation, our biased simulations are faster for models larger than 300 states. They can be up to 1,000 times faster while limiting the error of misclassifying non-equivalent mutants as equivalent to 10% on average. We therefore conclude that the approaches can be combined for improved efficiency.
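The abstract does not detail the simulation heuristics, but the random baseline boils down to replaying bounded random walks of the mutant model on the original one and reporting a difference as soon as a trace cannot be replayed. A minimal Python sketch of that idea, with invented model structures and function names (not the authors' implementation):

```python
# Hypothetical sketch of a random-simulation baseline for model-level mutant
# equivalence: run bounded random walks on the mutant and check that every
# observed trace can also be replayed on the original model. If a trace cannot
# be replayed, the mutant is provably non-equivalent; if no such trace is found
# within the budget, the mutant is classified as equivalent (possibly wrongly,
# hence the misclassification error reported in the paper).
import random

def random_walk(model, start, max_len, rng):
    """One action trace from a random walk; `model` maps a state to a list of
    (action, next_state) pairs."""
    trace, state = [], start
    for _ in range(max_len):
        moves = model.get(state, [])
        if not moves:
            break
        action, state = rng.choice(moves)
        trace.append(action)
    return trace

def replayable(model, start, trace):
    """Can `trace` be replayed on a (possibly non-deterministic) model?"""
    frontier = {start}
    for action in trace:
        frontier = {nxt for s in frontier
                    for (a, nxt) in model.get(s, []) if a == action}
        if not frontier:
            return False
    return True

def simulate_equivalence(original, mutant, start, runs=1000, max_len=50, seed=0):
    """Heuristic verdict: True = no difference observed, False = killed."""
    rng = random.Random(seed)
    for _ in range(runs):
        if not replayable(original, start, random_walk(mutant, start, max_len, rng)):
            return False  # distinguishing trace found: mutant is non-equivalent
    return True

# Tiny illustrative models (state -> [(action, next_state), ...]).
original = {"s0": [("a", "s1")], "s1": [("b", "s0")]}
mutant   = {"s0": [("a", "s1")], "s1": [("c", "s0")]}  # 'b' mutated to 'c'
print(simulate_equivalence(original, mutant, "s0"))  # False: non-equivalent
```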
Citations: 7
Recovering Semantic Traceability Links between APIs and Security Vulnerabilities: An Ontological Modeling Approach
Sultan S. Al-Qahtani, Ellis E. Eghan, J. Rilling
Over the last decade, a globalization of the software industry took place, which facilitated the sharing and reuse of code across existing project boundaries. At the same time, such global reuse also introduces new challenges to the software engineering community, with not only components but also their problems and vulnerabilities being now shared. For example, vulnerabilities found in APIs no longer affect only individual projects but instead might spread across projects and even global software ecosystem borders. Tracing these vulnerabilities at a global scale becomes an inherently difficult task since many of the existing resources required for such analysis still rely on proprietary knowledge representation. In this research, we introduce an ontology-based knowledge modeling approach that can eliminate such information silos. More specifically, we focus on linking security knowledge with other software knowledge to improve traceability and trust in software products (APIs). Our approach takes advantage of the Semantic Web and its reasoning services, to trace and assess the impact of security vulnerabilities across project boundaries. We present a case study, to illustrate the applicability and flexibility of our ontological modeling approach by tracing vulnerabilities across project and resource boundaries.
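As a rough illustration of the cross-boundary tracing the ontology enables, the sketch below runs a transitive query over a tiny in-memory set of facts; all entity names, predicates, and the CVE identifier are invented, and a real implementation would use RDF/OWL and Semantic Web reasoning as the paper describes:

```python
# Hypothetical triple store plus a transitive query that finds whether a
# project is (indirectly) exposed to a vulnerability through its dependencies.
triples = {
    ("ProjectA", "dependsOn", "LibX"),
    ("LibX", "dependsOn", "LibY"),
    ("LibY", "hasVulnerability", "CVE-0000-0001"),  # made-up CVE id
}

def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is a known fact."""
    return {o for (s, p, o) in triples if s == subject and p == predicate}

def exposed_to(project, cve, seen=None):
    """True if `project` reaches `cve` through any chain of dependsOn edges."""
    seen = seen or set()
    if project in seen:
        return False
    seen.add(project)
    if cve in objects(project, "hasVulnerability"):
        return True
    return any(exposed_to(dep, cve, seen) for dep in objects(project, "dependsOn"))

print(exposed_to("ProjectA", "CVE-0000-0001"))  # True, via LibX -> LibY
```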
Citations: 17
O!Snap: Cost-Efficient Testing in the Cloud
Alessio Gambi, Alessandra Gorla, A. Zeller
Porting a testing environment to a cloud infrastructure is not straightforward. This paper presents O!Snap, an approach to generate test plans to cost-efficiently execute tests in the cloud. O!Snap automatically maximizes reuse of existing virtual machines, and interleaves the creation of updated test images with the execution of tests to minimize overall test execution time and/or cost. In an evaluation involving 2,600+ packages and 24,900+ test jobs of the Debian continuous integration environment, O!Snap reduces test setup time by up to 88% and test execution time by up to 43.3% without additional costs.
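The planner itself is not spelled out in the abstract; the sketch below only illustrates why image reuse pays off, using a naive greedy grouping of test jobs by required machine image (all names, jobs, and costs are hypothetical, not O!Snap's actual algorithm):

```python
# Hypothetical greedy plan: run groups of jobs whose virtual-machine image
# already exists first, and build each missing image only once before running
# the jobs that need it.
from collections import defaultdict

def plan(jobs, existing_images, build_cost, run_cost):
    """jobs: list of (job_id, required_image). Returns (ordered plan, total cost)."""
    by_image = defaultdict(list)
    for job_id, image in jobs:
        by_image[image].append(job_id)

    # Images we can reuse come first; missing images are built once each.
    ordered = sorted(by_image, key=lambda img: img not in existing_images)
    steps, cost = [], 0.0
    for image in ordered:
        if image not in existing_images:
            steps.append(("build", image))
            cost += build_cost
        for job_id in by_image[image]:
            steps.append(("run", job_id, image))
            cost += run_cost
    return steps, cost

jobs = [("t1", "debian-base"), ("t2", "debian-gcc"), ("t3", "debian-base")]
print(plan(jobs, existing_images={"debian-base"}, build_cost=120.0, run_cost=10.0))
```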
Citations: 14
Perphecy: Performance Regression Test Selection Made Simple but Effective
Augusto Born de Oliveira, S. Fischmeister, Amer Diwan, Matthias Hauswirth, P. Sweeney
Developers of performance sensitive production software are in a dilemma: performance regression tests are too costly to run at each commit, but skipping the tests delays and complicates performance regression detection. Ideally, developers would have a system that predicts whether a given commit is likely to impact performance and suggests which tests to run to detect a potential performance regression. Prior approaches towards this problem require static or dynamic analyses that limit their generality and applicability. This paper presents an approach that is simple and general, and that works surprisingly well for real applications.
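As a hedged illustration of the "simple and general" idea, the sketch below selects benchmarks whose recorded execution profile overlaps the functions touched by a commit; Perphecy's actual indicators differ, and the profile data here is invented:

```python
# Hypothetical selector: a commit is considered performance-relevant for a
# benchmark if any function it changes appears in that benchmark's previously
# recorded execution profile.
def select_benchmarks(changed_functions, profiles):
    """profiles: benchmark name -> set of functions it executed last time."""
    changed = set(changed_functions)
    return [bench for bench, executed in profiles.items() if changed & executed]

profiles = {
    "bench_parse":  {"parse", "tokenize"},
    "bench_render": {"render", "layout"},
}
print(select_benchmarks(["tokenize", "log"], profiles))  # ['bench_parse']
```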
Citations: 32
FIFA: A Kernel-Level Fault Injection Framework for ARM-Based Embedded Linux System
Eunji Jeong, Namgoo Lee, Jinhan Kim, Duseok Kang, S. Ha
Emulating fault scenarios by injecting faults intentionally is commonly used to test and verify the robustness of a system. As the number of hardware devices integrated into an embedded system tends to increase consistently and the chance of hardware failure is expected to increase in an SoC, it becomes important to emulate fault scenarios caused by hardware-related errors. To this end, we present a kernel-level fault injection framework for ARM-based embedded Linux systems, called FIFA, aiming to investigate the effect of an individual hardware error in a real hardware platform rather than performing statistical analysis by random experiments. FIFA consists of two complementary fault injection techniques, one is based on the Kernel GNU Debugger and the other on hardware breakpoints. Compared with the previous work that emulates bit-flip errors only, FIFA supports other types of errors such as time delay and device failure. The viability of the proposed framework is proved by real-life experiments with an ODROID-XU4 system.
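For intuition only, the sketch below shows what the three fault types mentioned above do to a value or a call in plain user-space Python; FIFA itself injects such faults at the kernel level through KGDB or hardware breakpoints, which this sketch does not attempt to reproduce:

```python
# Hypothetical user-space illustration of the fault models: bit flip in a
# register/memory word, time delay, and device failure as a driver would
# observe it.
import random
import time

def bit_flip(value, bit=None, width=32):
    """Flip one (random by default) bit of an integer word."""
    bit = random.randrange(width) if bit is None else bit
    return value ^ (1 << bit)

def time_delay(seconds):
    """Emulate a stalled device or handler by sleeping before returning."""
    time.sleep(seconds)

def device_failure():
    """Emulate a failed device access."""
    raise IOError("emulated device failure")

print(hex(bit_flip(0xFF, bit=8)))  # 0x1ff: bit 8 flipped
```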
Citations: 8
Using Semantic Similarity in Crawling-Based Web Application Testing
Jun-Wei Lin, Farn Wang, Paul Chu
To automatically test web applications, crawling-based techniques are usually adopted to mine the behavior models, explore the state spaces or detect the violated invariants of the applications. However, their broad use is limited by the required manual configurations for input value selection, GUI state comparison and clickable detection. In existing crawlers, the configurations are usually string-matching based rules looking for tags or attributes of DOM elements, and often application-specific. Moreover, in input topic identification, it can be difficult to determine which rule suggests a better match when several rules match an input field to more than one topic. This paper presents a natural-language approach based on semantic similarity to address the above issues. The proposed approach represents DOM elements as vectors in a vector space formed by the words used in the elements. The topics of encountered input fields during crawling can then be inferred by their similarities with ones in a labeled corpus. Semantic similarity can also be applied to suggest if a GUI state is newly discovered and a DOM element is clickable under an unsupervised learning paradigm. We evaluated the proposed approach in input topic identification with 100 real-world forms and GUI state comparison with real data from industry. Our evaluation shows that the proposed approach has comparable or better performance to the conventional techniques. Experiments in input topic identification also show that the accuracy of the rule-based approach can be improved by up to 22% when integrated with our approach.
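The core matching step can be pictured as nearest-neighbour classification of an input field's words against a labeled corpus. The sketch below uses plain bag-of-words cosine similarity as a stand-in for the paper's semantic similarity measure; the corpus and topic labels are invented:

```python
# Hypothetical topic inference: vectorize the words found in a DOM input
# element and assign the topic of the most similar labeled example.
import math
from collections import Counter

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def infer_topic(element_words, corpus):
    """corpus: list of (topic, words). Returns the best-matching topic."""
    vec = Counter(element_words)
    scored = [(cosine(vec, Counter(words)), topic) for topic, words in corpus]
    return max(scored)[1]

corpus = [
    ("email",    ["email", "mail", "address"]),
    ("password", ["password", "pwd", "secret"]),
]
print(infer_topic(["user", "email"], corpus))  # 'email'
```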
Citations: 12
Behavioral Execution Comparison: Are Tests Representative of Field Behavior?
Qianqian Wang, Yuriy Brun, A. Orso
Software testing is the most widely used approach for assessing and improving software quality, but it is inherently incomplete and may not be representative of how the software is used in the field. This paper addresses the questions of to what extent tests represent how real users use software, and how to measure behavioral differences between test and field executions. We study four real-world systems, one used by endusers and three used by other (client) software, and compare test suites written by the systems' developers to field executions using four models of behavior: statement coverage, method coverage, mutation score, and a temporal-invariant-based model we developed. We find that developer-written test suites fail to accurately represent field executions: the tests, on average, miss 6.2% of the statements and 7.7% of the methods exercised in the field, the behavior exercised only in the field kills an extra 8.6% of the mutants, finally, the tests miss 52.6% of the behavioral invariants that occur in the field. In addition, augmenting the in-house test suites with automatically-generated tests by a tool targeting high code coverage only marginally improves the tests' behavioral representativeness. These differences between field and test executions—and in particular the finer-grained and more sophisticated ones that we measured using our invariantbased model—can provide insight for developers and suggest a better method for measuring test suite quality.
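The coverage-based part of such a comparison reduces to set arithmetic over the items exercised by tests versus in the field, as in this small sketch (the coverage sets are placeholders, not the paper's data):

```python
# Hypothetical measure: what fraction of field-exercised statements or methods
# the developer-written tests never exercise (the paper reports 6.2% of
# statements and 7.7% of methods on average).
def missed_fraction(test_covered, field_covered):
    """Fraction of field-exercised items never exercised by the tests."""
    if not field_covered:
        return 0.0
    return len(field_covered - test_covered) / len(field_covered)

test_covered  = {"m1", "m2", "m3"}
field_covered = {"m1", "m2", "m4", "m5"}
print(f"{missed_fraction(test_covered, field_covered):.0%}")  # 50%
```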
Citations: 23
ADRENALIN-RV: Android Runtime Verification Using Load-Time Weaving
Haiyang Sun, Andrea Rosà, Omar Javed, Walter Binder
Android has become one of the most popular operating systems for mobile devices. As the number of applications for the Android ecosystem grows, so does their complexity, increasing the need for runtime verification on the Android platform. Unfortunately, despite the presence of several runtime verification frameworks for Java bytecode, DEX bytecode used in Android does not benefit from such wide support. While a few runtime verification tools support applications developed for Android, such tools offer only limited bytecode coverage and may not be able to detect property violations in certain classes. In this paper, we show that ADRENALIN-RV, our new runtime verification tool for Android, overcomes this limitation. In contrast to other frameworks, ADRENALIN-RV weaves monitoring code at load time and is able to instrument all loaded classes. In addition to the default classes inside the application package (APK), ADRENALIN-RV covers both the Android class library and libraries dynamically loaded from the storage, network, or generated dynamically, which related tools cannot verify. Evaluation results demonstrate the increased code coverage of ADRENALIN-RV with respect to other runtime validation tools for Android. Thanks to ADRENALIN-RV, we were able to detect violations that cannot be detected by other tools.
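As a loose, non-Android analogue of the weaving idea, the Python sketch below wraps every public method of a class with monitoring code that feeds a tiny property checker ("no write after close"); ADRENALIN-RV performs the corresponding instrumentation on DEX bytecode as classes are loaded, which this sketch does not reproduce:

```python
# Hypothetical weaving analogue: rewrite a class's public methods so that each
# call is reported to a monitor before the original method runs.
import functools

def weave(cls, monitor):
    """Replace each public method with a wrapper that reports the call."""
    for name, attr in list(vars(cls).items()):
        if callable(attr) and not name.startswith("_"):
            @functools.wraps(attr)
            def wrapper(self, *args, __orig=attr, __name=name, **kwargs):
                monitor(__name)
                return __orig(self, *args, **kwargs)
            setattr(cls, name, wrapper)
    return cls

violations = []
closed = False
def monitor(event):
    global closed
    if event == "close":
        closed = True
    elif event == "write" and closed:
        violations.append("write after close")

class File:
    def write(self, data): pass
    def close(self): pass

weave(File, monitor)
f = File()
f.close()
f.write("x")
print(violations)  # ['write after close']
```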
Citations: 7
A Comparative Study of Manual and Automated Testing for Industrial Control Software
Eduard Paul Enoiu, Daniel Sundmark, Adnan Causevic, P. Pettersson
Automated test generation has been suggested as a way of creating tests at a lower cost. Nonetheless, it is not very well studied how such tests compare to manually written ones in terms of cost and effectiveness. This is particularly true for industrial control software, where strict requirements on both specification-based testing and code coverage typically are met with rigorous manual testing. To address this issue, we conducted a case study in which we compared manually and automatically created tests. We used recently developed real-world industrial programs written in the IEC 61131-3, a popular programming language for developing industrial control systems using programmable logic controllers. The results show that automatically generated tests achieve similar code coverage as manually created tests, but in a fraction of the time (an average improvement of roughly 90%). We also found that the use of an automated test generation tool does not result in better fault detection in terms of mutation score compared to manual testing. Specifically, manual tests more effectively detect logical, timer and negation type of faults, compared to automatically generated tests. The results underscore the need to further study how manual testing is performed in industrial practice and the extent to which automated test generation can be used in the development of reliable systems.
Citations: 22
Towards a Testbed for Automotive Cybersecurity
D. S. Fowler, Madeline Cheah, S. Shaikh, J. Bryans
Modern automotive platforms are cyber-physical in nature and increasingly connected. Cybersecurity testing of such platforms is expensive and carries safety concerns, making it challenging to perform tests for vulnerabilities and refine test methodologies. We propose a testbed, built over a Controller Area Network (CAN) simulator, and validate it against a real-world demonstration of a weakness in a test vehicle using aftermarket On Board Diagnostic (OBD) scanners (dongles).
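For a concrete flavour of the traffic such a testbed exercises, the sketch below sends an OBD-II engine-RPM request (mode 0x01, PID 0x0C) on a virtual CAN bus using the python-can library; the channel name and virtual-bus setup are assumptions for illustration, not the paper's configuration:

```python
# Hypothetical OBD-II exchange over a virtual CAN interface (Linux vcan0),
# the kind of request an aftermarket dongle or a CAN simulator would handle.
import can  # pip install python-can

bus = can.interface.Bus(channel="vcan0", bustype="socketcan")

# 0x7DF is the OBD-II broadcast id; payload: 2 bytes follow, mode 01, PID 0x0C.
request = can.Message(arbitration_id=0x7DF,
                      data=[0x02, 0x01, 0x0C, 0x00, 0x00, 0x00, 0x00, 0x00],
                      is_extended_id=False)
bus.send(request)

reply = bus.recv(timeout=1.0)  # an ECU (or the simulator) answers on 0x7E8
if reply is not None and reply.arbitration_id == 0x7E8:
    a, b = reply.data[3], reply.data[4]
    print("engine RPM:", ((a << 8) + b) / 4)

bus.shutdown()
```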
Citations: 38