Latest publications: 2019 IEEE 30th International Symposium on Software Reliability Engineering (ISSRE)
Charting a Course Through Uncertain Environments: SEA Uses Past Problems to Avoid Future Failures
P. Moore, Justin Cappos, P. Frankl, Thomas Wies
A common problem for developers is applications exhibiting new bugs after deployment. Many of these bugs can be traced to unexpected network, operating system, and file system differences that cause program executions that were successful in a development environment to fail once deployed. Preventing these bugs is difficult because it is impractical to test an application in every environment. Enter Simulating Environmental Anomalies (SEA), a technique that utilizes evidence of one application's failure in a given environment to generate tests that can be applied to other applications, to see whether they suffer from analogous faults. In SEA, models of unusual properties extracted from interactions between an application, A, and its environment guide simulations of another application, B, running in the anomalous environment. This reveals faults B may experience in this environment without the expense of deployment. By accumulating these anomalies, applications can be tested against an increasing set of problematic conditions. We implemented a tool called CrashSimulator, which uses SEA, and evaluated it against Linux applications selected from coreutils and the Debian popularity contest. Our tests found a total of 63 bugs in 31 applications with effects including hangs, crashes, data loss, and remote denial of service conditions.
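The core idea of SEA can be illustrated with a minimal, self-contained sketch (the class and function names below are hypothetical, not part of CrashSimulator): a model of an anomalous environment, here a stream that returns short reads, drives an application's I/O code to reveal a latent fault without deploying it.

```python
import io

class ShortReadStream:
    """Model of an anomalous environment: read() returns at most 3
    bytes per call, mimicking a slow network or unusual filesystem."""
    def __init__(self, data):
        self._buf = io.BytesIO(data)

    def read(self, size):
        return self._buf.read(min(size, 3))

def fragile_read(stream, size):
    # Pattern that works in most development environments: assumes a
    # single read() call returns all `size` bytes.
    return stream.read(size)

def robust_read(stream, size):
    # Loops until `size` bytes arrive or the stream ends.
    chunks, remaining = [], size
    while remaining > 0:
        chunk = stream.read(remaining)
        if not chunk:
            break
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)

data = b"hello world"
# Simulating the anomaly exposes the latent fault in fragile_read
# without ever deploying into the problematic environment.
assert fragile_read(ShortReadStream(data), len(data)) == b"hel"
assert robust_read(ShortReadStream(data), len(data)) == data
```

Accumulating such anomaly models, as SEA proposes, lets each newly observed environmental quirk be replayed against other applications.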
DOI: 10.1109/ISSRE.2019.00011 (2019-10-01)
Cited by: 1
Understanding and Improving Regression Test Selection in Continuous Integration
A. Shi, Peiyuan Zhao, D. Marinov
Developers rely on regression testing in their continuous integration (CI) environment to find changes that introduce regression faults. While regression testing is widely practiced, it can be costly. Regression test selection (RTS) reduces the cost of regression testing by not running the tests that are unaffected by the changes. Industry has adopted module-level RTS for its CI environments, while researchers have proposed class-level RTS. In this paper, we compare module- and class-level RTS techniques in a cloud-based CI environment, Travis. We also develop and evaluate a hybrid RTS technique that combines aspects of the module- and class-level RTS techniques. We evaluate all the techniques on real Travis builds. We find that the RTS techniques do save testing time compared to running all tests (RetestAll), but the percentage of time for a full build using RTS (76.0%) is not as low as found in previous work, due to the extra overhead in a cloud-based CI environment. Moreover, we inspect test failures from RetestAll builds, and although we find that RTS techniques can fail to select failing tests, these missed failures are almost all flaky test failures. As such, RTS techniques provide additional value in helping developers avoid wasting time debugging failures not related to the recent code changes. Overall, our results show that RTS can be beneficial for developers in the CI environment: RTS not only saves time but also keeps developers from being misled by flaky test failures.
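Class-level RTS as described above can be sketched in a few lines, assuming a precomputed test-to-dependency map (the test and class names below are invented for illustration):

```python
# Map from each test to the classes it depends on, e.g. as recorded
# from a prior run's coverage data (hypothetical names).
deps = {
    "test_login":  {"auth.Session", "db.Conn"},
    "test_report": {"report.Builder"},
    "test_cache":  {"cache.LRU", "db.Conn"},
}

def select_tests(changed, deps):
    """Select only tests whose recorded dependencies intersect the
    set of changed classes; the rest are assumed unaffected under
    this dependency model and are skipped."""
    return sorted(t for t, ds in deps.items() if ds & changed)

# A change to db.Conn triggers exactly the two tests that depend on it.
assert select_tests({"db.Conn"}, deps) == ["test_cache", "test_login"]
```

Module-level RTS works the same way but at a coarser granularity, trading selection precision for cheaper dependency tracking, which is part of the trade-off the paper measures.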
DOI: 10.1109/ISSRE.2019.00031 (2019-10-01)
Cited by: 23
ISSRE 2019 External Reviewers
DOI: 10.1109/issre.2019.00009 (2019-10-01)
Cited by: 0
Trustworthiness Assessment of Web Applications: Approach and Experimental Study using Input Validation Coding Practices
C. Lemes, Vincent Naessens, M. Vieira
The popularity of web applications and their worldwide use to support business-critical operations have raised hackers' interest in exploiting security vulnerabilities to perform malicious operations. Fostering trust calls for assessment techniques that provide indicators of the quality of a web application from a security perspective. This paper studies the problem of using coding practices to characterize the trustworthiness of web applications from a security perspective. The hypothesis is that applying feasible security practices results in applications with fewer unknown vulnerabilities, which can therefore be considered more trustworthy. The proposed approach is instantiated for the concrete case of input validation practices, and includes a Quality Model to compute trustworthiness scores that can be used to compare different applications or different code elements within the same application. Experimental results show that higher scores are obtained for more secure code, suggesting that the approach can be used in practice to characterize trustworthiness, while also providing guidance to compare and/or improve the security of web applications.
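As a rough illustration of how a quality model might turn coding practices into a comparable score, consider the sketch below; the practice names and weights are invented for illustration and are not the paper's actual Quality Model.

```python
# Invented weights for input-validation practices; a real quality
# model would calibrate these empirically.
WEIGHTS = {"length_check": 0.3, "type_check": 0.3, "whitelist": 0.4}

def trust_score(practices_applied):
    """Weighted sum of the security practices applied to a code
    element; higher scores suggest greater trustworthiness."""
    return sum(w for p, w in WEIGHTS.items() if p in practices_applied)

# Compare two hypothetical request handlers by the practices they apply.
handler_a = {"length_check", "whitelist"}
handler_b = {"type_check"}
assert trust_score(handler_a) > trust_score(handler_b)
```

Scores of this form can rank code elements within one application or compare applications against each other, which is the use case the abstract describes.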
DOI: 10.1109/ISSRE.2019.00050 (2019-10-01)
Cited by: 2
An Empirical Study of Common Challenges in Developing Deep Learning Applications
Tianyi Zhang, Cuiyun Gao, Lei Ma, Michael R. Lyu, Miryung Kim
Recent advances in deep learning promote the innovation of many intelligent systems and applications such as autonomous driving and image recognition. Despite enormous efforts and investments in this field, a fundamental question remains under-investigated—what challenges do developers commonly face when building deep learning applications? To seek an answer, this paper presents a large-scale empirical study of deep learning questions in a popular Q&A website, Stack Overflow. We manually inspect a sample of 715 questions and identify seven kinds of frequently asked questions. We further build a classification model to quantify the distribution of different kinds of deep learning questions in the entire set of 39,628 deep learning questions. We find that program crashes, model migration, and implementation questions are the top three most frequently asked questions. After carefully examining accepted answers of these questions, we summarize five main root causes that may deserve attention from the research community, including API misuse, incorrect hyperparameter selection, GPU computation, static graph computation, and limited debugging and profiling support. Our results highlight the need for new techniques such as cross-framework differential testing to improve software development productivity and software reliability in deep learning.
DOI: 10.1109/ISSRE.2019.00020 (2019-10-01)
Cited by: 111
Symbolic Execution for Importance Analysis and Adversarial Generation in Neural Networks
D. Gopinath, Mengshi Zhang, Kaiyuan Wang, Ismet Burak Kadron, C. Pasareanu, S. Khurshid
Deep Neural Networks (DNN) are increasingly used in a variety of applications, many of them with serious safety and security concerns. This paper describes DeepCheck, a new approach for validating DNNs based on core ideas from program analysis, specifically from symbolic execution. DeepCheck implements novel techniques for lightweight symbolic analysis of DNNs and applies them to address two challenging problems in DNN analysis: 1) identification of important input features and 2) leveraging those features to create adversarial inputs. Experimental results with an MNIST image classification network and a sentiment network for textual data show that DeepCheck promises to be a valuable tool for DNN analysis.
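One observation that makes such lightweight symbolic analysis tractable is that, once a concrete input fixes a ReLU network's activation pattern, the network behaves linearly in the input, so effective per-feature coefficients can rank input importance. The toy sketch below (random weights, not DeepCheck itself) demonstrates that linearization.

```python
import numpy as np

# A tiny 3-input, 4-hidden-unit, 1-output ReLU network with random weights.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(1, 4)), rng.normal(size=1)

x = rng.normal(size=3)
pre = W1 @ x + b1
mask = (pre > 0).astype(float)   # activation pattern induced by x

# With the ReLU pattern fixed, the network is linear in the input:
# ReLU(pre) == mask * pre, so the effective weights are diag(mask) @ W1.
W_eff = W2 @ (W1 * mask[:, None])
b_eff = W2 @ (b1 * mask) + b2

# Sanity check: the linearization reproduces the network's output at x.
out = W2 @ np.maximum(pre, 0.0) + b2
assert np.allclose(W_eff @ x + b_eff, out)

# Rank input features by the magnitude of their effective coefficient.
importance = np.abs(W_eff).ravel()
ranking = np.argsort(-importance)
```

On an image classifier, the same idea applied per pixel yields an importance map, and the linear constraints defining the activation region can then be perturbed to search for adversarial inputs.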
DOI: 10.1109/ISSRE.2019.00039 (2019-10-01)
Cited by: 17
A Tale of Two Injectors: End-to-End Comparison of IR-Level and Assembly-Level Fault Injection
Lucas Palazzi, Guanpeng Li, Bo Fang, K. Pattabiraman
Fault injection (FI) is a commonly used experimental technique to evaluate the resilience of software techniques for tolerating hardware faults. Software-implemented FI can be performed at different levels of abstraction in the system stack; FI performed at the compiler's intermediate representation (IR) level has the advantage that it is closer to the program being evaluated and is hence easier to derive insights from for the design of software fault-tolerance mechanisms. Unfortunately, it is not clear how accurate IR-level FI is vis-a-vis FI performed at the assembly code level, and prior work has presented contradictory findings. In this paper, we perform an analysis of said prior work, find an inconsistency in the FI methodology used in one study, and show that it results in a flawed comparison between IR-level and assembly-level FI. We further confirm this finding by performing a comprehensive evaluation of the accuracy of IR-level FI across a range of benchmark programs and compiler optimization levels. Our results show that IR-level FI is as accurate as assembly-level FI for silent data corruptions (SDCs) across different benchmarks and optimization levels.
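The outcome categories such FI campaigns compare, masked faults versus silent data corruptions (SDCs), can be illustrated abstractly. This toy injector flips one random bit per run in the input of an 8-bit checksum, so flips in bits 8 and above are arithmetically masked while low-order flips corrupt the output; it is an illustration of outcome classification, not of IR- or assembly-level injection.

```python
import random

def flip_bit(value, bit, width=32):
    """Inject a single-bit fault into a `width`-bit integer."""
    return (value ^ (1 << bit)) & ((1 << width) - 1)

def checksum8(data):
    # Target computation: a toy 8-bit additive checksum.
    return sum(data) & 0xFF

data = list(range(100))
golden = checksum8(data)

random.seed(42)
trials, sdc = 1000, 0
for _ in range(trials):
    corrupted = list(data)
    idx = random.randrange(len(corrupted))
    corrupted[idx] = flip_bit(corrupted[idx], random.randrange(32))
    if checksum8(corrupted) != golden:
        sdc += 1  # wrong output with no crash: silent data corruption

# Flips in bits 0-7 always change the 8-bit sum (SDC); flips in bits
# 8-31 are masked, so roughly a quarter of injections corrupt the output.
assert 0 < sdc < trials
```

A real campaign at the IR or assembly level injects into registers, memory, or instruction operands and additionally classifies crashes and hangs, but the masked/SDC distinction measured here is the same one the paper's accuracy comparison rests on.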
DOI: 10.1109/ISSRE.2019.00024 (2019-10-01)
Cited by: 12
FluxRank: A Widely-Deployable Framework to Automatically Localizing Root Cause Machines for Software Service Failure Mitigation
Ping Liu, Yu Chen, Xiaohui Nie, Jing Zhu, Shenglin Zhang, Kaixin Sui, Ming Zhang, Dan Pei
Failures of a software service directly affect user experience and service revenue. Thus, operators monitor both service-level KPIs (e.g., response time) and machine-level KPIs (e.g., CPU usage) on each machine underlying the service. When a service fails, the operators must localize the root cause machines and mitigate the failure as quickly as possible. Existing approaches have limited applicability because the additional measurement data they require is difficult to obtain. As a result, failure localization is largely manual and very time-consuming. This paper presents FluxRank, a widely-deployable framework that can automatically and accurately localize the root cause machines, so that actions can be triggered to mitigate the service failure. Our evaluation on historical cases from five real services (with tens of thousands of machines) of a top search company shows that the root cause machines are ranked top 1 (top 3) for 55 (66) out of 70 cases. Compared to existing approaches, FluxRank cuts the localization time by more than 80% on average. FluxRank has been deployed online at one Internet service and six banking services for three months, and correctly ranked the root cause machines top 1 in 55 out of 59 cases.
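The ranking intuition, scoring each machine by how sharply its KPIs shift around failure onset, can be sketched as follows; the machine names and KPI values are made up, and FluxRank's actual change quantification and ranking are more sophisticated.

```python
import statistics

def change_score(series, t):
    """Score how much a KPI series shifted at failure onset t: the
    absolute difference of means before vs. after, in units of the
    pre-failure standard deviation (with a floor to avoid div-by-zero)."""
    before, after = series[:t], series[t:]
    spread = statistics.pstdev(before) or 1.0
    return abs(statistics.mean(after) - statistics.mean(before)) / spread

# Hypothetical CPU-usage series per machine; the failure begins at index 5.
kpis = {
    "web-1": [20, 21, 19, 20, 22, 21, 20, 19, 21, 20],
    "db-1":  [30, 31, 29, 30, 30, 95, 97, 96, 98, 99],  # abrupt shift
    "web-2": [15, 16, 14, 15, 15, 16, 15, 14, 16, 15],
}
ranked = sorted(kpis, key=lambda m: change_score(kpis[m], 5), reverse=True)
assert ranked[0] == "db-1"  # the machine with the sharpest KPI change
```

In practice each machine has many KPIs, so per-KPI change scores must be aggregated per machine before ranking, which is part of what the framework automates.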
DOI: 10.1109/ISSRE.2019.00014 (2019-10-01)
Cited by: 14
ISSRE 2019 Program Committee
DOI: 10.1109/issre.2019.00008 (2019-10-01)
Cited by: 0
Integrating Safety Certification Into Model-Based Testing of Safety-Critical Systems
Aiman Gannous, A. Andrews
Testing plays an important role in assuring the safety of safety-critical systems (SCS). Testing SCSs should include tasks that test how the system operates in the presence of failures. With the increase of autonomous, sensing-based functionality in safety-critical systems, efficient and cost-effective testing that maximizes safety evidence has become increasingly challenging. A previously proposed framework for testing safety-critical systems, Model-Combinatorial based testing (MCbt), has the potential to address these challenges. MCbt proposes an integration of model-based testing, fault analysis, and combinatorial testing to produce the maximum amount of evidence for an efficient safety certification process, but it had never actually been used to derive a specific testing approach. In this paper, we present a concrete application of MCbt and apply it to a case study. The validation showed that MCbt is more efficient and produces more safety evidence than state-of-the-art testing approaches.
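The combinatorial ingredient of such an approach can be sketched as an enumeration over fault and mode factors; the factor names below are hypothetical, and a full-factorial enumeration is shown where real combinatorial testing tools would select a smaller pairwise-covering subset.

```python
from itertools import product

# Hypothetical test factors for a safety-critical controller: fault
# conditions from fault analysis crossed with operational modes.
factors = {
    "sensor":  ["ok", "stuck", "dropout"],
    "network": ["ok", "delayed"],
    "mode":    ["manual", "autonomous"],
}

# Full-factorial enumeration of factor levels; combinatorial testing
# would reduce this to a subset covering every pair of levels.
tests = [dict(zip(factors, levels)) for levels in product(*factors.values())]
assert len(tests) == 3 * 2 * 2
```

Each generated combination then parameterizes a test derived from the behavioral model, which is how the model-based and combinatorial parts fit together.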
DOI: 10.1109/ISSRE.2019.00033 (2019-10-01)
Cited by: 5