DyCause: Crowdsourcing to Diagnose Microservice Kernel Failure

IF 7 | Zone 2, Computer Science | Q1, COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | IEEE Transactions on Dependable and Secure Computing | Pub Date: 2023-11-01 | DOI: 10.1109/tdsc.2022.3233915
Yicheng Pan, Meng Ma, Xinrui Jiang, Ping Wang
Citations: 0

Abstract

Many cloud web applications (apps) today are built on microservices. However, because anomalies propagate through such systems in a highly dynamic and complex way, troubleshooting them is challenging. Most existing diagnostic methods rely on monitoring metrics retrieved from the microservice system kernel. Application owners, and even site reliability engineers (SREs), therefore cannot use those methods effectively when the microservice system lacks such a comprehensive monitoring infrastructure. In this article, we develop DyCause, a crowdsourcing solution to this asymmetric diagnostic information problem. Our solution collaboratively collects the operational status of kernel services from user space and initiates diagnosis on demand. Because it requires no architectural or functional infrastructure, DyCause is fast and lightweight to deploy in a microservice system. To discover the fine-grained dynamic causalities between services during an anomaly, we also design an efficient algorithm based on statistical analysis. With this algorithm, we can further analyze the anomaly propagation paths within the microservice system and generate a more interpretable diagnosis. In our evaluation, we test DyCause in a controlled simulation environment and in a real-world cloud system. Our results show that DyCause achieves the best accuracy and efficiency among several state-of-the-art methods and is more robust to parameter settings.
Source journal
IEEE Transactions on Dependable and Secure Computing
Category: Engineering & Technology, Computer Science: Software Engineering
CiteScore: 11.20
Self-citation rate: 5.50%
Articles published: 354
Review time: 9 months
Journal description: The "IEEE Transactions on Dependable and Secure Computing (TDSC)" is a prestigious journal that publishes high-quality, peer-reviewed research in the field of computer science, specifically targeting the development of dependable and secure computing systems and networks. This journal is dedicated to exploring the fundamental principles, methodologies, and mechanisms that enable the design, modeling, and evaluation of systems that meet the required levels of reliability, security, and performance. The scope of TDSC includes research on measurement, modeling, and simulation techniques that contribute to the understanding and improvement of system performance under various constraints. It also covers the foundations necessary for the joint evaluation, verification, and design of systems that balance performance, security, and dependability. By publishing archival research results, TDSC aims to provide a valuable resource for researchers, engineers, and practitioners working in the areas of cybersecurity, fault tolerance, and system reliability. The journal's focus on cutting-edge research ensures that it remains at the forefront of advancements in the field, promoting the development of technologies that are critical for the functioning of modern, complex systems.