Towards systems level prognostics in the Cloud

Budhaditya Deb, Mohak Shah, S. Evans, Manoj Mehta, Anthony Gargulak, Tom Lasky
Published in: 2013 IEEE Conference on Prognostics and Health Management (PHM), 2013-06-24
DOI: 10.1109/ICPHM.2013.6621449 (https://doi.org/10.1109/ICPHM.2013.6621449)
Citations: 6

Abstract

Many application systems are moving from device-centric architectures to cloud-based systems that leverage shared compute resources to reduce cost and maximize reach. These systems require new paradigms to assure availability and quality of service. In this paper, we discuss the challenges in assuring availability and quality of service in a cloud-based application system. We propose machine learning techniques that monitor system logs to assess the health of the system. A web services data set is used to show that a variety of services can be clustered into different service classes using a k-means clustering scheme. Reliability, Availability, and Serviceability (RAS) logs and job logs from a high-performance computing system are used to show that impending fatal errors in the system can be predicted from the logs using an SVM classifier. These approaches illustrate the feasibility of monitoring the health and performance of compute resources, and hence can be used to manage these systems for high availability and quality of service for critical tasks such as health care monitoring in the cloud.
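
As a rough illustration of the clustering step mentioned in the abstract, the sketch below groups a handful of web services into service classes with plain k-means (Lloyd's algorithm). It is not the authors' implementation. The two features (response time in ms, throughput in requests/s), the six sample records, and the deterministic farthest-first initialisation are all assumptions made for this example; the abstract does not specify the features, data, or initialisation that were used.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iters=100):
    """Plain k-means (Lloyd's algorithm). Returns (centroids, labels)."""
    # Farthest-first initialisation: deterministic, chosen for this sketch only.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )

    labels = None
    for _ in range(max_iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, labels


# Hypothetical QoS records: (response time in ms, throughput in requests/s).
services = [
    (40, 900), (55, 850), (60, 950),    # fast, high-throughput services
    (800, 50), (950, 40), (1000, 70),   # slow, low-throughput services
]

centroids, labels = kmeans(services, k=2)
print(labels)
```

With real measurements the features would normally be rescaled to comparable ranges before clustering, since k-means is sensitive to feature scale.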