An Approach to Failure Prediction in a Cloud Based Environment

Adamu Hussaini, Bashir Mohammed, A. M. Bukar, A. Cullen, H. Ugail, I. Awan
DOI: 10.1109/FiCloud.2017.56
Published in: 2017 IEEE 5th International Conference on Future Internet of Things and Cloud (FiCloud), 2017-08-01
Citations: 17

Abstract

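A failure in a cloud system is defined as an event that occurs when the delivered service deviates from the correct, intended service. As cloud computing systems continue to grow in scale and complexity, cloud service providers (CSPs) urgently need to guarantee reliable on-demand resources to their customers in the presence of faults, and thereby fulfil their service level agreements (SLAs). Component failures are very common in cloud systems; nevertheless, the data centers of large CSPs must be designed to provide a specified level of availability to business systems. The Infrastructure-as-a-Service (IaaS) cloud delivery model provides computational resources (CPU and memory), storage resources, and networking capacity intended to ensure high availability in the presence of such failures. This paper studies and analyzes in-production fault data recorded over a two-year period at the National Energy Research Scientific Computing Center (NERSC). Using real-time data collected from the Computer Failure Data Repository (CFDR), it evaluates two machine learning (ML) algorithms, a Linear Regression (LR) model and a Support Vector Machine (SVM) with a Linear Gaussian kernel, for predicting hardware failures in a real-time cloud environment in order to improve system availability. The performance of both algorithms is rigorously evaluated using k-fold cross-validation, and steps and procedures for future studies are presented. This research is intended to help computer hardware companies and CSPs design reliable fault-tolerant systems by enabling better device selection, thereby improving system availability and minimizing unscheduled system downtime.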
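The abstract names the two models and the validation scheme but not their configuration, so the following is only a minimal, self-contained sketch of how such a comparison can be set up, not the authors' implementation. Everything specific here is an assumption: the data are synthetic two-feature records standing in for NERSC/CFDR failure logs (`make_synthetic_data` is hypothetical), failures are binary labels (1 = failure), the LR model is turned into a classifier by thresholding its output at 0.5, the SVM uses a standard Gaussian (RBF) kernel trained with the kernelized Pegasos solver (a swapped-in, dependency-free solver), and k = 5. The `gamma`, `lam`, and `epochs` values are illustrative choices.

```python
import math
import random

def gaussian_kernel(a, b, gamma=0.5):
    # Gaussian (RBF) kernel: exp(-gamma * ||a - b||^2); gamma is an assumed value.
    return math.exp(-gamma * sum((p - q) ** 2 for p, q in zip(a, b)))

def fit_linear_regression(X, y):
    # Ordinary least squares via the normal equations (X^T X) w = X^T y;
    # a constant 1 is prepended to each row so w[0] is the intercept.
    rows = [[1.0] + list(x) for x in X]
    n = len(rows[0])
    A = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    b = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(n)]
    for col in range(n):  # Gaussian elimination with partial pivoting
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    w = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        w[i] = (b[i] - sum(A[i][j] * w[j] for j in range(i + 1, n))) / A[i][i]
    return w

def fit_kernel_svm(X, y, lam=0.01, epochs=20, seed=0):
    # Kernelized Pegasos (Shalev-Shwartz et al.), no bias term; y must be +/-1.
    # alpha[i] counts how often example i violated the margin.
    rng = random.Random(seed)
    m = len(X)
    K = [[gaussian_kernel(a, b) for b in X] for a in X]
    alpha = [0] * m
    t = 0
    for _ in range(epochs):
        for i in rng.sample(range(m), m):
            t += 1
            s = sum(alpha[j] * y[j] * K[j][i] for j in range(m)) / (lam * t)
            if y[i] * s < 1:
                alpha[i] += 1
    return alpha

def lr_model(Xtr, ytr):
    # Linear regression on 0/1 labels, thresholded at 0.5 (assumption).
    w = fit_linear_regression(Xtr, ytr)
    return lambda x: int(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)) >= 0.5)

def svm_model(Xtr, ytr):
    ypm = [2 * t - 1 for t in ytr]  # map labels {0, 1} -> {-1, +1}
    alpha = fit_kernel_svm(Xtr, ypm)
    return lambda x: int(sum(a * s * gaussian_kernel(xj, x)
                             for a, s, xj in zip(alpha, ypm, Xtr)) >= 0)

def k_fold_indices(m, k, seed=0):
    # Shuffle once, then deal indices round-robin into k disjoint folds.
    idx = list(range(m))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]

def cross_validate(X, y, fit_predict, k=5):
    # Train on k-1 folds, score accuracy on the held-out fold, average over folds.
    folds = k_fold_indices(len(X), k)
    scores = []
    for fold in folds:
        held = set(fold)
        tr = [i for i in range(len(X)) if i not in held]
        predict = fit_predict([X[i] for i in tr], [y[i] for i in tr])
        scores.append(sum(predict(X[i]) == y[i] for i in fold) / len(fold))
    return sum(scores) / k

def make_synthetic_data(m=150, seed=1):
    # Hypothetical stand-in for failure logs (NOT NERSC/CFDR data): two standardized
    # node-health readings; label 1 = "failure" when their noisy sum is high.
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(m):
        a, b = rng.gauss(0, 1), rng.gauss(0, 1)
        X.append((a, b))
        y.append(1 if a + b + rng.gauss(0, 0.5) > 1.0 else 0)
    return X, y

if __name__ == "__main__":
    X, y = make_synthetic_data()
    print("LR  5-fold accuracy:", round(cross_validate(X, y, lr_model), 3))
    print("SVM 5-fold accuracy:", round(cross_validate(X, y, svm_model), 3))
```

Both models are scored on exactly the same folds, so their averaged accuracies are directly comparable; applying the sketch to real failure records would only require replacing the hypothetical data generator. Because hardware failures are usually rare, a stratified split (keeping the failure ratio equal across folds) would be a reasonable refinement of the plain round-robin folds used here.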