An Approach to Failure Prediction in a Cloud Based Environment

2017 IEEE 5th International Conference on Future Internet of Things and Cloud (FiCloud). Pub Date: 2017-08-01. DOI: 10.1109/FiCloud.2017.56

Adamu Hussaini, Bashir Mohammed, A. M. Bukar, A. Cullen, H. Ugail, I. Awan

Citations: 17

Abstract

An Approach to Failure Prediction in a Cloud Based Environment

A failure in a cloud system is defined as an event that occurs when the delivered service deviates from the correct, intended service. As cloud computing systems continue to grow in scale and complexity, cloud service providers (CSPs) urgently need to guarantee reliable on-demand resources to their customers in the presence of faults, thereby fulfilling their service level agreements (SLAs). Component failures in cloud systems are a common phenomenon. Nevertheless, the data centers of large cloud service providers should be designed to provide a certain level of availability to the business system. The Infrastructure-as-a-Service (IaaS) cloud delivery model offers computational resources (CPU and memory), storage resources, and networking capacity that ensure high availability in the presence of such failures. Production fault data recorded over a two-year period at the National Energy Research Scientific Computing Center (NERSC) have been studied and analyzed. Using the real-time data collected from the Computer Failure Data Repository (CFDR), this paper presents the performance of two machine learning (ML) algorithms, a Linear Regression (LR) model and a Support Vector Machine (SVM) with a Linear Gaussian kernel, for predicting hardware failures in a real-time cloud environment to improve system availability. The performance of the two algorithms has been rigorously evaluated using the k-fold cross-validation technique. Furthermore, steps and procedures for future studies are presented. This research will aid computer hardware companies and cloud service providers in designing reliable fault-tolerant systems by enabling better device selection, thereby improving system availability and minimizing unscheduled system downtime.
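The k-fold cross-validation procedure mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the data below are synthetic and invented for illustration (not taken from NERSC or CFDR), only a single-feature least-squares linear regression is shown (the SVM is omitted), and the function names `k_fold_indices`, `fit_line`, and `cross_validate` are hypothetical.

```python
import random
from statistics import mean


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x with a single feature."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def cross_validate(xs, ys, k=5):
    """Return the mean squared error measured on each held-out fold."""
    errors = []
    for held_out in k_fold_indices(len(xs), k):
        test = set(held_out)
        train = [i for i in range(len(xs)) if i not in test]
        # Fit only on the training folds, then score on the held-out fold.
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        errors.append(mean((ys[i] - (a + b * xs[i])) ** 2 for i in held_out))
    return errors


# Synthetic, made-up data (assumption): y = 2 + 3x plus Gaussian noise.
rng = random.Random(42)
xs = [float(i) for i in range(50)]
ys = [2.0 + 3.0 * x + rng.gauss(0, 1) for x in xs]

fold_mse = cross_validate(xs, ys, k=5)
print("per-fold MSE:", [round(e, 3) for e in fold_mse])
print("mean MSE:", round(mean(fold_mse), 3))
```

Each sample is held out exactly once, so the averaged per-fold error estimates how the model would perform on unseen records. A second model could be compared by swapping out `fit_line` and scoring it on the same folds.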
