Machine learning job failure analysis and prediction model for the cloud environment

High-Confidence Computing (IF 3.2; JCR Q2, Computer Science, Information Systems) | Pub Date: 2023-09-27 | DOI: 10.1016/j.hcc.2023.100165
Harikrishna Bommala, Uma Maheswari V., Rajanikanth Aluvalu, Swapna Mudrakola
Volume 3, Issue 4, Article 100165
Source: https://www.sciencedirect.com/science/article/pii/S2667295223000636
Citations: 0

Abstract

Reliable and accessible cloud applications are essential for the future of ubiquitous computing, smart appliances, and electronic health. Owing to the vastness and diversity of the cloud, most cloud services, both physical and logical, have experienced failures. Using currently accessible traces, we assessed and characterized the behaviors of successful and unsuccessful jobs. We devised and implemented a method to forecast which jobs will fail. The proposed method helps cloud applications use resources more efficiently. An in-depth evaluation of the proposed model was conducted using the publicly available Google Cluster, Mustang, and Trinity traces. The traces were also fed into several different machine learning models to select the most reliable one. Our analysis shows that the model performs well in terms of accuracy, F1-score, and recall. Several measures, such as forecasting job failures, designing scheduling algorithms, modifying priority criteria, and restricting task resubmission, may increase cloud service dependability and availability.
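
The abstract names accuracy, recall, and F1-score as the criteria used to compare candidate models. As a minimal sketch of how such a comparison could be computed, the Python below scores two hypothetical sets of predictions against made-up job outcomes and picks the one with the highest F1-score. The labels, the model names `model_a` and `model_b`, and the choice of F1 as the tie-breaking criterion are all assumptions for illustration; none of them come from the paper or its traces.

```python
from typing import Dict, List, Tuple


def failure_metrics(actual: List[int], predicted: List[int]) -> Dict[str, float]:
    """Accuracy, recall and F1 with 'job failed' (1) as the positive class."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # failures caught
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # successes confirmed
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # false alarms
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # failures missed
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "recall": recall, "f1": f1}


def select_model(
    actual: List[int], predictions_by_model: Dict[str, List[int]]
) -> Tuple[str, Dict[str, float]]:
    """Return the name and metrics of the model with the highest F1-score."""
    scored = {
        name: failure_metrics(actual, preds)
        for name, preds in predictions_by_model.items()
    }
    best = max(scored, key=lambda name: scored[name]["f1"])
    return best, scored[best]


# Hypothetical data: 1 = job failed, 0 = job finished successfully.
actual = [1, 0, 0, 1, 1, 0, 0, 0]
predictions = {
    "model_a": [1, 0, 0, 0, 1, 0, 1, 0],  # misses one failure, one false alarm
    "model_b": [1, 0, 0, 1, 1, 0, 0, 1],  # catches every failure, one false alarm
}

best_name, best_metrics = select_model(actual, predictions)
print(best_name, best_metrics)
```

Recall is worth tracking separately in this setting because a missed failure (a false negative) wastes the resources the job consumes before it dies, whereas accuracy alone can look high on traces where most jobs succeed.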
