Facilitating HPC Operation and Administration via Cloud

Chaoqun Sha, Jingfeng Zhang, Lei An, Yongsheng Zhang, Zhipeng Wang, T. Ilijaš, Nejc Bat, Miha Verlic, Qing Ji
Supercomputing Frontiers and Innovations (Supercomput. Front. Innov.), published 2019-02-26. DOI: 10.14529/JSFI190105. Citations: 2.

Abstract

Cloud Computing, which is experiencing tremendous growth, offers a number of advantages over other distributed platforms. Combining them with the advantages of High Performance Computing (HPC) has also driven the development of HPCaaS (HPC as a Service), which has mainly focused on flexible access to resources, cost-effectiveness, and freedom from maintenance for end users. Beyond providing and using HPCaaS, HPC centers could gain more from Cloud Computing technology, for instance by using it to facilitate the operation and administration of deployed HPC systems, a task that most supercomputer centers face. This paper reports on EasyOP, a product developed to realize the idea that one or more Cloud or HPC facilities can be run from a centralized, unified control platform. The main purpose of EasyOP is to send information about HPC system hardware and system software, failure alarms, job scheduling, etc. to the Wuxi cloud computing center. After analysis and processing, a range of valuable data, including alarms and job scheduling status, can be shared with HPC users through SMS, email, and WeChat. More importantly, with the data accumulated at the cloud computing center, EasyOP offers several easy-to-use functions, such as user management, monthly/yearly reports, and one-screen monitoring. By the end of 2016, EasyOP had successfully served more than 50 HPC systems with almost 10,000 nodes and over 300 regular users.
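The abstract describes a centralized flow: each HPC site pushes hardware, alarm, and job-scheduling data to one cloud platform, which filters it, notifies the users who follow that system, and keeps the history for periodic reports. The sketch below illustrates that flow in Python under stated assumptions: the `Event` and `Subscriber` fields, the three severity levels, and the `CentralCollector` class are hypothetical illustrations, not EasyOP's actual interfaces, and real SMS, email, or WeChat delivery is replaced by an in-memory outbox.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime

# Hypothetical severity scale; EasyOP's real alarm levels are not described.
SEVERITY = {"info": 0, "warning": 1, "critical": 2}


@dataclass
class Event:
    """Hypothetical record an HPC site pushes to the central platform."""
    system: str    # e.g. "cluster-a"
    kind: str      # "alarm" or "job"
    severity: str  # key into SEVERITY
    message: str
    time: datetime


@dataclass
class Subscriber:
    """Hypothetical user profile: which systems to follow and how to notify."""
    name: str
    systems: set
    channels: list              # e.g. ["sms", "email", "wechat"]
    min_severity: str = "warning"


class CentralCollector:
    """Toy stand-in for a centralized control platform (assumed design)."""

    def __init__(self, subscribers):
        self.subscribers = subscribers
        self.events = []   # accumulated history, used for reports
        self.outbox = []   # (channel, user, text) instead of real sends

    def ingest(self, event):
        # Store every event, then notify only matching subscribers.
        self.events.append(event)
        for sub in self.subscribers:
            if event.system not in sub.systems:
                continue
            if SEVERITY[event.severity] < SEVERITY[sub.min_severity]:
                continue
            text = f"[{event.system}] {event.severity.upper()}: {event.message}"
            for channel in sub.channels:
                self.outbox.append((channel, sub.name, text))

    def monthly_report(self, year, month):
        # Count events per system and kind for one calendar month.
        counts = defaultdict(lambda: defaultdict(int))
        for e in self.events:
            if e.time.year == year and e.time.month == month:
                counts[e.system][e.kind] += 1
        return {system: dict(kinds) for system, kinds in counts.items()}


# Usage: one user follows cluster-a and wants warnings or worse.
alice = Subscriber("alice", {"cluster-a"}, ["sms", "email"])
hub = CentralCollector([alice])
hub.ingest(Event("cluster-a", "alarm", "critical", "node042 down", datetime(2016, 12, 1)))
hub.ingest(Event("cluster-b", "alarm", "critical", "disk full", datetime(2016, 12, 2)))
hub.ingest(Event("cluster-a", "job", "info", "job 17 finished", datetime(2016, 12, 3)))
print(hub.outbox)
print(hub.monthly_report(2016, 12))
```

Keeping every event in one store, and filtering only at notification time, is what lets the same accumulated data serve both real-time alerts and later monthly/yearly reports; the cluster-b alarm and the informational job event reach the report but not Alice's phone.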