Towards resilient EU HPC systems: a blueprint

Proceedings of the 16th ACM International Conference on Computing Frontiers. Pub Date: 2019-04-30. DOI: 10.1145/3310273.3323434

Petar Radojkovic

Citations: 7

Abstract

In high-performance computing (HPC), a single tightly-coupled job may execute for days on thousands of servers. A server failure typically has cascading effects on the whole job, so redundancy and/or aggressive checkpointing is required to prevent the whole job from failing. This has an adverse impact on system performance and resource usage, which limits the ability to scale to larger systems. System resiliency is therefore one of the most important Exascale requirements and challenges.


