Fault tolerance in heterogeneous multi-cluster systems through a task migration mechanism

Uriel Cabello, José Rodríguez, A. Viveros, S. Mendoza, D. Decouchant
Published in: 2014 11th International Conference on Electrical Engineering, Computing Science and Automatic Control (CCE), pp. 1-7
Publication date: 2014-12-08
DOI: 10.1109/ICEEE.2014.6978266 (https://doi.org/10.1109/ICEEE.2014.6978266)
Citations: 8

Abstract

The grid computing paradigm consists of multiple heterogeneous, distributed clusters connected by heterogeneous network interfaces. One advantage of this paradigm is that massive amounts of data can be analyzed using computing resources located in different geographic places and running on different platforms. However, many problems must be solved in order to harness the power of those resources. In this work we address the problem of fault tolerance in heterogeneous computer systems. Our proposal aims to ease recovery when system failures are detected at runtime, avoiding the need to restart the application. It works through a set of services that perform transparent task migration across the computing nodes, hiding the complexity of error handling when a hybrid programming model based on Open MPI and OpenCL is employed.
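The abstract does not describe the migration services in detail. As a rough illustration of the general idea only (moving a task to another node when its node fails, instead of restarting the whole application), the following Python sketch simulates it in a single process. Everything here is an assumption made for illustration: `Node`, `NodeFailure`, and `run_with_migration` are hypothetical names, the round-robin placement is a simplification, and no Open MPI or OpenCL calls are involved.

```python
from collections import deque


class NodeFailure(Exception):
    """Raised by a simulated node that cannot run a task (hypothetical)."""


class Node:
    """Simulated computing node; `healthy=False` models a failed node."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy

    def run(self, task):
        if not self.healthy:
            raise NodeFailure(self.name)
        return task()


def run_with_migration(tasks, nodes):
    """Run every task once; when a node fails, migrate its task to another node.

    tasks: dict mapping task id -> zero-argument callable
    nodes: list of Node objects
    Returns (results, placement), where placement maps each task id to the
    name of the node that finally executed it.
    """
    pending = deque(tasks.items())
    available = list(nodes)
    results, placement = {}, {}
    while pending:
        task_id, task = pending.popleft()
        if not available:
            raise RuntimeError("no healthy nodes left")
        node = available[0]
        try:
            results[task_id] = node.run(task)
            placement[task_id] = node.name
            # Rotate so the next task goes to the next node (round-robin).
            available.append(available.pop(0))
        except NodeFailure:
            # Drop the failed node and put the task back at the front of the
            # queue so it is retried elsewhere; other results are kept.
            available.pop(0)
            pending.appendleft((task_id, task))
    return results, placement


if __name__ == "__main__":
    demo_tasks = {"a": lambda: 1, "b": lambda: 2, "c": lambda: 3}
    demo_nodes = [Node("n0", healthy=False), Node("n1"), Node("n2")]
    print(run_with_migration(demo_tasks, demo_nodes))
```

In this toy model the caller only sees the final results; the failure of `n0` is handled inside `run_with_migration`, which loosely mirrors the transparency goal stated in the abstract. A real system would additionally need failure detection and the transfer of task state between nodes.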