Phoenix: making data-intensive grid applications fault-tolerant

Fifth IEEE/ACM International Workshop on Grid Computing. Pub Date: 2004-11-08. DOI: 10.1109/GRID.2004.51

George Kola, T. Kosar, M. Livny

Citations: 29

Abstract

A major hurdle facing data-intensive grid applications is the appropriate handling of failures that occur in the grid environment. Implementing fault tolerance transparently at the grid-middleware level would make different data-intensive applications fault-tolerant without each having to pay a separate cost, and would reduce the time to a grid-based solution for many scientific problems. We analyzed the failures encountered by four real-life production data-intensive applications: the NCSA image processing pipeline, the WCER video processing pipeline, the US-CMS pipeline, and the BMRB BLAST pipeline. Taking the results of this analysis into account, we designed and implemented Phoenix, a transparent middleware-level fault-tolerance layer that detects failures early, classifies them as transient or permanent, and handles the transient failures appropriately. We applied our fault-tolerance layer to a prototype of the NCSA image processing pipeline and considerably improved its failure handling; we report on the insights gained in the process.
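The detect, classify, and handle cycle described in the abstract can be illustrated with a minimal sketch. This is not Phoenix's implementation: the names `FailureKind`, `classify`, and `run_with_retries`, the choice of which exception types count as transient, and the exponential backoff policy are all assumptions made for illustration.

```python
import time
from enum import Enum


class FailureKind(Enum):
    TRANSIENT = "transient"
    PERMANENT = "permanent"


# Assumption: timeouts and connection errors are treated as transient.
# A real middleware layer would classify using richer evidence.
TRANSIENT_ERRORS = (TimeoutError, ConnectionError)


def classify(exc):
    """Classify a failure as transient or permanent (hypothetical rule)."""
    if isinstance(exc, TRANSIENT_ERRORS):
        return FailureKind.TRANSIENT
    return FailureKind.PERMANENT


def run_with_retries(task, max_attempts=3, base_delay=0.0):
    """Run task(); retry transient failures, re-raise permanent ones at once."""
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except Exception as exc:
            out_of_attempts = attempt == max_attempts
            if classify(exc) is FailureKind.PERMANENT or out_of_attempts:
                raise
            # Exponential backoff before the next attempt (assumed policy).
            time.sleep(base_delay * 2 ** (attempt - 1))
```

In this sketch a permanent failure surfaces to the caller immediately, while a transient one is retried up to `max_attempts` times, so the application code wrapped by `run_with_retries` needs no failure handling of its own.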


