Experience with Distributed Replicated Objects: The Nile Project

Theory Pract. Object Syst. Pub Date: 1998-04-01. DOI: 10.1002/(SICI)1096-9942(1998)4:2<107::AID-TAPO5>3.0.CO;2-Q

Aleta Ricciardi, Michael Ogg, Fabio Previato


Citations: 15

Abstract

The goal of the Nile project is to develop an inexpensive, scalable, fault-tolerant, widely distributed job processing environment. On the systems side, Nile must manage and provide transparent access to hundreds of commodity processors spread across the United States, and a distributed database that will exceed 100 terabytes. These are scales not commonly encountered by projects concerned with fault tolerance. On the software engineering side, Nile must be easily maintained, outlive its development phase, and be able to incorporate, or even migrate to, new software components; these requirements led us to the CORBA standard. CORBA does not yet include a fault tolerance specification, and only a small number of experimental Object Request Brokers (ORBs) support object fault tolerance through replication. Our experiences over two years of building Nile have taught us a great deal that may be of use to designers of ORBs that will support object replication and fault tolerance.

The Nile collaboration is building a self-managing, highly available job processing environment, able to provide transparent access to hundreds of heterogeneous commodity processors and a very large distributed database. The first deployed system, and the genesis of the Nile project, will be for the CLEO high energy physics (HEP) experiment [6], where the computing resources are spread across the United States at 24 collaborating institutions, and the distributed database exceeds 100 terabytes. Because it manages and provides transparent access to distributed resources, Nile can be considered a distributed operating system for a global virtual computer composed of a multitude of resources spread over a large geographic area. Consequently, while originally developed to address CLEO's computing needs, Nile is not specific to CLEO, and could equally well be used for any large-scale processing. Nile's modular object structure means that any special-purpose algorithms, such as scheduling, can easily be dropped in. Nile is currently being tested by other HEP collaborations and in other target application domains.

Nile must outlive its development phase, be able to adapt to and scale with changes in computing needs, be easily maintained, and be able to incorporate new software components as they become available. These concerns led us to the CORBA standard. Pragmatic issues such as cost, scalability, and institutional autonomy led us to a widely distributed global architecture, hierarchically composed of tens of geographically proximate sites. Each site implements the fault-tolerant aspects required of Nile, but inter-site coordination is loosely coupled. Unfortunately, …

