Experience with Distributed Replicated Objects: The Nile Project
Aleta Ricciardi, Michael Ogg, Fabio Previato
Theory and Practice of Object Systems, April 1998. DOI: 10.1002/(SICI)1096-9942(1998)4:2<107::AID-TAPO5>3.0.CO;2-Q
The goal of the Nile project is to develop an inexpensive, scalable, fault-tolerant, widely distributed job processing environment. On the systems side, Nile must manage and provide transparent access to hundreds of commodity processors spread across the United States, and to a distributed database that will exceed 100 terabytes. Projects concerned with fault tolerance do not commonly encounter scales like these. On the software engineering side, Nile must be easily maintained, outlive its development phase, and be able to incorporate, or even migrate to, new software components; these requirements led us to the CORBA standard. CORBA does not yet include a fault tolerance specification, and only a small number of experimental Object Request Brokers (ORBs) support object fault tolerance through replication. Our experiences over two years of building Nile have taught us a great deal that may be of use to designers of ORBs that will support object replication and fault tolerance.

The Nile collaboration is building a self-managing, highly available job processing environment, able to provide transparent access to hundreds of heterogeneous commodity processors and a very large distributed database. The first deployed system, and the genesis of the Nile project, will be for the CLEO high energy physics (HEP) experiment [6]. CLEO's computing resources are spread across 24 collaborating institutions in the United States, and its distributed database exceeds 100 terabytes. Because Nile manages and provides transparent access to distributed resources, it can be considered a distributed operating system for a global virtual computer composed of a multitude of resources spread over a large geographic area. Consequently, although Nile was originally developed to address CLEO's computing needs, it is not specific to CLEO and could equally well be used for any large-scale processing.
Nile's modular object structure means that special-purpose algorithms, such as scheduling, can easily be dropped in. Other HEP collaborations are currently testing Nile, and it is also being tested in other target application domains. Nile must outlive its development phase, adapt to and scale with changes in computing needs, be easily maintained, and be able to incorporate new software components as they become available. These concerns led us to the CORBA standard.

Pragmatic issues such as cost, scalability, and institutional autonomy led us to a widely distributed global architecture, hierarchically composed of tens of geographically proximate sites. Each site implements the fault-tolerant aspects required of Nile, but inter-site coordination is loosely coupled. Unfortunately, …