Distributed Syst. Eng.: latest articles

Disciplined approach towards the design of distributed systems
Pub Date : 2001-11-27 DOI: 10.1088/0967-1846/2/2/004
M. Nikolaidou, D. Lelis, D. Mouzakis, P. Georgiadis
As the use of Distributed Systems is spreading widely and relevant applications become more demanding, efficient design of Distributed Systems has turned to be a critical issue. For achieving the desirable integration of Distributed System components, knowledge from different areas must be combined leading to increasing complexity. Construction and provision of the appropriate software tools may facilitate the design and evaluation of Distributed Systems architectures. In this paper the architecture and functionality of the Intelligent Distributed System Design tool (IDIS) are presented. IDIS integrates methodologies and techniques from the Artificial Intelligence and Simulation domain, in order to provide a uniform environment for proposing alternative architectural solutions and evaluating their performance.
Citations: 11
A comprehensive distributed shared memory system that is easy to use and program
Pub Date : 1999-12-01 DOI: 10.1088/0967-1846/6/4/301
J. Silcock, A. Goscinski
An analysis of the distributed shared memory (DSM) work carried out by other researchers shows that it has been able to improve the performance of applications, at the expense of ease of programming and use. Many implementations require application programmers to write code to explicitly associate shared variables with synchronization variables or to label the variables according to their access patterns. Programmers are required to explicitly initialize parallel applications and, in particular, to create DSM parallel processes on a number of workstations in the cluster of workstations. The aim of this research has been to improve the ease of programming and use of a DSM system while not compromising its performance. RHODOS' DSM allows programmers to write shared memory code exploiting their sequential programming skills without the need to learn the DSM concepts. The placement of DSM within the operating system allows the DSM environment to be automatically initialized and transparent. The results of running two applications demonstrate that our DSM, despite paying attention to ease of programming and use, achieves high performance.
Citations: 5
An approach to interoperation between autonomous database systems
Pub Date : 1999-12-01 DOI: 10.1088/0967-1846/6/4/303
A. Zisman, J. Kramer
In this paper we present an approach to support interoperation between autonomous database systems. In particular, we concentrate on distributed information discovery and access for systems with a large number of databases. We avoid the need for integrated global schemas or centralized structures containing information on the available data and its location. We instead provide an architecture that supports data distribution, autonomy and heterogeneity. The architecture also supports system evolution by the addition and removal of databases. A distributed information discovery algorithm is provided to perform data requests, database location and data access. A feature of our approach is to distribute the information about database contents using simple hierarchical information structures composed of special terms. A prototype has been developed to demonstrate and evaluate the approach. A hospital case study is used to illustrate its feasibility and applicability.
Citations: 4
Scalability evaluation of a distributed agent system
Pub Date : 1999-12-01 DOI: 10.1088/0967-1846/6/4/302
L. Burness, Richard Titmuss, C. Lebre, K. Brown, A. Brookland
The use of new computing paradigms is intended to ease the design of complex systems. However, the non-functional aspects of a system, including performance, reliability and scalability, remain significant issues. It is hard to detect and correct many scalability problems through system testing alone - especially when the problems are rooted in the higher levels of the system design. Late corrections to the system can have serious implications for the clarity of the design and code. We have analysed the design of a system of multiple near-identical, `reactive' agents for scalability. We believe that the approach taken is readily applicable to many object oriented systems, and may form the basis of a rigorous design methodology. It is a simple, yet scientific extension to current design techniques using message sequence charts, enabling design options to be compared quantitatively rather than qualitatively. Our experience suggests that such analysis should be used to consider the effect of artificial intelligence, to ensure that autonomous behaviour has an overall beneficial effect for system performance.
Citations: 18
Hierarchical, competitive scheduling of multiple DAGs in a dynamic heterogeneous environment
Pub Date : 1999-09-01 DOI: 10.1088/0967-1846/6/3/303
Michael A. Iverson, F. Özgüner
With the advent of large-scale heterogeneous environments, there is a need for matching and scheduling algorithms which can allow multiple, directed acyclic graph structured applications to share the computational resources of the network. This paper presents a hierarchical matching and scheduling framework where multiple applications compete for the computational resources on the network. In this environment, each application makes its own scheduling decisions. Thus, no centralized scheduling resource is required. Applications do not need direct knowledge of the other applications - knowledge of other applications arrives indirectly through load estimates (like queue lengths). This paper presents an algorithm, called the dynamic hierarchical scheduling algorithm, which schedules tasks within this framework. A series of simulations are presented to examine the performance of these algorithms in this environment, compared with a more conventional, single-user environment.
Citations: 39
Supporting customized failure models for distributed software
Pub Date : 1999-09-01 DOI: 10.1088/0967-1846/6/3/302
M. Hiltunen, Vijaykumar Immanuel, R. Schlichting
The cost of employing software fault tolerance techniques in distributed systems is strongly related to the type of failures to be tolerated. For example, in terms of the amount of redundancy required and execution time, tolerating a processor crash is much cheaper than tolerating arbitrary (or Byzantine) failures. This paper describes an approach to constructing configurable services for distributed systems that allows easy customization of the type of failures to tolerate. Using this approach, it is possible to configure custom services across a spectrum of possibilities, from a very efficient but unreliable server group that does not tolerate any failures, to a less efficient but reliable group that tolerates crash, omission, timing, or arbitrary failures. The approach is based on building configurable services as collections of software modules called micro-protocols. Each micro-protocol implements a different semantic property or property variant, and interacts with other micro-protocols using an event-driven model provided by a runtime system. In addition to facilitating the choice of failure model, the approach allows service properties such as message ordering and delivery atomicity to be customized for each application.
Citations: 22
Group membership failure detection: a simple protocol and its probabilistic analysis
Pub Date : 1999-09-01 DOI: 10.1088/0967-1846/6/3/301
M. Raynal, F. Tronel
A group membership failure (in short, a group failure) occurs when one of the group members crashes. A group failure detection protocol has to inform all the non-crashed members of the group that this group entity has crashed. Ideally, such a protocol should be live (if a process crashes, then the group failure has to be detected) and safe (if a group failure is claimed, then at least one process has crashed). Unreliable asynchronous distributed systems are characterized by the impossibility for a process to get an accurate view of the system state. Consequently, the design of a group failure detection protocol that is both safe and live is a problem that cannot be solved in all runs of an asynchronous distributed system. This paper analyses a group failure detection protocol whose design naturally ensures its liveness. We show that by appropriately tuning some of its duration-related parameters, the safety property can be guaranteed with a probability as close to one as desired. This analysis shows that, in real distributed systems, it is possible to achieve failure detection with a negligible probability of wrong suspicions.
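The idea of tuning a duration-related parameter until wrong suspicions become negligible can be illustrated with a small sketch. This is not the protocol analysed in the paper: it assumes, purely for illustration, that message delays are exponentially distributed with a known mean, and the function names `wrong_suspicion_probability` and `timeout_for` are hypothetical.

```python
import math


def wrong_suspicion_probability(timeout: float, mean_delay: float) -> float:
    """P(delay > timeout) under the ASSUMED exponential delay model.

    A live process is wrongly suspected when its message arrives
    after the timeout has expired.
    """
    return math.exp(-timeout / mean_delay)


def timeout_for(epsilon: float, mean_delay: float) -> float:
    """Smallest timeout whose wrong-suspicion probability is at most epsilon."""
    if not 0.0 < epsilon < 1.0:
        raise ValueError("epsilon must lie strictly between 0 and 1")
    return mean_delay * math.log(1.0 / epsilon)
```

Under this assumed model the timeout grows only logarithmically as the target probability shrinks, which is one way to see why a safety probability "as close to one as desired" can remain affordable.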
Citations: 37
Guest Editor's Introduction: Special section on dependable distributed systems
Pub Date : 1999-09-01 DOI: 10.1088/0967-1846/6/6/93
C. Fetzer
We rely more and more on computers. For example, the Internet reshapes the way we do business. A `computer outage' can cost a company a substantial amount of money. Not only with respect to the business lost during an outage, but also with respect to the negative publicity the company receives. This is especially true for Internet companies. After recent computer outages of Internet companies, we have seen a drastic fall of the shares of the affected companies. There are multiple causes for computer outages. Although computer hardware becomes more reliable, hardware related outages remain an important issue. For example, some of the recent computer outages of companies were caused by failed memory and system boards, and even by crashed disks - a failure type which can easily be masked using disk mirroring. Transient hardware failures might also look like software failures and, hence, might be incorrectly classified as such. However, many outages are software related. Faulty system software, middleware, and application software can crash a system. Dependable computing systems are systems we can rely on. Dependable systems are, by definition, reliable, available, safe and secure [3]. This special section focuses on issues related to dependable distributed systems. Distributed systems have the potential to be more dependable than a single computer because the probability that all computers in a distributed system fail is smaller than the probability that a single computer fails. However, if a distributed system is not built well, it is potentially less dependable than a single computer since the probability that at least one computer in a distributed system fails is higher than the probability that one computer fails. For example, if the crash of any computer in a distributed system can bring the complete system to a halt, the system is less dependable than a single-computer system. Building dependable distributed systems is an extremely difficult task. 
There is no silver bullet solution. Instead one has to apply a variety of engineering techniques [2]: fault-avoidance (minimize the occurrence of faults, e.g. by using a proper design process), fault-removal (remove faults before they occur, e.g. by testing), fault-evasion (predict faults by monitoring and reconfigure the system before failures occur), and fault-tolerance (mask and/or contain failures). Building a system from scratch is an expensive and time consuming effort. To reduce the cost of building dependable distributed systems, one would choose to use commercial off-the-shelf (COTS) components whenever possible. The usage of COTS components has several potential advantages beyond minimizing costs. For example, through the widespread usage of a COTS component, design failures might be detected and fixed before the component is used in a dependable system. Custom-designed components have to mature without the widespread in-field testing of COTS components. COTS components have various potenti
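The group membership detection problem is specified by a liveness condition (L) and a safety property (S): (L) if a process p crashes, then eventually every non-crashed process q has to suspect that p has crashed; (S) if a process q suspects p, then p has indeed crashed. One can show that either (L) or (S) is implementable, but that (L) and (S) cannot both be implemented in an asynchronous system. In practice, one only needs to implement (L) and (S) such that the probability of violating (L) or (S) becomes negligible. Raynal and Tronel propose and analyse a protocol that implements (L) with certainty and that can be tuned such that the probability that (S) is violated becomes negligible. Designing and implementing distributed fault-tolerant protocols for asynchronous systems is a difficult but not an impossible task. A fault-tolerant protocol has to detect and mask certain failure classes, e.g. crash failures and message omission failures. There is a trade-off between the performance of a fault-tolerant protocol and the failure classes the protocol can tolerate. One wants to tolerate as many failure classes as needed to satisfy the stochastic requirements of the protocol [1] while still maintaining sufficient performance. Since clients of a protocol have different requirements with respect to the performance/fault-tolerance trade-off, it is desirable to be able to customize protocols such that one can select an appropriate performance/fault-tolerance trade-off. In this special section, Hiltunen et al describe how one can compose protocols from micro-protocols in their Cactus system. They show how a group RPC system can be tailored to the needs of a client. In particular, they show how considering additional failure classes affects the performance of a group RPC system.
References
[1] Cristian F 1991 Understanding fault-tolerant distributed systems Communications of ACM 34 (2) 56-78
[2] Heimerdinger W L and Weinstock C B 1992 A conceptual framework for system fault tolerance Technical Report 92-TR-33, CMU/SEI
[3] Laprie J C (ed) 1992 Dependability: Basic Concepts and Terminology (Vienna: Springer)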
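The comparison made earlier in this introduction, between a whole distributed system failing and at least one of its computers failing, can be written out under a simplifying assumption that is not stated in the text: each of the n computers fails independently with the same probability p. The symbols n and p are introduced here only for illustration.

```latex
\[
  P(\text{all } n \text{ computers fail}) \;=\; p^{\,n}
  \;\le\; p \;\le\;
  1 - (1-p)^{n} \;=\; P(\text{at least one computer fails}),
  \qquad 0 \le p \le 1,\; n \ge 1 .
\]
```

The left inequality corresponds to a system that survives as long as any one computer survives; the right one corresponds to a system that halts as soon as any single computer crashes.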
Citations: 0
An adaptive architecture for causally consistent distributed services
Pub Date : 1999-06-01 DOI: 10.1088/0967-1846/6/2/301
M. Ahamad, M. Raynal, G. Thia-Kime
This paper explores causally consistent distributed services when multiple related services are replicated to meet performance and availability requirements. This consistency criterion is particularly well suited for distributed services such as cooperative document sharing, and it is attractive because of the efficient implementations that are allowed by it. A new protocol for implementing causally consistent services is presented. It allows service instances to be created and deleted dynamically according to service access patterns in the distributed system. It also handles the case where different but related services are replicated independently. Another novel aspect of this protocol lies in its ability to use both push and pull mechanisms for disseminating updates to objects that encapsulate service state.
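As background on what causal consistency requires of a replica, the following sketch shows the standard vector-clock test for whether an incoming update may be applied. It is a textbook technique offered for illustration, not the protocol presented in the paper, and the function name `can_deliver` and its parameters are hypothetical.

```python
from typing import Sequence


def can_deliver(local: Sequence[int], stamp: Sequence[int], sender: int) -> bool:
    """Return True if an update is causally ready at this replica.

    local  -- vector clock of the receiving replica
    stamp  -- vector clock attached to the update by its sender
    sender -- index of the sending replica

    The update must be the next one expected from the sender, and it must
    not depend on any update from another replica that is still unseen here.
    """
    if len(local) != len(stamp):
        raise ValueError("vector clocks must have the same length")
    for k, (seen, sent) in enumerate(zip(local, stamp)):
        if k == sender:
            if sent != seen + 1:
                return False
        elif sent > seen:
            return False
    return True
```

An update that fails this test would typically be buffered until the updates it depends on have arrived.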
Citations: 10
CORBA and RM-ODP: parallel or divergent?
Pub Date : 1999-06-01 DOI: 10.1088/0967-1846/6/2/303
Nicole Dunlop, J. Indulska, K. Raymond
Modern architectures for distributed object environments (or distributed `middleware') are revealing an increasing trend towards standardization. The recent emergence of a standard for open distributed processing, the ISO/IEC Reference Model for Open Distributed Processing (RM-ODP) (ITU-T Recommendation X.901) and the coincidence of the development of the Object Management Group's Common Object Request Broker Architecture (CORBA), has prompted us to explore the relationship between these architectures. This paper analyses the CORBA architecture as a support environment for open distributed processing by comparing the business requirements for ODP, RM-ODP viewpoints, functions and distribution transparencies as specified in RM-ODP (ITU-T Recommendations X.901-4) with the CORBA architecture. Through this examination it is evident that despite distinctly divergent terminology, there exist significant parallels between CORBA and RM-ODP.
Citations: 1