
Proceedings. 14th Symposium on Reliable Distributed Systems: Latest Papers

Self diagnosis of processor arrays using a comparison model
Pub Date: 1995-09-13 DOI: 10.1109/RELDIS.1995.526229
P. Maestrini, P. Santi
This paper introduces a diagnosis algorithm for two-dimensional processor arrays in which processors are interconnected by horizontal and vertical meshes. For the purpose of diagnosis, the array is partitioned into square clusters of processors. The algorithm is based on interprocessor tests using a comparison model. It is divided into four steps, called intracluster diagnosis, intercluster diagnosis, fault-free core identification and augmentation, and it identifies a set of non-faulty units and a set of faulty units. The diagnosis is proved to be correct in the worst case, assuming that the actual number of faulty processors is no more than T(N), an increasing function of the number N of processors. It is shown that T(N) is O(N^(2/3)). Although correct, the diagnosis is generally incomplete. However, probabilistic techniques show that the diagnosis is very likely to be complete under the same limitations that ensure correctness in the worst case.
Citations: 25
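To make the comparison model concrete, here is a minimal Python sketch of the last two steps named in the abstract above, fault-free core identification and augmentation, on an n x n mesh. It is not the authors' algorithm. It ignores clusters and the derivation of T(N). It also assumes the classic comparison-model rules: two fault-free units always agree, a fault-free and a faulty unit always disagree, and two faulty units may do either. Under those rules, units linked by agreeing comparisons are either all faulty or all fault-free, so any such group larger than the fault bound t must be fault-free. The names `mesh_edges`, `simulate` and `diagnose` are hypothetical.

```python
from collections import deque
from itertools import product


def mesh_edges(n):
    """Yield the horizontal and vertical links of an n x n processor mesh."""
    for r, c in product(range(n), repeat=2):
        if c + 1 < n:
            yield (r, c), (r, c + 1)
        if r + 1 < n:
            yield (r, c), (r + 1, c)


def simulate(n, faulty):
    """Comparison outcomes (0 = agree, 1 = disagree) for every mesh link.
    Assumption: two faulty units happen to agree (one allowed behaviour)."""
    return {(u, v): int((u in faulty) != (v in faulty)) for u, v in mesh_edges(n)}


def diagnose(n, outcomes, t):
    """Return (fault_free, faulty); units in neither set stay undiagnosed."""
    neighbours = {u: [] for u in product(range(n), repeat=2)}
    for (u, v), s in outcomes.items():
        if s == 0:  # agreement links only join units of the same status
            neighbours[u].append(v)
            neighbours[v].append(u)
    group_of, groups = {}, []
    for start in neighbours:  # connected components of agreement links
        if start in group_of:
            continue
        group, queue = {start}, deque([start])
        group_of[start] = len(groups)
        while queue:
            for w in neighbours[queue.popleft()]:
                if w not in group_of:
                    group_of[w] = len(groups)
                    group.add(w)
                    queue.append(w)
        groups.append(group)
    core = max(groups, key=len)
    if len(core) <= t:  # the largest group could be all faulty
        return set(), set()
    faulty = set()
    for (u, v), s in outcomes.items():  # augmentation step
        if s == 1:
            for a, b in ((u, v), (v, u)):
                if a in core:  # disagreeing with a fault-free unit
                    faulty |= groups[group_of[b]]
    return set(core), faulty
```

Consider a case with n = 3, t = 2 and faulty units (0, 1) and (1, 0). The six other units agree with one another and are certified fault-free, and both faulty units are identified. The fault-free corner (0, 0), however, only borders faulty units, so it stays undiagnosed. This mirrors the abstract's remark that the diagnosis is correct but generally incomplete.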
Membership and system diagnosis
Pub Date: 1995-09-13 DOI: 10.1109/RELDIS.1995.526228
M. Hiltunen
A membership service is a service in a distributed system that maintains and provides information about which sites are functioning and which have failed at any given time. System diagnosis, on the other hand, is a method for detecting faulty processing elements and distributing this information to non-faulty elements. In spite of the apparent similarity of their goals, these two fields have been treated separately since their beginnings. In this paper, we compare the two fields and show their fundamental differences and similarities. We demonstrate that the problems are closely related, the major differences being the assumptions made about the failure model, the testing methods, and the type of service guarantees provided to the application. Furthermore, we demonstrate that the fields are closely enough related that some algorithms used in one field can easily be transformed into algorithms for the other. As examples, we derive new membership algorithms from a distributed system diagnosis algorithm and new system diagnosis algorithms from membership algorithms.
Citations: 26
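As defined in Hiltunen's abstract above, a membership service reports which sites are functioning at any given time. The following Python sketch shows the simplest local version of that idea. A site is considered alive if a heartbeat from it arrived within a timeout, and each change produces a new numbered view. The class and method names are hypothetical. The sketch deliberately omits what distinguishes real membership protocols, and the diagnosis algorithms the paper compares them with: agreement among sites on the sequence of views, and an explicit failure model.

```python
from dataclasses import dataclass, field


@dataclass
class HeartbeatMembership:
    """Local, timeout-based membership view (hypothetical sketch)."""
    timeout: float
    last_heard: dict = field(default_factory=dict)
    view: frozenset = frozenset()
    view_id: int = 0

    def heartbeat(self, site: str, now: float) -> None:
        """Record that `site` was heard from at time `now`."""
        self.last_heard[site] = now

    def update(self, now: float) -> tuple:
        """Recompute the set of sites believed to be functioning and
        bump the view number whenever that set changes."""
        alive = frozenset(s for s, t in self.last_heard.items()
                          if now - t <= self.timeout)
        if alive != self.view:
            self.view, self.view_id = alive, self.view_id + 1
        return self.view_id, self.view
```

With a timeout of 2.0, suppose sites "a" and "b" both report at time 0 and only "a" reports again at 2.5. Then view 1 contains both sites, and at time 3.0 view 2 contains only "a".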
System support for robust collaborative applications
Pub Date: 1995-09-13 DOI: 10.1109/RELDIS.1995.526214
M. Chelliah, M. Ahamad
Traditional transaction models ensure robustness for distributed applications through the properties of view atomicity and failure atomicity. It has generally been felt that such atomicity properties are restrictive for a wide range of application domains. This is particularly true for robust collaborative applications, whose concurrent components are inherently long-lived and cooperate with one another. Recent advances in extended transaction models can be exploited to structure long-lived and cooperative computations. Applications can use a combination of such models to achieve the desired degree of robustness; hence, we develop a system that supports a number of flexible transaction models, with correctness criteria that extend or relax serializability. We analyze two concrete CSCW applications: a collaborative editor and a meeting scheduler. We show how a combination of two extended transaction models, which promote split and cooperating actions, facilitates robust implementations of these collaborative applications. We conclude that a system implementing multiple transaction models provides flexible support for building robust collaborative applications.
Citations: 0
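The "split" actions in Chelliah and Ahamad's abstract come from the family of extended transaction models. The idea is that a long-lived transaction can hand part of its work to a new transaction that commits on its own. Below is a minimal Python sketch for the collaborative-editor scenario. It assumes a single in-memory store and no concurrency control, so none of the paper's serializability-relaxing correctness criteria are checked. `SplitTxn` and its methods are hypothetical names.

```python
class SplitTxn:
    """Toy transaction that buffers writes until commit (hypothetical)."""

    def __init__(self, name: str):
        self.name = name
        self.writes: dict = {}
        self.committed = False

    def write(self, key: str, value) -> None:
        if self.committed:
            raise RuntimeError(f"{self.name} already committed")
        self.writes[key] = value

    def split(self, keys, name: str) -> "SplitTxn":
        """Move the buffered writes for `keys` into a new transaction
        that can commit independently of this one."""
        child = SplitTxn(name)
        for key in keys:
            child.writes[key] = self.writes.pop(key)
        return child

    def commit(self, store: dict) -> None:
        """Install the buffered writes in the shared store."""
        store.update(self.writes)
        self.committed = True
```

In this example, an author edits sections 1 and 2 inside one long-lived transaction. The author then splits section 1 off and commits it so collaborators can see it, while section 2 stays inside the still-active transaction.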
TMR processing without explicit clock synchronisation
Pub Date: 1995-09-13 DOI: 10.1109/RELDIS.1995.526226
F. Brasileiro, P. Ezhilchelvan, N. Speirs
Replicated processing with majority voting is a well-known method for achieving fault tolerance, and Triple Modular Redundant (TMR) processing is its most commonly used form. Replicated processing requires the replicas to agree on the order in which messages are to be processed. The synchronous, deterministic ordering protocols published in the literature require the replicas to maintain an abstraction of clocks that are kept in known and bounded synchronism. We present a protocol for TMR systems that does not require this abstraction of synchronised clocks. We analyse the protocol's performance and show that, in practice, it can be at least as fast as any ordering protocol based on synchronised clocks. We also derive a faster protocol with improved performance in the absence of processor failures. Finally, we build a TMR node and measure its performance to show that the protocols developed here provide faster ordering and are easier to implement.
Citations: 2
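In the TMR paper above, majority voting only masks a faulty replica if all three replicas process the same messages in the same order. That is why the ordering protocol is the crux of the paper. The voting step itself is simple. Here is a minimal Python sketch, assuming replica outputs are directly comparable values; the function name `tmr_vote` is hypothetical.

```python
from collections import Counter


def tmr_vote(outputs):
    """2-out-of-3 majority voter for one TMR output.

    Returns the value produced by at least two replicas, or None when all
    three disagree (no majority, so the fault cannot be masked)."""
    if len(outputs) != 3:
        raise ValueError("TMR voting expects exactly three replica outputs")
    value, count = Counter(outputs).most_common(1)[0]
    return value if count >= 2 else None
```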
A method for the construction and interpretation of high level models for distributed fault-tolerant systems
Pub Date: 1995-09-13 DOI: 10.1109/RELDIS.1995.526215
K. Tilly, István Kiss, G. Román, T. Dobrowiecki, A. Várkonyi-Kóczy
Traditional solutions for achieving fault tolerance are intended for use at design time, and they generally capture system information at a very low (hardware or machine instruction) level. Increasing the reliability of complex information systems containing many (perhaps many thousands of) autonomous components requires different solutions. This article presents a new methodology for implementing large-scale, distributed fault-tolerant systems. System models are formed of objects describing requirements, services and resources, organized into high-level, top-down hierarchical decomposition structures. Redundancy is a natural property of any large-scale system. Such models therefore make it possible to achieve fault-tolerant behaviour by finding multiple appropriate mappings between requirements and available services, and between the required services and the available resources that support them. The distributed system is extended with dedicated components called diagnostic centres. These manage distinct parts of the system model, continuously observe the operation of the distributed system, and find alternative requirement-service mappings if some services fail to fulfil their associated requirements. The elements and structure of the proposed system modelling method are presented, an appropriate fault model is defined, and the algorithms for model interpretation are described.
Citations: 1
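The central idea of the abstract above is that redundancy lets a diagnostic centre find an alternative requirement-service mapping when a service fails. This can be sketched as a search over candidate services. The Python sketch below flattens the paper's hierarchical models into a single level and picks candidates greedily in a fixed preference order. The data layout and the function name `remap` are assumptions for illustration.

```python
def remap(requirements, failed_services, failed_resources):
    """Pick, for each requirement, the first candidate service that has not
    failed and whose resources are all still available.

    requirements: {requirement: [(service, [resource, ...]), ...]} with the
    candidates listed in order of preference. failed_resources is a set.
    Returns {requirement: service or None}; None means no alternative
    mapping exists."""
    mapping = {}
    for req, candidates in requirements.items():
        mapping[req] = None
        for service, resources in candidates:
            if service in failed_services:
                continue
            if failed_resources.intersection(resources):
                continue
            mapping[req] = service
            break
    return mapping
```

For example, if the preferred service "disk-A" for a logging requirement fails, the sketch falls back to "disk-B" as long as that service's resource is still available.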
Non blocking atomic commitment with an unreliable failure detector
Pub Date: 1995-09-13 DOI: 10.1109/RELDIS.1995.518722
R. Guerraoui, M. Larrea, A. Schiper
In a transactional system, an atomic commitment protocol ensures that, for any transaction, all data manager processes agree on the same outcome (commit or abort). A non-blocking atomic commitment protocol enables an outcome to be decided at every correct process despite the failure of others. In this paper we apply, for the first time, the fundamental result of T. Chandra and S. Toueg (1991) on solving the abstract consensus problem to non-blocking atomic commitment. More precisely, we present a non-blocking atomic commitment protocol for an asynchronous system augmented with an unreliable failure detector that may make infinitely many false failure suspicions. If no process is suspected to have failed, our protocol is similar to a three-phase commit protocol. When processes are suspected, our protocol does not require any additional termination protocol: failure scenarios are handled within the regular protocol and are thus much simpler to manage.
Citations: 60
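The abstract above applies a consensus result to atomic commitment. One common way to read this is as a reduction. Each participant turns the votes and its failure detector's suspicions into a proposal, and a consensus algorithm picks the single outcome. The Python sketch below shows only that reduction. The consensus step is a deliberately trivial stand-in, not Chandra and Toueg's algorithm, and the sketch is not claimed to match the paper's 3PC-like protocol. All names are hypothetical.

```python
def nbac_proposal(votes, suspected):
    """Value a participant proposes to consensus: commit only if every
    participant voted yes and nobody is suspected; otherwise abort."""
    if suspected or any(v != "yes" for v in votes.values()):
        return "abort"
    return "commit"


def toy_consensus(proposals):
    """Stand-in for a real consensus algorithm: deterministically picks one
    proposed value (the lowest process id's). This gives agreement and
    validity only in this failure-free toy, not in a real system."""
    return proposals[min(proposals)]


def run_commit(votes, suspicions):
    """votes: {process: "yes" or "no"}; suspicions: {process: set of
    suspects}. Assumes every process received every vote (no crashes)."""
    proposals = {p: nbac_proposal(votes, suspicions.get(p, set()))
                 for p in votes}
    return toy_consensus(proposals)
```

In this sketch, a false suspicion can force an abort even though every participant voted yes. The outcome is still the same at every process, and it is still abort whenever any participant voted no.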
Experimental evaluation of the impact of processor faults on parallel applications
Pub Date: 1995-09-13 DOI: 10.1109/RELDIS.1995.518719
D. Costa, F. Moreira, H. Madeira, M. Z. Rela, J. G. Silva
This paper addresses the problem of processor faults in distributed-memory parallel systems. It injects transient faults at the processor pins of one node of a commercial parallel computer that uses no particular fault-tolerance techniques. Up to 43% of the injected faults (depending on the application) cause erroneous application results. In addition to these very subtle faults, up to 19% of the injected faults (almost independently of the application) caused the system to hang. These results show that fault-tolerance techniques are absolutely required in parallel systems. They are needed not only to ensure the completion of long-running applications but also, and more importantly, to establish confidence in the application results. The benefits of including some fairly simple behaviour-based error detection mechanisms in the system were evaluated together with Algorithm Based Fault Tolerance (ABFT) techniques. Including such mechanisms in parallel systems appears to be very important for detecting most of these subtle errors without greatly affecting the performance and cost of these systems.
Citations: 5
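The abstract above evaluates Algorithm Based Fault Tolerance (ABFT) without spelling out a scheme. The textbook example of ABFT is checksum-protected matrix multiplication in the style of Huang and Abraham. The Python sketch below illustrates that general technique, not necessarily the variants used in the paper's experiments. The function names are hypothetical.

```python
def matmul(A, B):
    """Plain matrix product of list-of-lists matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def abft_check(A, B, C, tol=1e-9):
    """Checksum test for C == A x B (Huang-Abraham style).

    Column sums of C must equal (column sums of A) x B, and row sums of C
    must equal A x (row sums of B). Returns the indices of the rows and
    columns whose checksums disagree; an error in C[i][j] flags row i and
    column j, which locates it."""
    col_sum_A = [sum(col) for col in zip(*A)]
    row_sum_B = [sum(row) for row in B]
    exp_cols = [sum(a * b for a, b in zip(col_sum_A, col)) for col in zip(*B)]
    exp_rows = [sum(a * b for a, b in zip(row, row_sum_B)) for row in A]
    bad_cols = [j for j, col in enumerate(zip(*C))
                if abs(sum(col) - exp_cols[j]) > tol]
    bad_rows = [i for i, row in enumerate(C)
                if abs(sum(row) - exp_rows[i]) > tol]
    return bad_rows, bad_cols
```

For A = [[1, 2], [3, 4]] and B = [[5, 6], [7, 8]], the product passes the check. Corrupting C[1][0] makes the check report row 1 and column 0.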
A synchronization strategy for a time-triggered multicluster real-time system
Pub Date: 1995-09-13 DOI: 10.1109/RELDIS.1995.526223
H. Kopetz, A. Krüger, D. Millinger, A. Schedl
Providing a system-wide global time base with good precision and sufficient accuracy is a fundamental prerequisite for the design of a multicluster distributed real-time system. We investigate the issues of clock synchronization in a multicluster system in which every node can have a different oscillator. Based on the parameters of a typical automotive distributed system, we show that precision and accuracy in the microsecond range are achievable without undue effort.
Citations: 24
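The clock-synchronization abstract above does not name the convergence function it uses. A common choice in fault-tolerant clock synchronization, and one associated with Kopetz's work on time-triggered systems, is the fault-tolerant average. It discards the k largest and k smallest measured clock deviations and averages the rest, so up to k faulty clocks cannot drag the correction outside the range of correct readings. Here is a minimal Python sketch, assuming deviations are given in microseconds; the function name is hypothetical.

```python
def fault_tolerant_average(deviations, k):
    """Correction term from measured clock deviations (e.g. microseconds).

    Drops the k smallest and k largest readings so that up to k faulty
    clocks cannot pull the result outside the range of correct readings,
    then averages what remains."""
    if len(deviations) <= 2 * k:
        raise ValueError("need more than 2k readings")
    kept = sorted(deviations)[k:len(deviations) - k]
    return sum(kept) / len(kept)
```

Take readings of -2, 0, 1, 3 and 100 µs with k = 1. The outlier 100 and the smallest reading, -2, are discarded, and the correction is the mean of 0, 1 and 3, about 1.33 µs.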
Configurable highly available distributed services
Pub Date: 1995-09-13 DOI: 10.1109/RELDIS.1995.526219
C. Karamanolis, J. Magee
The paper addresses the problem of providing highly available services in distributed systems. In particular, we examine the situation in which a service may be used by a large, continuously changing set of clients. The requirements for providing services in this environment are analysed. An architecture and partial implementation are then presented for a replicated server group that meets a range of client requirements. The architecture facilitates dynamic configuration management of the replicated server group while the service continues to be provided. Dynamic configuration management is required to replace failed replicas, upgrade the server implementation, or change the availability characteristics of the service. The paper reports initial implementation results.
Citations: 14
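In the abstract above, dynamic configuration management means changing the membership of the replicated server group, for example replacing a failed replica, while the service stays available. Here is a minimal Python sketch of that one operation. It assumes an in-memory key-value service, writes applied to all replicas, and no failures during reconfiguration. The class, its methods and the epoch counter are hypothetical and say nothing about the paper's actual architecture or client protocol.

```python
class ReplicaGroup:
    """In-memory replicated key-value group with reconfiguration
    (hypothetical sketch: a single process, synchronous writes)."""

    def __init__(self, replicas):
        self.state = {r: {} for r in replicas}
        self.epoch = 0  # configuration number, bumped on every change

    def write(self, key, value):
        for store in self.state.values():  # apply to every replica
            store[key] = value

    def read(self, key):
        replica = next(iter(self.state))  # all replicas hold the same state
        return self.state[replica].get(key)

    def replace(self, failed, new):
        """Swap a failed replica for a new one, copying state from a
        surviving member."""
        donors = [r for r in self.state if r != failed]
        if not donors:
            raise ValueError("no surviving replica to copy state from")
        snapshot = dict(self.state[donors[0]])
        del self.state[failed]
        self.state[new] = snapshot
        self.epoch += 1
```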
Designing masking fault-tolerance via nonmasking fault-tolerance
Pub Date: 1995-09-13 DOI: 10.1109/RELDIS.1995.526225
A. Arora, S. Kulkarni
Masking fault-tolerance guarantees that programs continually satisfy their specification in the presence of faults. By contrast, nonmasking fault-tolerance guarantees less: it merely guarantees that, when faults stop occurring, program executions converge to states from which programs continually (re)satisfy their specification. In this paper, we show that a practical method for designing masking fault-tolerance is to first design nonmasking fault-tolerance and then transform the nonmasking fault-tolerant program minimally so as to achieve masking fault-tolerance. We demonstrate this method by designing novel, fully distributed programs for termination detection, mutual exclusion and leader election. These programs are masking tolerant of any finite number of process fail-stops and/or repairs.
Citations: 70
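A classic illustration of nonmasking fault-tolerance, in the sense defined in the abstract above, is Dijkstra's K-state token ring (1974). It is a self-stabilizing mutual-exclusion protocol: after a transient fault corrupts the machines' states, the ring converges to a state in which exactly one machine holds the privilege, and it stays that way. This is not one of the paper's programs, which tolerate fail-stops and repairs. The Python sketch below uses hypothetical function names. It assumes a central daemon that fires the lowest-indexed privileged machine, and a K larger than the number of machines.

```python
def privileges(x):
    """Indices of privileged machines in Dijkstra's K-state ring."""
    n = len(x)
    # Machine 0 is privileged when it equals its left neighbour (the last
    # machine); every other machine when it differs from its left neighbour.
    return [i for i in range(n)
            if (x[0] == x[-1] if i == 0 else x[i] != x[i - 1])]


def fire(x, i, k):
    """Let privileged machine i make its move."""
    if i == 0:
        x[0] = (x[0] + 1) % k
    else:
        x[i] = x[i - 1]


def run_until_stable(x, k, max_steps=10_000):
    """Fire the lowest-indexed privileged machine until exactly one
    privilege remains; return the number of moves taken."""
    for steps in range(max_steps):
        p = privileges(x)
        if len(p) == 1:
            return steps
        fire(x, p[0], k)
    raise RuntimeError("did not stabilise")
```

Starting from the arbitrary state [3, 1, 4, 1, 0] with K = 6, the ring reaches a single privilege. From then on, every move passes that single privilege along the ring, which is the convergence behaviour the abstract attributes to nonmasking tolerance.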