Pub Date : 1989-06-05 DOI: 10.1109/ICDCS.1989.37957
W. Peng, S. Iyer
The problem of testing two processes (specified as finite-state machines) communicating asynchronously with each other using send and receive commands over a set of message types is considered for two forms of nonprogress: deadlock and unspecified reception. Since the nonprogress problem is undecidable, a dataflow approach is used to obtain sufficient conditions under which the two processes are free of deadlock and unspecified reception. The approximation analysis is based on weakening the receive operation. Polynomial time algorithms are presented to perform the analysis. This problem arises in the context of dataflow analysis of the processes that communicate by message passing and in the context of showing correctness of protocol specifications. Diagrams are provided for some networks that can be certified to be free of unspecified receptions using the algorithms. The problem of testing for deadlock in more than two processes still remains open.
Title : Analysis of communicating processes for non-progress
Venue : [1989] Proceedings. The 9th International Conference on Distributed Computing Systems
Pub Date : 1989-06-05 DOI: 10.1109/ICDCS.1989.37961
Peng Liu, Y. Kiyoki, T. Masuda
Several effective algorithms are presented for the optimal allocation of computer resources in a proposed stream-oriented parallel-processing scheme for database operations. These algorithms can be utilized to obtain the optimal allocation of memory resources for every type of query in sequential-processing environments, parallel-processing environments with shared-memory multiprocessors, and distributed-processing environments. The computational complexities of the proposed algorithms are analyzed and used to clarify the effectiveness of those algorithms.
Title : Efficient algorithms for resource allocation in distributed and parallel query processing environments
Pub Date : 1989-06-05 DOI: 10.1109/ICDCS.1989.37944
S. Yemini, G. Goldszmidt, A. Stoyen, Yi-Hsiu Wei, Langdon W. Beeck
Concert, a high-level-language approach to programming heterogeneous distributed systems, is described. The Concert model introduces a small set of language extensions into conventional procedural languages. These language extensions support a cooperative peer process model which addresses in the distributed environment the same issues addressed by language semantics in the conventional environment. The Concert implementation provides layered support for these language extensions, bridging a different source of heterogeneity at each layer. A prototype Concert system currently includes C programs running on OS/2 on multiple PS/2s communicating via calls with one another as well as with PL/I programs running on VM/370.
Title : CONCERT: a high-level-language approach to heterogeneous distributed systems
Pub Date : 1989-06-05 DOI: 10.1109/ICDCS.1989.37948
S. Davidson, Insup Lee, V. Wolfe
A model and correctness criteria for timed atomic commitment (TAC) are presented which require the processes to be functionally consistent, but allow the outcome to include an exceptional state, indicating that timing constraints have been violated. Correct TAC behavior is defined by presenting an abstract description of the processes involved in the commitment and minimal correctness criteria for their behavior. The correctness criteria capture the intuitive notion that an exception outcome should only occur in the presence of faults, and an aborted outcome should only occur if faults occur or some process votes no. A centralized two-phase commit protocol was modified to meet the correctness criteria by introducing deadlines on the various stages the participants go through (voting and performing), and on the decision phase for the coordinator. The deadlines are derived using several system parameters: maximum message delay, clock drift, and execution time. The protocol is then shown to be correct.
Title : A protocol for timed atomic commitment
Pub Date : 1989-06-05 DOI: 10.1109/ICDCS.1989.37942
A. Farrag, R. Dawson
The authors studied the design of a fault-tolerant extension for a graph G which can survive at most m node failures, and which contains the minimum number of nodes and the fewest possible edges when the nonredundant graph (G) is a complete multipartite graph. After developing a characterization for m-fault-tolerant extensions and for optimal m-fault-tolerant extensions of a complete multipartite graph, this characterization is used to develop a procedure to construct an optimal m-fault-tolerant extension of any complete multipartite graph, for any m ≥ 0. The procedure is only useful when the size of the graph is relatively small, since the search time required is exponential. Several necessary conditions on any (optimal) m-fault-tolerant extension of a complete multipartite graph are proved. These conditions allow identification of some optimal m-fault-tolerant extensions of several special cases of a complete multipartite graph without performing any search.
Title : Fault-tolerant extensions of complete multipartite networks
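To make the notion of an m-fault-tolerant extension concrete, the following is a minimal brute-force sketch, not the authors' construction procedure. It assumes the usual reading of the definition: an extension is m-fault-tolerant if, after any m nodes fail, the surviving graph still contains a subgraph isomorphic to G. The function names (`complete_multipartite`, `contains_copy`, `is_m_fault_tolerant`) are hypothetical, and the check is exponential, so it only suits very small graphs.

```python
from itertools import combinations, permutations


def complete_multipartite(part_sizes):
    """Return (nodes, edges) of the complete multipartite graph with the given part sizes."""
    nodes, part_of = [], {}
    for part, size in enumerate(part_sizes):
        for _ in range(size):
            part_of[len(nodes)] = part
            nodes.append(len(nodes))
    # Two nodes are adjacent exactly when they lie in different parts.
    edges = {frozenset((u, v)) for u, v in combinations(nodes, 2)
             if part_of[u] != part_of[v]}
    return nodes, edges


def contains_copy(host_nodes, host_edges, g_nodes, g_edges):
    """True if the host graph has a subgraph isomorphic to G (tries every injective mapping)."""
    for image in permutations(host_nodes, len(g_nodes)):
        mapping = dict(zip(g_nodes, image))
        if all(frozenset((mapping[u], mapping[v])) in host_edges
               for u, v in map(tuple, g_edges)):
            return True
    return False


def is_m_fault_tolerant(ext_nodes, ext_edges, g_nodes, g_edges, m):
    """Check every set of m failed nodes (assumed definition, see lead-in)."""
    for failed in combinations(ext_nodes, m):
        alive = [v for v in ext_nodes if v not in failed]
        alive_edges = {e for e in ext_edges if not (e & set(failed))}
        if not contains_copy(alive, alive_edges, g_nodes, g_edges):
            return False
    return True
```

As a small usage example under these assumptions, take G as the complete bipartite graph with parts of size 1 and 2 (a three-node path). A four-node cycle passes the check for m = 1, because removing any node leaves a three-node path, whereas a four-node star fails because losing its center leaves no edges.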
Pub Date : 1989-06-05 DOI: 10.1109/ICDCS.1989.37958
F. Cristian
A probabilistic method is proposed for reading remote clocks in distributed systems subject to unbounded random communication delays. The method can achieve clock synchronization precisions superior to those attainable by previously published clock synchronization algorithms. The method can be used to improve the precision of both internal and external synchronization algorithms. The approach is probabilistic because it does not guarantee that a processor can always read a remote clock with an a priori specified precision; however, by retrying a sufficient number of times, a process can read the clock of another process with a given precision with a probability as close to one as desired. An important characteristic of the method is that, when a process succeeds in reading a remote clock, it knows the actual reading precision achieved. The use of the remote clock reading methods is illustrated by presenting a time service which maintains externally (and, hence, internally) synchronized clocks in the presence of process, communication, and clock failures.
Title : A probabilistic approach to distributed clock synchronization
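The retry idea in this abstract can be illustrated with a simplified sketch. It is an assumption-laden illustration rather than the paper's algorithm: clock drift is ignored, a known minimum one-way delay `min_delay` is assumed, and the names `read_remote_clock`, `local_clock`, and `query_remote` are hypothetical. The remote clock is estimated at the midpoint of the round trip, and the attempt counts as a success only if the resulting error bound meets the requested precision.

```python
from typing import Callable, Optional, Tuple


def read_remote_clock(
    local_clock: Callable[[], float],
    query_remote: Callable[[], float],
    min_delay: float,
    precision: float,
    max_attempts: int,
) -> Optional[Tuple[float, float]]:
    """Return (estimate, error_bound) for the remote clock, or None if every attempt was too slow."""
    for _ in range(max_attempts):
        sent = local_clock()
        remote_time = query_remote()  # blocking request/reply (hypothetical transport)
        received = local_clock()
        half_round_trip = (received - sent) / 2.0
        # The reply was in transit for at least min_delay and at most
        # (round trip - min_delay), so the midpoint estimate is off by
        # at most half_round_trip - min_delay (drift ignored).
        error_bound = half_round_trip - min_delay
        if error_bound <= precision:
            return remote_time + half_round_trip, error_bound
    return None
```

A short round trip yields a tight bound, so a caller that retries discards slow exchanges and keeps the first fast one. On success the returned `error_bound` tells the caller the precision actually achieved, which mirrors the property highlighted in the abstract.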
Pub Date : 1989-06-05 DOI: 10.1109/ICDCS.1989.37950
Y. Ofek
A technique is described for constructing a fault-tolerant global clock in a point-to-point distributed system with an arbitrary topology, which constitutes a wide-area network. It is assumed that the network is constructed of optical links with very high transmission rates. The approach used is to generate a global clock from the ensemble of the local transmission clocks, and not to synchronize these high-speed clocks directly. The steady-state algorithm which generates the global system clock is executed in hardware by the network interface of each node. As a result, it is possible to estimate accurately internodal delays and thereby to achieve a much tighter synchronization than with other methods. The basic synchronization time step is proportional to the error or uncertainty in the measurement of the end-to-end network delay rather than to the actual value of the end-to-end network delay. Node and network models are presented, and the synchronization condition is defined. The synchronization algorithm, its bound, and its correctness proof are presented. A procedure is described for detecting and isolating a faulty component, while maintaining the integrity of the global clock.
Title : Generating a fault tolerant global clock in a high speed distributed system
Pub Date : 1989-06-05 DOI: 10.1109/ICDCS.1989.37943
Yuan-Bao Shieh, D. Ghosal, P. R. Chintamaneni, S. Tripathi
Analytical models are presented that use Petri nets for fault-tolerant schemes used in distributed systems. These models are used in the quantitative evaluation and selection of good fault-tolerant schemes for specific system configurations. Several different fault-tolerant schemes that can be modeled using Petri nets are discussed in detail. These schemes include rollback recovery with checkpointing, recovery blocks, N-version programming, and conversations. After a brief review of Petri net models, extension of the Petri net models to incorporate fault-tolerant schemes is considered. A methodology for evaluating a fault-tolerant scheme for a specific system configuration and the steps involved in building a Petri net model of a fault-tolerant system are described. The subnet primitives involved in building these models are identified and an algorithm for building the models automatically is described. Examples illustrating this extended Petri net model are discussed and numerical results are presented to show the applicability of the models.
Title : Application of Petri net models for the evaluation of fault-tolerant techniques in distributed systems
Pub Date : 1989-06-05 DOI: 10.1109/ICDCS.1989.37991
Ching-Liang Huang, V. Li
A replication control protocol utilizing dynamic voting is presented for ensuring database correctness so that the system behaves like a one-copy database to the users. The protocol dynamically adjusts vote assignment of data items in response to failures and recoveries, thus maintaining higher data availability than static voting schemes in the event of network partitioning. Unlike existing dynamic voting schemes, it supports inexpensive read operations which access one copy, rather than all copies, of each data item read. Since read operations outnumber write operations in most applications, this protocol enjoys better performance. With this protocol, transactions run in one of three modes: normal mode, missing-partition mode, or pseudo-normal mode. Because a partition number and a last current copy cardinality are associated with each copy, read operations only require one copy of a data item when run in the normal mode.
Title : Missing-partition dynamic voting scheme for replicated database systems
Pub Date : 1989-06-05 DOI: 10.1109/ICDCS.1989.37968
D. Kotz, C. Ellis
Performance considerations affecting the design of the concurrent pools data structure, a mechanism that preserves locality and avoids high-latency remote references, are explored. The effectiveness of three different implementations of concurrent pools is evaluated. Experiments performed on a BBN Butterfly multiprocessor under a variety of workloads show that the three implementations perform similarly well for light workloads, but that with stressful workloads a simple algorithm can provide better performance than a complex algorithm designed to keep remote accesses to a minimum. Implementations can benefit by taking into account information on the nature of the operations performed by each process to help balance the elements among processes that need them.
Title : Evaluation of concurrent pools
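The locality idea behind a pool of this kind can be sketched as follows. This is an illustrative toy and not one of the three implementations evaluated in the paper: it assumes an unordered collection split into one segment per process, serves each process from its own segment first, and falls back to a linear search of the other segments. The class name `ConcurrentPoolSketch` and the `remote_accesses` counter are hypothetical, and the sketch is single-threaded, so a real version would need per-segment locking.

```python
from collections import deque
from typing import Any, Optional


class ConcurrentPoolSketch:
    """Unordered pool split into one segment per process (no locking; illustration only)."""

    def __init__(self, num_processes: int) -> None:
        self.segments = [deque() for _ in range(num_processes)]
        self.remote_accesses = 0  # counts probes of other processes' segments

    def add(self, pid: int, item: Any) -> None:
        # Additions always go to the caller's own (local) segment.
        self.segments[pid].append(item)

    def remove(self, pid: int) -> Optional[Any]:
        # Prefer the local segment to avoid a remote reference.
        if self.segments[pid]:
            return self.segments[pid].pop()
        # Otherwise probe the other segments in a fixed circular order.
        n = len(self.segments)
        for offset in range(1, n):
            victim = (pid + offset) % n
            self.remote_accesses += 1
            if self.segments[victim]:
                return self.segments[victim].popleft()
        return None  # the whole pool is empty
```

Counting remote probes gives a rough handle on the cost the abstract is concerned with: a workload in which consumers rarely find local elements drives `remote_accesses` up, which is where the choice of search strategy starts to matter.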