Pub Date: 1995-06-27 DOI: 10.1109/FTCS.1995.466986
S. Rangarajan, K. Ratnam, A. Dahbura
In this paper, we present a fault-tolerant protocol for maintaining location directories in mobile networks. The protocol tolerates base station failures and also allows for consistent location information to be maintained about mobile hosts that switch off and arbitrarily reappear in some other part of the network. Further, the protocol tolerates the corruption of a logical time stamp that is part of any protocol where new location information has to be distinguished from old location information when a location directory is updated. We formally show that the protocol maintains consistent location information and does not overwrite new location information with old location information. The protocol can be hierarchically organized to reduce the message overhead incurred by location directory updates.
Title: A fault-tolerant protocol for location directory maintenance in mobile networks
Venue: Twenty-Fifth International Symposium on Fault-Tolerant Computing. Digest of Papers
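The abstract's central rule, that a directory update must never replace newer location information with older information, can be illustrated with a minimal sketch. This is not the authors' protocol: it assumes an uncorrupted integer logical time stamp and a single directory, and the names `LocationDirectory`, `update`, and `lookup` are hypothetical. It does not attempt the parts the paper is actually about (base station failures and time stamp corruption).

```python
from dataclasses import dataclass


@dataclass
class LocationEntry:
    base_station: str   # base station currently serving the mobile host
    timestamp: int      # logical time stamp carried by the update


class LocationDirectory:
    """Hypothetical single-node directory keyed by mobile host name."""

    def __init__(self):
        self._entries = {}

    def update(self, host, base_station, timestamp):
        """Apply an update only if it is newer than the stored entry."""
        current = self._entries.get(host)
        if current is not None and timestamp <= current.timestamp:
            return False  # stale or duplicate update: keep the newer entry
        self._entries[host] = LocationEntry(base_station, timestamp)
        return True

    def lookup(self, host):
        """Return the last known base station for `host`, or None."""
        entry = self._entries.get(host)
        return entry.base_station if entry else None
```

An update that arrives late (for example with time stamp 2 after time stamp 3 has been applied) is rejected, so the directory keeps pointing at the most recent base station.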
Pub Date: 1995-06-27 DOI: 10.1109/FTCS.1995.466945
Christopher P. Dingman, Joe Marshall, D. Siewiorek
In commercial literature, the meaning of the term fault tolerant has become vague. We describe a system used to measure the robustness of a fault tolerant aerospace system developed at IBM, present the data collected during the project, and report conclusions and areas for future work.
Title: Measuring robustness of a fault tolerant aerospace system
Pub Date: 1995-06-27 DOI: 10.1109/FTCS.1995.466996
S. Poledna
The high availability and safety requirements for automotive electronics are currently almost exclusively addressed by application-specific engineering solutions to fault tolerance rather than by systematic approaches. Currently, systematic approaches are ruled out because of cost. The reason for this is that a systematic approach to fault tolerance requires: replication of components; and communication between replicated components to achieve agreement despite nondeterminism. While replicated components become more and more available with the connection of different control units by means of a multiplex bus, it is shown that the cost of agreement on sensor inputs will become the limiting factor for systematic approaches to fault tolerance. For that reason a new agreement algorithm is introduced which considers the problem of agreement and sensor inputs in an integrated fashion. This algorithm takes advantage of the a priori knowledge of the maximum deviation of replicated sensor inputs. Optimality of this algorithm is shown with respect to the minimum number of bits for agreement. This algorithm allows broader application of systematic fault tolerance to automotive applications. The result of this work will be used for a prototype implementation of a safety-critical automotive application.
Title: Fault tolerance in safety critical automotive applications: cost of agreement as a limiting factor
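To see why a known bound on the deviation between replicated sensor readings can reduce the number of bits exchanged, consider the following sketch. It is an illustration only and is not the agreement algorithm from the paper: it assumes integer readings, a fault-free sender, and a bound `max_dev` on the difference between any two correct replicas' readings. The function names are hypothetical. Each replica sends only its reading modulo `2 * max_dev + 1`, and the receiver reconstructs the full reading from its own.

```python
def encode_reading(value, max_dev):
    """Return the residue a replica sends instead of its full reading."""
    return value % (2 * max_dev + 1)


def decode_reading(residue, own_value, max_dev):
    """Recover the sender's reading from its residue and the local reading.

    Assumes |sender_value - own_value| <= max_dev (an assumption of this sketch).
    """
    modulus = 2 * max_dev + 1
    # Offset in [-max_dev, max_dev] whose residue matches the one received.
    offset = (residue - own_value + max_dev) % modulus - max_dev
    return own_value + offset
```

With `max_dev = 3` the residues range over 0..6 and fit in 3 bits, regardless of how wide the full sensor reading is.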
Pub Date: 1995-06-27 DOI: 10.1109/FTCS.1995.466978
Jens Güthoff, V. Sieh
Fault/error injection has emerged as a valuable means for evaluating the dependability of a system. In particular, software-based techniques (which can be described as software-implemented and simulation-based techniques) have become very popular because of the relative simplicity of injecting faults. After discussing the advantages and drawbacks of these techniques, two approaches are introduced which try to overcome crucial problems when using software-based fault injection techniques. The first one improves the accuracy of software-implemented fault injection experiments. The second one offers detailed insights into the system dynamics in the presence of faults. With this knowledge, the number of fault injections (a major concern in simulation-based fault injection) can be significantly reduced. These approaches can be joined together, offering accuracy of fault injection results as well as transparency of the system dynamics in the presence of faults. A case study is shown in which the de facto dependability properties of a standard component, a Motorola MC88100 RISC processor, are evaluated.
Title: Combining software-implemented and simulation-based fault injection into a single fault injection method
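In its simplest form, software-implemented fault injection corrupts a value and compares the outcome with a fault-free reference run. The sketch below shows that idea with a single bit flip in a 32-bit word. It is a toy under stated assumptions, not the authors' method or tooling: `flip_bit`, `inject_campaign`, and the `workload` callable are hypothetical names, and the fault model is limited to one inverted input bit per run.

```python
import random


def flip_bit(word, bit, width=32):
    """Return `word` with one bit inverted, modelling a single transient fault."""
    if not 0 <= bit < width:
        raise ValueError("bit index out of range")
    return (word ^ (1 << bit)) & ((1 << width) - 1)


def inject_campaign(workload, golden_input, trials, width=32, seed=0):
    """Run `workload` on corrupted inputs and count how many outputs differ."""
    rng = random.Random(seed)              # fixed seed keeps the campaign repeatable
    expected = workload(golden_input)      # fault-free reference result
    failures = 0
    for _ in range(trials):
        corrupted = flip_bit(golden_input, rng.randrange(width), width)
        if workload(corrupted) != expected:
            failures += 1
    return failures
```

A workload that ignores its input reports zero failures, while one that returns its input unchanged reports a failure on every trial; real workloads fall in between, which is what a campaign measures.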
Pub Date: 1995-06-27 DOI: 10.1109/FTCS.1995.466994
T. Krol
This paper presents a new class of synchronous, deterministic, non-authenticated algorithms for reaching interactive consistency (Byzantine agreement). The algorithms are based on voting and error-correcting codes and require considerably less data communication than the original algorithm, whereas the number of rounds and the number of modules meet the minimum bounds. These algorithms based on voting and coding are defined and proved on the basis of a class of algorithms, called the dispersed joined communication algorithms.
Title: Interactive consistency algorithms based on voting and error-correcting codes
Pub Date: 1995-06-27 DOI: 10.1109/FTCS.1995.466952
N. Saxena, Chien Chen, R. Swami, H. Osone, Shalesh Thusoo, D. Lyon, D. Chang, Anand Dharmaraj, N. Patkar, Yizhi Lu, Ben-Hau Chia
The HaL SPARC64 Processor, the first 64-bit SPARC-V9 architecture implementation, uses several techniques to ensure a high degree of system reliability, error detection, and error recovery. The CPU of the multi-chip module processor has a superscalar, speculative issue unit, and an out-of-order execution datapath. These two processor components complicate the maintenance of precise state in the event of errors. By exploiting the SPARC-V9 architectural features, and the micro-architecture for speculative execution, SPARC64 maintains precise state in the event of exceptions and errors, logs and reports errors, and facilitates error detection during full system bring-up. The paper presents details of error detection and handling in the CPU, the cache system, and the Memory Management Unit (MMU). The HaL R1 system also implements a fault-secure memory system design. The memory system corrects all single-bit errors, detects double-bit errors, detects single address line failures, and detects all single dynamic RAM (DRAM) chip failures. Certain debug features have been added to the system that are useful during system bring-up.
Title: Error detection and handling in a superscalar, speculative out-of-order execution processor system
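The "corrects all single-bit errors, detects double-bit errors" property is what a SEC-DED code provides. The sketch below shows the textbook extended Hamming code on 4 data bits (7 Hamming bits plus one overall parity bit). It is an illustration only: the abstract does not say which code the HaL R1 memory system uses, real memory words are much wider, and this sketch does not cover address line or whole-chip failure detection. The function names are hypothetical.

```python
def secded_encode(data):
    """Encode 4 data bits (list of 0/1) into an 8-bit SEC-DED codeword."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4                      # covers Hamming positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4                      # covers Hamming positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4                      # covers Hamming positions 4, 5, 6, 7
    word = [p1, p2, d1, p4, d2, d3, d4]    # Hamming positions 1..7
    overall = 0
    for b in word:
        overall ^= b                       # extra parity bit over the whole word
    return word + [overall]


def secded_decode(word):
    """Return (status, data); status is 'ok', 'corrected', or 'double'."""
    bits = list(word)
    syndrome = 0
    for pos in range(1, 8):
        if bits[pos - 1]:
            syndrome ^= pos                # XOR of the positions holding a 1
    parity = 0
    for b in bits:
        parity ^= b
    if syndrome == 0 and parity == 0:
        status = "ok"
    elif parity == 1:
        # Odd overall parity means a single error; a zero syndrome means
        # the overall parity bit itself was hit, so no data bit changes.
        if syndrome:
            bits[syndrome - 1] ^= 1
        status = "corrected"
    else:
        return "double", None              # even parity, nonzero syndrome
    return status, [bits[2], bits[4], bits[5], bits[6]]
```

Any single flipped bit is repaired and the original data returned; any two flipped bits leave the overall parity even with a nonzero syndrome, which the decoder reports as uncorrectable rather than miscorrecting.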
Pub Date: 1995-06-27 DOI: 10.1109/FTCS.1995.466975
S. Piestrak, T. Nanya
This paper considers the design of quasi-delay-insensitive (QDI) combinational circuits (CCs), a class of self-timed (asynchronous) circuits. The necessity of coding both inputs and outputs of any QDI CC by using unordered codes naturally leads to inverter-free realization. The analysis of the behavior of a QDI CC with input errors leads to the observation that it is impossible to avoid the so-called late detection problem. A new set of correct definitions of the code-disjoint QDI CC and of the totally self-checking (TSC) QDI CC is introduced. The detailed analysis of the behavior of a faulty QDI system with internal permanent faults shows that (1) late detection, (2) the possibility of occurrence of invalid transitions, and (3) premature completion seem to be inherent properties of any QDI CC, which preclude its fault-secure (hence TSC) implementation for some single stuck-at faults. The first ever self-testing code-disjoint completion checker is proposed. Finally, an extensive study of designing self-testing code-disjoint QDI CCs is presented.
Title: Towards totally self-checking delay-insensitive systems
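An unordered code is one in which no codeword covers another, that is, no codeword has a 1 in every position where a different codeword has a 1. The sketch below checks that property and shows dual-rail encoding as a standard example of an unordered code. It illustrates only the definition the abstract relies on; it says nothing about the circuits or checkers proposed in the paper, and the function names are hypothetical.

```python
from itertools import combinations


def covers(a, b):
    """True if word `a` has a 1 in every position where `b` has a 1."""
    return all(x >= y for x, y in zip(a, b))


def is_unordered(code):
    """True if no codeword covers a different codeword."""
    return not any(covers(a, b) or covers(b, a) for a, b in combinations(code, 2))


def dual_rail(bits):
    """Encode each bit b as the pair (b, not b)."""
    return tuple(v for b in bits for v in (b, 1 - b))
```

Plain binary words are ordered (for instance `(1, 1)` covers `(0, 1)`), whereas their dual-rail encodings all have the same weight and are therefore unordered.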
Pub Date: 1995-06-27 DOI: 10.1109/FTCS.1995.466979
D. Blough, F. Kurdahi, S. Ohm
The paper considers the problem of automatic insertion of recovery points in recoverable microarchitectures. Previous work on this problem provided heuristic algorithms that attempted either to minimize computation time with a bounded hardware overhead or to minimize hardware overhead with a bounded computation time. We present efficient algorithms that provide provably optimal solutions for both of these formulations of the problem. These algorithms take as their input a scheduled control-data flow graph describing the behavior of the system and they output either a minimum-time or a minimum-cost set of recovery point locations. We demonstrate the performance of our algorithms using some well-known benchmark control-data flow graphs. Over all parameter values for each of these benchmarks, our optimal algorithms are shown to perform as well as, and in many cases better than, the previously proposed heuristics.
Title: Optimal recovery point insertion for high-level synthesis of recoverable microarchitectures
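The minimum-cost formulation can be pictured with a heavily simplified model. The sketch below is not the authors' algorithm and does not operate on a scheduled control-data flow graph. It assumes a linear chain of control steps, a hypothetical hardware cost for placing a recovery point at each interior step boundary, and a bound `max_gap` on the number of steps between consecutive recovery points (standing in for the bounded computation time). Under those assumptions a dynamic program finds a cheapest placement.

```python
def place_recovery_points(cost, max_gap):
    """Pick interior boundaries minimising total cost with gaps <= max_gap.

    cost[k] is the (hypothetical) hardware cost of a recovery point at
    boundary k + 1; the start (0) and end (len(cost) + 1) are free.
    Returns (total_cost, sorted list of chosen boundaries).
    """
    if max_gap < 1:
        raise ValueError("max_gap must be at least 1")
    n = len(cost) + 1                      # index of the final boundary
    padded = [0] + list(cost) + [0]
    best = [float("inf")] * (n + 1)        # best[b]: min cost with a point at b
    prev = [None] * (n + 1)                # previous recovery point on that path
    best[0] = 0
    for b in range(1, n + 1):
        for a in range(max(0, b - max_gap), b):
            if best[a] + padded[b] < best[b]:
                best[b] = best[a] + padded[b]
                prev[b] = a
    points, b = [], prev[n]
    while b:                               # walk back until the start boundary
        points.append(b)
        b = prev[b]
    return best[n], points[::-1]
```

For example, `place_recovery_points([5, 1, 4, 2, 6], 3)` selects boundaries 2 and 4 for a total cost of 3; when `max_gap` already spans the whole chain, no interior recovery point is needed.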
Pub Date: 1995-06-27 DOI: 10.1109/FTCS.1995.466997
M. Russinovich, Z. Segall
The concept of middleware provides a transparent way to augment and change the characteristics of a service provider as seen from a client. Fault tolerant policies are ideal candidates for middleware implementation. We have defined and implemented operating system based middleware support that provides the power and flexibility needed by diverse fault tolerant policies. This mechanism, called the sentry, has been built into the UNIX 4.3 BSD operating system server running on a Mach 3.0 kernel. To demonstrate the effectiveness of the mechanism, several policies have been implemented using sentries, including checkpointing and journaling. The implementation shows that complex fault tolerant policies can be efficiently and transparently implemented as middleware. Performance overhead of input journaling is less than 5% and application suspension during the checkpoint is typically under 10 seconds in length. A standard hard disk is used to store journal and checkpoint information with dedicated storage requirements of less than 20 MB.
Title: Fault-tolerance for off-the-shelf applications and hardware
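Checkpointing and input journaling combine naturally: a checkpoint captures state at one moment, the journal records every input received since, and recovery restores the checkpoint and replays the journal. The sketch below shows that interplay in memory. It is not the sentry mechanism and makes no use of Mach or BSD interfaces; `JournaledService` and its methods are hypothetical names, and the sketch assumes the wrapped `apply_input` function is deterministic so that replay reproduces the lost state.

```python
import copy


class JournaledService:
    """Wraps a deterministic state machine with a checkpoint and an input journal."""

    def __init__(self, state, apply_input):
        self.state = state
        self._apply = apply_input          # function (state, input) -> new state
        self._checkpoint = copy.deepcopy(state)
        self._journal = []                 # inputs received since the last checkpoint

    def handle(self, item):
        self._journal.append(item)         # log the input before acting on it
        self.state = self._apply(self.state, item)

    def checkpoint(self):
        self._checkpoint = copy.deepcopy(self.state)
        self._journal.clear()              # older inputs are covered by the snapshot

    def recover(self):
        """Rebuild state after a failure: restore the snapshot, then replay the journal."""
        state = copy.deepcopy(self._checkpoint)
        for item in self._journal:
            state = self._apply(state, item)
        self.state = state
```

Taking checkpoints more often shortens the journal and therefore the replay time, at the price of more frequent suspension while the snapshot is written, which is the trade-off behind the overhead figures quoted in the abstract.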
Pub Date: 1995-06-27 DOI: 10.1109/FTCS.1995.466982
W. Baker, R. Horst, D. Sonnier, W. Watson
The paper introduces a new fault-tolerant architecture that combines the best attributes of the software fault-tolerant Tandem NonStop systems with the hardware fault-tolerant Integrity systems. This architecture is based on the ServerNet System Area Network (SAN). ServerNet, formerly called TNet, is a packetized byte-serial multistage network that supports both I/O and interprocessor traffic in fault-tolerant systems. Dual-ported CPUs and I/O controllers connect to independent subnetworks in a variety of different network topologies. Systems can expand either through shared or distributed memory multiprocessing. A separate maintenance system controls system initialization, online configuration changes, and error reporting. The architecture's flexibility makes it suitable for a wide range of environments with varying requirements for performance, fault tolerance, and software compatibility.
Title: A flexible ServerNet-based fault-tolerant architecture