This paper presents an efficient and fair fault-tolerant token-based algorithm for achieving mutual exclusion. It extends the Naimi-Trehel algorithm, which uses a distributed queue of token requests and a dynamic tree. When failures occur, our algorithm tries to recover the request queue by gathering the intact portions of the queue that existed just before the failure. Thus, fairness among token requests is preserved despite failures. Furthermore, the use of broadcast is minimized when rebuilding the dynamic tree. Experimental results with different fault injection scenarios show that our approach achieves fast failure recovery and low message broadcast overhead.
{"title":"Performance evaluation of a fair fault-tolerant mutual exclusion algorithm","authors":"Julien Sopena, L. Arantes, Pierre Sens","doi":"10.1109/SRDS.2006.35","DOIUrl":"https://doi.org/10.1109/SRDS.2006.35","url":null,"abstract":"This paper presents an efficient and fair fault-tolerant token-based algorithm for achieving mutual exclusion. It is an extension of the Naimi-Trehel algorithm that uses a distributed queue of token requests and a dynamic tree. In case of failures, our algorithm tries to recover the requests' queue by gathering intact portions of the one which existed just before the failure. Thus, fairness of token requests is preserved despite failures. Furthermore, the use of broadcast is minimized when rebuilding the dynamic tree. Experiment results with different fault injection scenarios show that our approach presents a fast failure recovery and low message broadcast overhead","PeriodicalId":164765,"journal":{"name":"2006 25th IEEE Symposium on Reliable Distributed Systems (SRDS'06)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114424225","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
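For readers unfamiliar with the base algorithm named in the abstract above, the following is a minimal, failure-free sketch of the classic Naimi-Trehel scheme: each node keeps a `last` pointer (its probable token owner, forming the dynamic tree) and a `next` pointer (its successor in the distributed request queue). It does not include the paper's fault-tolerant extension. The class and method names (`Node`, `Network`, `request`, `release`, `holder`) are assumptions made for illustration, and messages are delivered synchronously as function calls rather than over a network.

```python
class Node:
    """State kept by one site in the classic Naimi-Trehel algorithm."""

    def __init__(self, ident, root):
        self.id = ident
        # 'last' points toward the probable token owner; None means "I am the root".
        self.last = None if ident == root else root
        # 'next' is the site that receives the token after this one (queue link).
        self.next = None
        self.requesting = False
        self.has_token = ident == root


class Network:
    """Hypothetical synchronous simulator: a message send is a direct method call."""

    def __init__(self, n, root=0):
        self.nodes = [Node(i, root) for i in range(n)]

    def holder(self):
        """Return the id of the node that currently holds the token."""
        return next(node.id for node in self.nodes if node.has_token)

    def request(self, i):
        """Node i asks for the critical section."""
        node = self.nodes[i]
        node.requesting = True
        if node.last is not None:
            # Send the request along the tree, then become the new root.
            target, node.last = node.last, None
            self._on_request(target, i)

    def _on_request(self, at, requester):
        """Node 'at' receives a request that originated at 'requester'."""
        node = self.nodes[at]
        if node.last is None:
            if node.requesting:
                # Root is waiting or in the critical section: enqueue the requester.
                node.next = requester
            else:
                # Root holds an idle token: hand it over directly.
                node.has_token = False
                self.nodes[requester].has_token = True
        else:
            # Not the root: forward the request toward the probable owner.
            self._on_request(node.last, requester)
        # Path compression: the requester becomes the probable owner.
        node.last = requester

    def release(self, i):
        """Node i leaves the critical section and passes the token on, if queued."""
        node = self.nodes[i]
        node.requesting = False
        if node.next is not None:
            node.has_token = False
            self.nodes[node.next].has_token = True
            node.next = None
```

In this sketch, if nodes 0, 1, and 2 call `request` in that order on a four-node `Network`, successive `release` calls pass the token along the `next` chain in the same order, which is the fairness property the abstract aims to preserve across failures.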
As the disks typically found in personal computers grow larger, protecting data by replicating it on a collection of "peer" systems rather than on dedicated high-performance storage systems can provide comparable reliability and availability guarantees at reduced cost and complexity. To be adopted, peer-to-peer storage systems must be able to replicate data on hosts that are trusted, secure, and available. However, recent research has shown that the traditional model, in which nodes are assumed to have identical levels of trust, to behave independently, and to have similar failure modes, is oversimplified. Thus, there is a need for a mechanism that automatically and efficiently selects replica nodes from a large number of available hosts with varying capabilities and trust levels. In this paper, we present an algorithm that handles replica node selection, either for new replica groups or to replace failed replicas in a peer-to-peer storage system. We show through simulation that our algorithm maintains the node interconnection topology, minimizing the cost of recovery from a failed replica, measured by the number of nodes affected by the failure and the number of inter-node messages.
{"title":"Topology Sensitive Replica Selection","authors":"D. Brodsky, M. Feeley, N. Hutchinson","doi":"10.1109/SRDS.2006.46","DOIUrl":"https://doi.org/10.1109/SRDS.2006.46","url":null,"abstract":"As the disks typically found in personal computers grow larger, protecting data by replicating it on a collection of \"peer\" systems rather than on dedicated high performance storage systems can provide comparable reliability and availability guarantees but at reduced cost and complexity. In order to be adopted, peer-to-peer storage systems must be able to replicate data on hosts that are trusted, secure, and available. However, recent research has shown that the traditional model, where nodes are assumed to have identical levels of trust, to behave independently, and to have similar failure modes, is over simplified. Thus, there is a need for a mechanism that automatically and efficiently selects replica nodes from a large number of available hosts with varying capabilities and trust levels. In this paper we present an algorithm to handle replica node selection either for new replica groups or to replace failed replicas in a peer-to-peer storage system. We show through simulation that our algorithm maintains the node inter-connection topology minimizing the cost of recovery from a failed replica, measured by the number of nodes affected by the failure and the number of inter-node messages","PeriodicalId":164765,"journal":{"name":"2006 25th IEEE Symposium on Reliable Distributed Systems (SRDS'06)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132600654","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper describes three enhancements to the TCP splicing mechanism: (1) enabling a TCP connection to be simultaneously spliced through multiple machines for higher scalability; (2) making a spliced connection tolerant to proxy failures; and (3) providing the flexibility to split a TCP splice between a proxy and a backend server, further increasing the scalability of a Web server system. A Web server architecture based on this enhanced TCP splicing is proposed. This architecture provides a highly scalable, seamless service to users with minimal disruption during server failures. In addition to traditional Web services, in which users download Web pages, multimedia files, and other types of data from a Web server, the proposed architecture supports newly emerging Web services that are highly interactive and involve relatively long, stateful client-server sessions. A prototype of this architecture has been implemented as a Linux 2.6 kernel module, and the paper presents performance results measured from this implementation.
{"title":"Fault-tolerant and scalable TCP splice and web server architecture","authors":"M. Marwah, Shivakant Mishra, C. Fetzer","doi":"10.1109/SRDS.2006.21","DOIUrl":"https://doi.org/10.1109/SRDS.2006.21","url":null,"abstract":"This paper describes three enhancements to the TCP splicing mechanism: (1) Enable a TCP connection to be simultaneously spliced through multiple machines for higher scalability; (2) Make a spliced connection fault-tolerant to proxy failures; and (3) Provide flexibility of splitting a TCP splice between a proxy and a backend server for further increasing the scalability of a Web server system. A Web server architecture based on this enhanced TCP splicing is proposed. This architecture provides a highly scalable, seamless service to the users with minimal disruption during server failures. In addition to the traditional Web services in which users download Web pages, multimedia files and other types of data from a Web server, the proposed architecture supports newly emerging Web services that are highly interactive, and involve relatively longer, stateful client-server sessions. A prototype of this architecture has been implemented as a Linux 2.6 kernel module, and the paper presents important performance results measured from this implementation","PeriodicalId":164765,"journal":{"name":"2006 25th IEEE Symposium on Reliable Distributed Systems (SRDS'06)","volume":"135 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115554475","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Database replication is widely used to improve both fault tolerance and DBMS performance. Non-diverse database replication has a significant limitation: it is effective against crash failures only. Diverse redundancy is an effective mechanism for tolerating a wider range of failures, including many non-crash failures. However, it has not been adopted in practice because many see DBMS performance as the main concern. In this paper, we show experimental evidence that diverse redundancy (diverse replication) can bring benefits in terms of DBMS performance, too. We report experimental results with an optimistic architecture built with two diverse DBMSs under a load derived from the TPC-C benchmark, which show that a diverse pair performs faster not only than non-diverse pairs but also than the individual copies of the DBMSs used. This result is important because it shows potential for DBMS performance better than anything achievable with the available off-the-shelf servers.
{"title":"Improving DBMS Performance through Diverse Redundancy","authors":"Vladimir Stankovic, P. Popov","doi":"10.1109/SRDS.2006.27","DOIUrl":"https://doi.org/10.1109/SRDS.2006.27","url":null,"abstract":"Database replication is widely used to improve both fault tolerance and DBMS performance. Non-diverse database replication has a significant limitation - it is effective against crash failures only. Diverse redundancy is an effective mechanism of tolerating a wider range of failures, including many non-crash failures. However it has not been adopted in practice because many see DBMS performance as the main concern. In this paper we show experimental evidence that diverse redundancy (diverse replication) can bring benefits in terms of DBMS performance, too. We report on experimental results with an optimistic architecture built with two diverse DBMSs under a load derived from TPC-C benchmark, which show that a diverse pair performs faster not only than non-diverse pairs but also than the individual copies of the DBMSs used. This result is important because it shows potential for DBMS performance better than anything achievable with the available off-the-shelf servers","PeriodicalId":164765,"journal":{"name":"2006 25th IEEE Symposium on Reliable Distributed Systems (SRDS'06)","volume":"172 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116018797","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Alessandro Daidone, F. Giandomenico, A. Bondavalli, S. Chiaradonna
In modern information infrastructures, diagnosis must be able to assess the status, or the extent of the damage, of individual components. Traditional one-shot diagnosis is not adequate; instead, streams of data on component behavior need to be collected and filtered over time, as some existing heuristics do. This paper instead proposes a general framework and a formalism to model such over-time diagnosis scenarios and to find appropriate solutions. As such, it helps system designers justify their design choices. Taking advantage of the characteristics of the hidden Markov model formalism, widely used in pattern recognition, the paper proposes a formalization of the diagnosis process that addresses the complete chain formed by the monitored component, deviation detection, and state diagnosis. Hidden Markov models are well suited to representing problems in which the internal state of an entity is not known and can only be inferred from external observations of what that entity emits. Over-time diagnosis is a first-class representative of this category of problems. The accuracy of diagnosis carried out through the proposed formalization is then discussed, as well as how to use it concretely to perform state diagnosis and to allow direct comparison of alternative solutions.
{"title":"Hidden Markov Models as a Support for Diagnosis: Formalization of the Problem and Synthesis of the Solution","authors":"Alessandro Daidone, F. Giandomenico, A. Bondavalli, S. Chiaradonna","doi":"10.1109/SRDS.2006.24","DOIUrl":"https://doi.org/10.1109/SRDS.2006.24","url":null,"abstract":"In modern information infrastructures, diagnosis must be able to assess the status or the extent of the damage of individual components. Traditional one-shot diagnosis is not adequate, but streams of data on component behavior need to be collected and filtered over time as done by some existing heuristics. This paper proposes instead a general framework and a formalism to model such over-time diagnosis scenarios, and to find appropriate solutions. As such, it is very beneficial to system designers to support design choices. Taking advantage of the characteristics of the hidden Markov models formalism, widely used in pattern recognition, the paper proposes a formalization of the diagnosis process, addressing the complete chain constituted by monitored component, deviation detection and state diagnosis. Hidden Markov models are well suited to represent problems where the internal state of a certain entity is not known and can only be inferred from external observations of what this entity emits. Such over-time diagnosis is a first class representative of this category of problems. The accuracy of diagnosis carried out through the proposed formalization is then discussed, as well as how to concretely use it to perform state diagnosis and allow direct comparison of alternative solutions","PeriodicalId":164765,"journal":{"name":"2006 25th IEEE Symposium on Reliable Distributed Systems (SRDS'06)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129679873","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
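The idea in the abstract above, inferring a component's hidden state from a stream of deviation-detection outputs, can be illustrated with the standard hidden Markov model forward (filtering) recursion. This is a generic textbook sketch, not the paper's formalization: the two states (healthy, faulty), the two observation symbols (ok, deviation), and every probability below are hypothetical values chosen only for illustration.

```python
def forward_filter(prior, transition, emission, observations):
    """Return the posterior state distribution after each observation.

    prior[i]         -- P(initial state = i)
    transition[i][j] -- P(next state = j | current state = i)
    emission[j][o]   -- P(observation = o | state = j)
    """
    n = len(prior)
    belief = list(prior)
    history = []
    for obs in observations:
        # Prediction step: propagate the belief through the state transitions.
        predicted = [sum(belief[i] * transition[i][j] for i in range(n))
                     for j in range(n)]
        # Update step: weight each state by how well it explains the observation.
        weighted = [predicted[j] * emission[j][obs] for j in range(n)]
        total = sum(weighted)
        belief = [w / total for w in weighted]
        history.append(belief)
    return history


# Hypothetical model: state 0 = healthy, state 1 = faulty;
# observation 0 = "ok", observation 1 = "deviation detected".
PRIOR = [0.99, 0.01]
TRANSITION = [[0.95, 0.05],   # a healthy component rarely becomes faulty
              [0.10, 0.90]]   # a faulty component occasionally recovers
EMISSION = [[0.9, 0.1],       # healthy: mostly "ok", some false alarms
            [0.3, 0.7]]       # faulty: mostly "deviation", some missed detections

if __name__ == "__main__":
    stream = [0, 0, 1, 1, 1]
    for step, belief in enumerate(forward_filter(PRIOR, TRANSITION, EMISSION, stream), 1):
        print(f"step {step}: P(faulty) = {belief[1]:.3f}")
```

With these made-up parameters, a single deviation leaves the component more likely healthy than faulty, while a run of consecutive deviations pushes the belief toward the faulty state, which is the filtering-over-time behavior that one-shot diagnosis lacks.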
This paper presents a strong and efficient scheme for protecting against buffer overflow attacks. The basic approach of this scheme is pointer copying: copies of code pointers are stored in a safe memory area to detect and prevent the manipulation of code pointers. To protect the copied code pointers from data-pointer modification attacks, the scheme exploits the segmentation hardware of IA-32 (Intel x86) processors. The scheme provides protection as strong as write-protecting the memory area via system calls. At the same time, it incurs only a modest overhead, because copying a code pointer requires only a few user-level instructions and there is no penalty for entering the kernel. The experimental results show that the performance overhead in OpenSSL ranges from 0.9% to 4.3%.
{"title":"SegmentShield: Exploiting Segmentation Hardware for Protecting against Buffer Overflow Attacks","authors":"Takahiro Shinagawa","doi":"10.1109/SRDS.2006.43","DOIUrl":"https://doi.org/10.1109/SRDS.2006.43","url":null,"abstract":"This paper presents a strong and efficient scheme for protecting against buffer overflow attacks. The basic approach of this scheme is pointer copying: copies of code pointers are stored in a safe memory area to detect and prevent the manipulation of code pointers. In order to protect the copied code pointers from data-pointer modification attacks, this scheme exploits the segmentation hardware of IA-32 (Intel x86) processors. This scheme provides as strong protection as write-protecting the memory area via system calls. On the other hand, this scheme involves a modest overhead because copying a code pointer requires only a few user-level instructions and there is no penalty of entering the kernel. The experimental results show that the performance overhead in OpenSSL ranges from 0.9% to 4.3%","PeriodicalId":164765,"journal":{"name":"2006 25th IEEE Symposium on Reliable Distributed Systems (SRDS'06)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127747465","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this paper, we present the SNMP-FD service, a novel failure detection service based entirely on the Simple Network Management Protocol (SNMP). This approach promises better interoperability with external tools and failure information sources, including network equipment and cluster management tools. We first show how the SNMP standard can be used to build a failure detection service. We describe the already standardized interfaces that can be reused and introduce the interfaces that need to be added. SNMP is used extensively in the service for messaging, process status description, configuration, service statistics, and delivering failure detection information to applications. We then present our implementation and an evaluation of its performance and quality of service.
{"title":"An SNMP based failure detection service","authors":"M. Wiesmann, P. Urbán, X. Défago","doi":"10.1109/SRDS.2006.9","DOIUrl":"https://doi.org/10.1109/SRDS.2006.9","url":null,"abstract":"In this paper, we present the SNMP-FD service, a novel failure detection service entirely based on the Simple Network Management Protocol (SNMP). This approach promises better interoperability with external tools and failure information sources, including network equipment and cluster management tools. We first show how the SNMP standard can be used to build a failure detection service. We describe the already standardized interfaces that can be reused and introduce the interfaces that need to be added. SNMP is used extensively in the service for messaging, process status description, configuration, services statistics and delivering failure detection information to applications. We then present our implementation and an evaluation of performance and quality of service","PeriodicalId":164765,"journal":{"name":"2006 25th IEEE Symposium on Reliable Distributed Systems (SRDS'06)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130188027","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}