Dependable adaptive real-time applications in wormhole-based systems
P. Martins, Paulo Sousa, A. Casimiro, P. Veríssimo
Pub Date: 2004-06-28 | DOI: 10.1109/DSN.2004.1311927
This paper describes and discusses the work carried out in the context of the CORTEX project on the development of adaptive real-time applications in wormhole-based systems. The architecture of CORTEX relies on the existence of a timeliness wormhole, called the timely computing base (TCB), which we have described in previous papers. Here we focus on the practical demonstration of the wormhole concept through a demo with two complementary facets. The objective is to illustrate the effectiveness of the concept from a practical, yet rigorous, perspective, which is done with the help of an emulation framework that we present in the paper. Furthermore, the paper describes two different ways of implementing timeliness wormholes, on top of both wired and wireless infrastructures.
Caching-enhanced scalable reliable multicast
Carolos Livadas, I. Keidar
Pub Date: 2004-06-28 | DOI: 10.1109/DSN.2004.1311895
We present the caching-enhanced scalable reliable multicast (CESRM) protocol. CESRM augments the scalable reliable multicast (SRM) protocol (S. Floyd et al., 1995 and 1997) with a caching-based expedited recovery scheme. CESRM exploits the packet-loss locality occurring in IP multicast transmissions in order to recover expeditiously from losses in the manner in which recent losses were recovered. Trace-driven simulations show that CESRM reduces the average recovery latency of SRM by roughly 50% and, moreover, drastically reduces the overhead in terms of recovery traffic and control messages.
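The core of a caching-based expedited recovery scheme like CESRM's can be sketched as a small cache that remembers who recovered each source's recent losses, so the next loss from the same source can be retried via that replier before falling back to SRM's randomized timer-based recovery. This is an illustrative sketch only; the class and method names are hypothetical, not CESRM's actual data structures.

```python
from collections import OrderedDict

class RecoveryCache:
    """LRU cache of recent loss recoveries, keyed by multicast source.
    (Illustrative names; not the protocol's real structures.)"""

    def __init__(self, capacity=64):
        self.capacity = capacity
        self.entries = OrderedDict()  # source -> replier that recovered its last loss

    def record(self, source, replier):
        # Remember which replier recovered the most recent loss from `source`.
        self.entries[source] = replier
        self.entries.move_to_end(source)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used

    def expedited_replier(self, source):
        # On a new loss from `source`, try the cached replier immediately
        # (expedited recovery); return None to fall back to timer-based recovery.
        return self.entries.get(source)
```

The cache exploits loss locality: consecutive losses tend to occur on the same path, so the same replier is likely to succeed again.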
Customizing dependability attributes for mobile service platforms
Jun He, M. Hiltunen, R. Schlichting
Pub Date: 2004-06-28 | DOI: 10.1109/DSN.2004.1311932
Mobile service platforms are used to facilitate access to enterprise services such as email, product inventory, or design drawing databases by a wide range of mobile devices using a variety of access protocols. This paper presents a quality of service (QoS) architecture that allows flexible combinations of dependability attributes such as reliability, timeliness, and security to be enforced on a per-request basis. In addition to components that implement the underlying dependability techniques, the architecture includes policy components that evaluate a request's requirements and dynamically determine an appropriate execution strategy. The architecture has been integrated into an experimental version of iMobile, a mobile service platform being developed at AT&T. This paper describes the design and implementation of the architecture and gives initial experimental results for the iMobile prototype.
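The role of the policy components can be illustrated with a toy mapping from per-request dependability attributes to an execution strategy. The attribute names and strategy steps below are invented for illustration; the paper's actual policies and mechanisms differ.

```python
def choose_strategy(requirements):
    """Toy policy evaluation: map a request's dependability attributes
    to a list of execution-strategy steps (all names hypothetical)."""
    strategy = []
    if requirements.get("security") == "high":
        strategy.append("encrypt")          # apply a security mechanism
    if requirements.get("reliability") == "high":
        strategy.append("replicate")        # execute on replicated components
    if requirements.get("timeliness") == "strict":
        strategy.append("deadline-schedule")  # enforce a timing bound
    return strategy or ["direct"]           # no special requirements
```

The point is that the strategy is computed per request, so different requests to the same service can receive different combinations of dependability techniques.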
Experience with evaluating human-assisted recovery processes
Aaron B. Brown, Leonard Chung, William Kakes, C. Ling, D. Patterson
Pub Date: 2004-06-28 | DOI: 10.1109/DSN.2004.1311910
We describe an approach to quantitatively evaluating human-assisted failure-recovery tools and processes in the environment of modern Internet- and enterprise-class server systems. Our approach can quantify the dependability impact of a single recovery system and also enables comparisons between different recovery approaches. It combines aspects of dependability benchmarking with human user studies, incorporating human participants in the system evaluations while still producing typical dependability-related metrics as results. We illustrate our methodology via a case study of a system-wide undo/redo recovery tool for e-mail services; our approach is able to expose the dependability benefits of the tool as well as point out areas where its behavior could be improved.
An architectural framework for providing reliability and security support
Nithin Nakka, Z. Kalbarczyk, R. Iyer, Jun Xu
Pub Date: 2004-06-28 | DOI: 10.1109/DSN.2004.1311929
This paper explores hardware-implemented error-detection and security mechanisms embedded as modules in a hardware-level framework called the reliability and security engine (RSE), which is implemented as an integral part of a modern microprocessor. The RSE interacts with the processor through an input/output interface. The CHECK instruction, a special extension of the processor's instruction set architecture, is the application's interface to the RSE. The detection mechanisms described here in detail are: (1) the memory layout randomization (MLR) module, which randomizes the memory layout of a process in order to foil attackers who assume a fixed system layout; (2) the data dependency tracking (DDT) module, which tracks the dependencies among threads of a process and maintains checkpoints of shared memory pages in order to roll back the threads when an offending (potentially malicious) thread is terminated; and (3) the instruction checker module (ICM), which checks an instruction's validity or the program's control flow just as the instruction enters the pipeline for execution. Performance simulations for the studied modules indicate low overhead for the proposed solutions.
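The idea behind the MLR module, shifting a process's memory regions by an unpredictable amount so attacks that hard-code addresses fail, can be sketched in a few lines. This is a software illustration of the general technique only, under assumed parameters; the RSE implements randomization in hardware and its entropy and alignment choices are not specified here.

```python
import secrets

def randomized_base(region_base, align=0x1000, entropy_bits=16):
    """Toy memory-layout randomization: displace a region's base address
    by a random, page-aligned offset so attackers cannot rely on a fixed
    layout. `align` and `entropy_bits` are illustrative assumptions."""
    offset = secrets.randbelow(1 << entropy_bits) * align  # random page count
    return region_base + offset
```

Each process start would draw a fresh offset, so an exploit that worked once (e.g. a fixed return address) is unlikely to work again.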
An SLA-oriented capacity planning tool for streaming media services
L. Cherkasova, Wenting Tang, S. Singhal
Pub Date: 2004-06-28 | DOI: 10.1109/DSN.2004.1311945
This paper addresses the problem of mapping the requirements of a known media service workload into the corresponding system resource requirements and accurately sizing a media server cluster to handle the workload. We propose a new capacity planning framework for evaluating the resources needed to process a given streaming media workload with specified performance requirements. The performance requirements are specified in a service level agreement (SLA) containing: i) basic capacity requirements, which define the percentage of time the configuration is capable of processing the workload without performance degradation while satisfying bounds on system utilization; and ii) performability requirements, which define the acceptable degradation of service performance during the remaining, non-compliant time and in the case of node failures. Using a set of specially benchmarked media server configurations, the capacity planning tool matches the overall capacity requirements of the media service workload profile against the specified SLAs to identify the number of nodes necessary to support the required service performance.
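The basic-capacity part of such an SLA check can be sketched as a sizing loop: grow the cluster until its aggregate capacity covers the workload's demand profile in at least the SLA-specified fraction of intervals. This is a simplified sketch under assumed inputs (a scalar demand per interval and a uniform per-node capacity), not the paper's actual tool, which works from benchmarked media server configurations.

```python
def size_cluster(demand_profile, node_capacity, sla_compliance=0.95):
    """Find the smallest node count whose aggregate capacity covers the
    demand in at least `sla_compliance` of the profiled intervals.
    (Illustrative sketch; inputs and units are assumptions.)"""
    nodes = 1
    while True:
        capacity = nodes * node_capacity
        covered = sum(1 for d in demand_profile if d <= capacity)
        if covered / len(demand_profile) >= sla_compliance:
            return nodes
        nodes += 1
```

A real tool would additionally check the performability clause, i.e. how badly the non-compliant intervals degrade and what happens when a node fails.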
Fault tolerant computation on ensemble quantum computers
P. Boykin, V. Roychowdhury, T. Mor, F. Vatan
Pub Date: 2004-06-28 | DOI: 10.1109/DSN.2004.1311886
In ensemble (or bulk) quantum computation, all computations are performed on an ensemble of computers rather than on a single computer. Measurements of qubits in an individual computer cannot be performed; instead, only expectation values (over the complete ensemble of computers) can be measured. As a result of this limitation on the model of computation, many algorithms cannot be processed directly on such computers and must be modified. We provide modifications of the fault-tolerant quantum computation protocols that enable processing on ensemble quantum computers.
Performance and dependability of structured peer-to-peer overlays
M. Castro, Manuel Costa, A. Rowstron
Pub Date: 2004-06-28 | DOI: 10.1109/DSN.2004.1311872
Structured peer-to-peer (P2P) overlay networks provide a useful substrate for building distributed applications. They map object keys to overlay nodes and offer a primitive to send a message to the node responsible for a key. They can implement, for example, distributed hash tables and multicast trees. However, there are concerns about the performance and dependability of these overlays in realistic environments. Several studies have shown that current P2P environments have high churn rates: nodes join and leave the overlay continuously. This paper presents techniques that continuously detect faults and repair the overlay to achieve high dependability and good performance in realistic environments. The techniques are evaluated using large-scale network simulation experiments with fault injection guided by real traces of node arrivals and departures. The results show that previous concerns are unfounded; our techniques can achieve dependable routing in realistic environments with an average delay stretch below two and a maintenance overhead of less than half a message per second per node.
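The key-to-node mapping that structured overlays provide can be sketched with a consistent-hashing-style ring: hash the key into a circular ID space and route to the node whose ID is the key's successor on the ring. This is a generic simplification of the primitive, not the routing scheme of any particular overlay evaluated in the paper; the ID-space size and hash choice below are assumptions.

```python
import hashlib
from bisect import bisect_right

def node_for_key(key, node_ids, id_space=2**16):
    """Map a key to the node responsible for it on a circular ID space:
    the node whose ID is the successor of the key's hash (with wraparound).
    (Illustrative sketch; real overlays use larger IDs and routing tables.)"""
    keyhash = int(hashlib.sha1(key.encode()).hexdigest(), 16) % id_space
    ring = sorted(node_ids)
    i = bisect_right(ring, keyhash)
    return ring[i % len(ring)]  # wrap past the largest ID back to the smallest
```

Under churn, responsibility for a key silently moves to the new successor when nodes join or leave, which is why continuous fault detection and repair of the routing state matter.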
An adaptive algorithm for efficient message diffusion in unreliable environments
B. Garbinato, F. Pedone, R. Schmidt
Pub Date: 2004-06-28 | DOI: 10.1109/DSN.2004.1311920
In this paper, we propose a novel approach to solving the reliable broadcast problem in a probabilistic unreliable model. Our approach consists of first defining the optimality of probabilistic reliable broadcast algorithms and the adaptiveness of algorithms that aim at converging toward such optimality. We then propose an algorithm that converges precisely toward the optimal behavior, thanks to an adaptive strategy based on Bayesian statistical inference. We compare the performance of our algorithm with that of a typical gossip algorithm through simulation. Our results show, for example, that our adaptive algorithm quickly converges toward the optimal behavior.
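The flavor of Bayesian adaptation in this setting can be sketched with a Beta-Bernoulli model: maintain a posterior over the per-transmission delivery probability, update it from observed deliveries and losses, and use the current estimate to choose how many transmissions are needed to reach a target delivery probability. This is a generic illustration of Bayesian inference applied to message diffusion, not the paper's algorithm; the model and target below are assumptions.

```python
import math

class DeliveryEstimator:
    """Beta-Bernoulli estimate of per-transmission delivery probability,
    driving an adaptive redundancy choice. (Illustrative sketch only.)"""

    def __init__(self, alpha=1.0, beta=1.0):
        self.alpha, self.beta = alpha, beta  # uniform Beta(1,1) prior

    def observe(self, delivered):
        # Conjugate update: success bumps alpha, failure bumps beta.
        if delivered:
            self.alpha += 1
        else:
            self.beta += 1

    def mean(self):
        return self.alpha / (self.alpha + self.beta)  # posterior mean

    def transmissions_needed(self, target=0.99):
        # Smallest k with 1 - (1 - p)^k >= target, using the posterior mean p.
        p = self.mean()
        return math.ceil(math.log(1 - target) / math.log(1 - p))
```

As observations accumulate, the posterior mean converges to the true delivery probability and the redundancy settles at the minimum needed, which is the intuition behind converging toward optimal behavior.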
Workshop on fault diagnosis and tolerance in cryptography
L. Breveglieri, I. Koren
Pub Date: 2004-06-28 | DOI: 10.1109/DSN.2004.1311963
Cryptographic devices are becoming increasingly ubiquitous and complex, making reliability an important design objective. Moreover, the diffusion of mobile, low-price consumer electronic equipment containing cryptographic components makes such devices more vulnerable to attacks, in particular those based on fault injection. This workshop aims to give researchers in both the dependability and cryptography communities an opportunity to start bridging the gap between fault diagnosis and tolerance techniques on the one hand and cryptography on the other.