Experimental Comparison of Local and Shared Coin Randomized Consensus Protocols
Henrique Moniz, N. Neves, M. Correia, P. Veríssimo. DOI: 10.1109/SRDS.2006.19
The paper presents a comparative performance study of the two main classes of randomized binary consensus protocols: a local coin protocol, with expected high communication complexity and cheap symmetric cryptography, and a shared coin protocol, with expected low communication complexity and expensive asymmetric cryptography. The experimental evaluation was conducted in a LAN environment, varying several system parameters such as the fault types and the number of processes. The analysis shows that there is a significant gap between the theoretical and practical performance of these protocols, and provides important insight into what actually happens during their execution.
{"title":"Experimental Comparison of Local and Shared Coin Randomized Consensus Protocols","authors":"Henrique Moniz, N. Neves, M. Correia, P. Veríssimo","doi":"10.1109/SRDS.2006.19","DOIUrl":"https://doi.org/10.1109/SRDS.2006.19","url":null,"abstract":"The paper presents a comparative performance study of the two main classes of randomized binary consensus protocols: a local coin protocol, with an expected high communication complexity and cheap symmetric cryptography, and a shared coin protocol, with an expected low communication complexity and expensive asymmetric cryptography. The experimental evaluation was conducted on a LAN environment, by varying several system parameters, such as the fault types and number of processes. The analysis shows that there is a significant gap between the theoretical and the practical performance results of these protocols, and provides an important insight into what actually happens during their execution","PeriodicalId":164765,"journal":{"name":"2006 25th IEEE Symposium on Reliable Distributed Systems (SRDS'06)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117041906","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Lightweight Reflection for Middleware-based Database Replication
J. Salas, R. Jiménez-Peris, M. Patiño-Martínez, Bettina Kemme. DOI: 10.1109/SRDS.2006.28
Middleware-based database replication approaches have emerged in the last few years as an alternative to traditional database replication implemented within the database kernel. A middleware approach enables third-party vendors to provide high-availability solutions, an increasingly common practice in the software industry. However, middleware solutions often lack scalability and exhibit a number of consistency and performance issues. The reason is that in most cases the middleware has to handle the database as a black box and hence cannot take advantage of the many optimizations implemented in the database kernel. Thus, middleware solutions often reimplement key functionality but cannot achieve the same efficiency as a kernel implementation. Reflection has been proposed during the last decade as a fruitful paradigm for separating non-functional aspects from functional ones, simplifying software development and maintenance while fostering reuse. However, fully reflective databases are not feasible due to the high cost of reflection. Our claim is that by exposing some minimal database functionality through a lightweight reflective interface, efficient and scalable middleware database replication can be attained. In this paper we explore a wide variety of such lightweight reflective interfaces and discuss what kinds of replication algorithms they enable. We also discuss implementation alternatives for some of these interfaces and evaluate their performance.
{"title":"Lightweight Reflection for Middleware-based Database Replication","authors":"J. Salas, R. Jiménez-Peris, M. Patiño-Martínez, Bettina Kemme","doi":"10.1109/SRDS.2006.28","DOIUrl":"https://doi.org/10.1109/SRDS.2006.28","url":null,"abstract":"Middleware-based database replication approaches have emerged in the last few years as an alternative to traditional database replication implemented within the database kernel. A middleware approach enables third party vendors to provide high availability solutions, a growing practice nowadays in the software industry. However, middleware solutions often lack scalability and exhibit a number of consistency and performance issues. The reason is that in most cases the middleware has to handle the database as a black box, and hence, cannot take advantage of the many optimizations implemented in the database kernel. Thus, middleware solutions often reimplement key functionality but cannot achieve the same efficiency as a kernel implementation. Reflection has been proposed during the last decade as a fruitful paradigm to separate non-functional aspects from functional ones, simplifying software development and maintenance whilst fostering reuse. However, fully reflective databases are not feasible due to the high cost of reflection. Our claim is that by exposing some minimal database functionality through a lightweight reflective interface, efficient and scalable middleware database replication can be attained. In this paper we explore a wide variety of such lightweight reflective interfaces and discuss what kind of replication algorithms they enable. We also discuss implementation alternatives for some of these interfaces and evaluate their performance","PeriodicalId":164765,"journal":{"name":"2006 25th IEEE Symposium on Reliable Distributed Systems (SRDS'06)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117128345","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Generalised Repair for Overlay Networks
Barry Porter, François Taïani, G. Coulson. DOI: 10.1109/SRDS.2006.23
We present and evaluate a generic approach to the repair of overlay networks which identifies general principles of overlay repair and embodies them as a reusable service. At the heart of our approach is an algorithm that discovers the extent of a failed section of any type of overlay and assigns responsibility for carrying out the repair. The repair strategy itself is 'pluggable' and can be tailored to the requirements of a specific overlay type or instance. Our approach is efficient, in terms of the number of repair-related message exchanges it incurs; scalable, in that it involves only nodes in the locality of the failed section of the overlay; and resilient, in that it correctly handles cases in which multiple adjacent nodes fail simultaneously and tolerates new failures that occur while a repair is underway. The benefits of our approach are that: (i) it extracts and encapsulates best practice in repair for overlays; (ii) it simplifies the design and implementation of new overlays (because repair issues can be treated orthogonally to basic functionality); and (iii) it supports tailorable levels of dependability for overlays, including pluggable repair strategies.
{"title":"Generalised Repair for Overlay Networks","authors":"Barry Porter, François Taïani, G. Coulson","doi":"10.1109/SRDS.2006.23","DOIUrl":"https://doi.org/10.1109/SRDS.2006.23","url":null,"abstract":"We present and evaluate a generic approach to the repair of overlay networks which identifies general principles of overlay repair and embodies these as a reusable service. At the heart of our approach is an algorithm that discovers the extent of a failed section of any type of overlay, and assigns responsibility to carry out the repair. The repair strategy itself is 'pluggable' and can be tailored to the requirements of a specific overlay type or instance. Our approach is efficient in terms of the number of repair-related message exchanges it incurs; scalable in that it involves only nodes in the locality of the failed section of the overlay; and resilient in that it correctly handles cases in which multiple adjacent nodes fail simultaneously, and it tolerates new failures that occur while a repair is underway. The benefits of our approach are that: (i) it extracts and encapsulates best practice in repair for overlays; (ii) it simplifies the design and implementation of new overlays (because repair issues can be treated orthogonally to basic functionality); and (iii) it supports tailorable levels of dependability for overlays, including pluggable repair strategies","PeriodicalId":164765,"journal":{"name":"2006 25th IEEE Symposium on Reliable Distributed Systems (SRDS'06)","volume":"109 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128468633","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Satem: Trusted Service Code Execution across Transactions
Gang Xu, C. Borcea, L. Iftode. DOI: 10.1109/SRDS.2006.42
Web services and service-oriented architectures are becoming the de facto standard for Internet computing. A main problem faced by users of such services is how to ensure that the service code is trusted. While methods exist that guarantee trusted service code execution before a client-service transaction starts, there is no solution for extending this assurance to the entire lifetime of the transaction. This paper presents Satem, a service-aware trusted execution monitor that guarantees the trustworthiness of the service code across a whole transaction. The Satem architecture consists of an execution monitor residing in the operating system kernel on the service provider platform, a trust evaluator on the client platform, and a service commitment protocol. During this protocol, executed before every transaction, the client requests and verifies against its local policy a commitment from the service platform that promises trusted code execution. Subsequently, the monitor enforces this commitment for the duration of the transaction. To initialize trust in the monitor, we use the Trusted Platform Module specified by the Trusted Computing Group. We implemented Satem under the Linux 2.6.12 kernel and tested it for a Web service and DNS. The experimental results demonstrate that Satem does not incur significant overhead on the protected services and does not impact the unprotected services.
{"title":"Satem: Trusted Service Code Execution across Transactions","authors":"Gang Xu, C. Borcea, L. Iftode","doi":"10.1109/SRDS.2006.42","DOIUrl":"https://doi.org/10.1109/SRDS.2006.42","url":null,"abstract":"Web services and service oriented architectures are becoming the de facto standard for Internet computing. A main problem faced by users of such services is how to ensure that the service code is trusted. While methods that guarantee trusted service code execution before starting a client-service transaction exist, there is no solution for extending this assurance to the entire lifetime of the transaction. This paper presents Satem, a Service-aware trusted execution monitor that guarantees the trustworthiness of the service code across a whole transaction. The Satem architecture consists of an execution monitor residing in the operating system kernel on the service provider platform, a trust evaluator on the client platform, and a service commitment protocol. During this protocol, executed before every transaction, the client requests and verifies against its local policy a commitment from the service platform that promises trusted code execution. Subsequently, the monitor enforces this commitment for the duration of the transaction. To initialize the trust on the monitor, we use the Trusted Platform Module specified by the Trusted Computing Group. We implemented Satem under the Linux 2.6.12 kernel and tested it for a Web service and DNS. The experimental results demonstrate that Satem does not incur significant overhead to the protected services and does not impact the unprotected services","PeriodicalId":164765,"journal":{"name":"2006 25th IEEE Symposium on Reliable Distributed Systems (SRDS'06)","volume":"69 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129136484","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Scalable Services Architecture
Tudor Marian, K. Birman, R. V. Renesse. DOI: 10.1109/SRDS.2006.7
Data centers constructed as clusters of inexpensive machines have compelling cost-performance benefits, but developing services to run on them can be challenging. This paper reports on a new framework, the scalable services architecture (SSA), which helps developers build scalable clustered applications. The work focuses on non-transactional high-performance applications, which are poorly supported in existing platforms. A primary goal was to keep the SSA as small and simple as possible. Key elements include a TCP-based "chain replication" mechanism and a gossip-based subsystem for managing configuration data and repairing inconsistencies after faults. Our experimental results confirm the effectiveness of the approach.
{"title":"A Scalable Services Architecture","authors":"Tudor Marian, K. Birman, R. V. Renesse","doi":"10.1109/SRDS.2006.7","DOIUrl":"https://doi.org/10.1109/SRDS.2006.7","url":null,"abstract":"Data centers constructed as clusters of inexpensive machines have compelling cost-performance benefits, but developing services to run on them can be challenging. This paper reports on a new framework, the scalable services architecture (SSA), which helps developers develop scalable clustered applications. The work is focused on non-transactional high-performance applications; these are poorly supported in existing platforms. A primary goal was to keep the SSA as small and simple as possible. Key elements include a TCP-based \"chain replication\" mechanism and a gossip-based subsystem for managing configuration data and repairing inconsistencies after faults. Our experimental results confirm the effectiveness of the approach","PeriodicalId":164765,"journal":{"name":"2006 25th IEEE Symposium on Reliable Distributed Systems (SRDS'06)","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121968701","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Improvements and Reconsideration of Distributed Snapshot Protocols
A. Agbaria. DOI: 10.1109/SRDS.2006.26
Distributed snapshots are an important building block for distributed systems and, among other applications, are useful for constructing efficient checkpointing protocols. Beyond the overhead they impose, existing distributed snapshot protocols are not trivially applicable (if at all) in many of today's distributed systems, e.g., grid, mobile, and sensor systems. After presenting the shortcomings and inapplicability of the most popular existing distributed snapshot protocols, this paper discusses directions for improving them. In addition, it presents a new and important improvement to the most popular distributed snapshot protocol, presented by Chandy and Lamport in 1985. Although the proposed improvement is simple and easy to implement, it yields significant benefits in reducing the software and hardware overheads of distributed snapshots. The paper then presents proofs of the safety and progress of the new protocol. Lastly, it presents a performance analysis of the protocol using stochastic models.
{"title":"Improvements and Reconsideration of Distributed Snapshot Protocols","authors":"A. Agbaria","doi":"10.1109/SRDS.2006.26","DOIUrl":"https://doi.org/10.1109/SRDS.2006.26","url":null,"abstract":"Distributed snapshots are an important building block for distributed systems, and, among other applications, are useful for constructing efficient checkpointing protocols. In addition to the imposed overhead of the existing distributed snapshot protocols, those protocols are not trivially applicable (if at all) in many of today's distributed systems, e.g., grid, mobile, and sensors systems. After presenting the shortages and the inapplicability of the most popular existing distributed snapshot protocols, this paper discusses improvement directions for the protocols. In addition, it presents a new and an important improvement for the most popular distributed snapshot protocol, which was presented by Chandy and Lamport in 1985. Although the proposed improvement is simple and easy to implement, it has significant benefits in reducing the software and hardware overheads of distributed snapshots. Then, the paper presents proofs for the safety and progress of the new protocol. Lastly, it presents a performance analysis of the protocol using stochastic models","PeriodicalId":164765,"journal":{"name":"2006 25th IEEE Symposium on Reliable Distributed Systems (SRDS'06)","volume":"428 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115945212","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Non-Blocking Synchronous Checkpointing Based on Rollback-Dependency Trackability
T. Sakata, Islene C. Garcia. DOI: 10.1109/SRDS.2006.34
This article proposes an original approach that applies the rollback-dependency trackability (RDT) property to implement a new non-blocking synchronous checkpointing protocol, called RDT-NBS, that takes mutable checkpoints and efficiently supports concurrent initiators. Mutable checkpoints can be saved in non-stable storage and make it possible for non-blocking synchronous checkpointing protocols to save a minimal number of checkpoints in stable storage during the construction of a consistent global checkpoint. We prove that this minimality property does not hold in the presence of concurrent checkpointing initiations. Even so, RDT-NBS uses mutable checkpoints to reduce the use of stable storage while assuring the existence of a consistent global checkpoint in stable storage. We also present simulation results that compare RDT-NBS to quasi-synchronous RDT.
{"title":"Non-Blocking Synchronous Checkpointing Based on Rollback-Dependency Trackability","authors":"T. Sakata, Islene C. Garcia","doi":"10.1109/SRDS.2006.34","DOIUrl":"https://doi.org/10.1109/SRDS.2006.34","url":null,"abstract":"This article proposes an original approach that applies the rollback-dependency trackability (RDT) property to implement a new non-blocking synchronous checkpointing protocol, called RDT-NBS, that takes mutable checkpoints and efficiently supports concurrent initiators. Mutable checkpoints can be saved in non-stable storage and make it possible for non-blocking synchronous checkpointing protocols to save a minimal number of checkpoints in stable storage during the construction of a consistent global checkpoint. We prove that this minimality property does not hold in presence of concurrent checkpointing initiations. Even though, RDT-NBS uses mutable checkpoints to reduce the use of stable memory assuring the existence of a consistent global checkpoint in stable storage. We also present simulation results that compare RDT-NBS to quasi-synchronous RDT","PeriodicalId":164765,"journal":{"name":"2006 25th IEEE Symposium on Reliable Distributed Systems (SRDS'06)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134094899","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Decentralized Local Failure Detection in Dynamic Distributed Systems
Nigamanth Sridhar. DOI: 10.1109/SRDS.2006.16
A failure detector is an important building block for constructing fault-tolerant distributed systems. In asynchronous distributed systems, failed processes are often indistinguishable from slow processes. A failure detector is an oracle that can intelligently suspect processes of having failed. Different classes of failure detectors have been proposed to solve different kinds of problems. Almost all of this work focuses on global failure detection, and moreover on systems that contain neither mobile nodes nor dynamic topologies. In this paper, we present ◇P^ml, a local failure detector that can tolerate mobility and topology changes. This means that ◇P^ml can distinguish between a failed process and a process that has moved away from its original location. We also establish an upper bound on the duration for which a process wrongly suspects a node that has moved away from its neighborhood. We support our theoretical results with experimental findings from an implementation of this algorithm for sensor networks.
{"title":"Decentralized Local Failure Detection in Dynamic Distributed Systems","authors":"Nigamanth Sridhar","doi":"10.1109/SRDS.2006.16","DOIUrl":"https://doi.org/10.1109/SRDS.2006.16","url":null,"abstract":"A failure detector is an important building block when constructing fault-tolerant distributed systems. In asynchronous distributed systems, failed processes are often indistinguishable from slow processes. A failure detector is an oracle that can intelligently suspect processes to have failed. Different classes of failure detectors have been proposed to solve different kinds of problems. Almost all of this work is focused on global failure detection, and moreover, in systems that do not contain mobile nodes or include dynamic topologies. In this paper, we present diamPm l - a local failure detector that can tolerate mobility and topology changes. This means that diamPm l can distinguish between a failed process and a process that has moved away from its original location. We also establish an upper bound on the duration for which a process wrongly suspects a node that has moved away from its neighborhood. We support our theoretical results with experimental findings from an implementation of this algorithm for sensor networks","PeriodicalId":164765,"journal":{"name":"2006 25th IEEE Symposium on Reliable Distributed Systems (SRDS'06)","volume":"69 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127825999","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
WRAPS: Denial-of-Service Defense through Web Referrals
Xiaofeng Wang, M. Reiter. DOI: 10.1109/SRDS.2006.48
The Web is a complicated graph, with millions of Web sites interlinked. In this paper, we propose to use this Web sitegraph structure to mitigate flooding attacks on a Web site, using a new Web referral architecture for privileged service ("WRAPS"). WRAPS allows a legitimate client to obtain a privilege URL, through a click on a referral hyperlink, from a Web site trusted by the target Web site. Using that URL, the client can get privileged access to the target Web site in a manner that is far less vulnerable to a DDoS flooding attack. WRAPS does not require changes to Web client software and is extremely lightweight for referrer Web sites, which eases its deployment. The massive scale of the Web sitegraph could deter attempts to isolate a Web site by blocking all of its referrers. We present the design of WRAPS and the implementation of a prototype system used to evaluate our proposal. Our empirical study demonstrates that WRAPS enables legitimate clients to connect to a Web site smoothly in spite of an intensive flooding attack, at the cost of small overheads on the edge routers of the Web site's ISP.
{"title":"WRAPS: Denial-of-Service Defense through Web Referrals","authors":"Xiaofeng Wang, M. Reiter","doi":"10.1109/SRDS.2006.48","DOIUrl":"https://doi.org/10.1109/SRDS.2006.48","url":null,"abstract":"The Web is a complicated graph, with millions of Web sites interlinked together. In this paper, we propose to use this Web sitegraph structure to mitigate flooding attacks on a Web site, using a new Web referral architecture for privileged service (\"WRAPS\"). WRAPS allows a legitimate client to obtain a privilege URL through a click on a referral hypher-link, from a Web site trusted by the target Web site. Using that URL, the client can get privileged access to the target Web site in a manner that is far less vulnerable to a DDoS flooding attack. WRAPS does not require changes to Web client software and is extremely lightweight for referrer Web sites, which eases its deployment. The massive scale of the Web sitegraph could deter attempts to isolate a Web site through blocking all referrers. We present the design of WRAPS, and the implementation of a prototype system used to evaluate our proposal. Our empirical study demonstrates that WRAPS enables legitimate clients to connect to a Web site smoothly in spite of an intensive flooding attack, at the cost of small overheads on the Web site's ISP's edge routers","PeriodicalId":164765,"journal":{"name":"2006 25th IEEE Symposium on Reliable Distributed Systems (SRDS'06)","volume":"274 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122051759","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Recovering from Distributable Thread Failures with Assured Timeliness in Real-Time Distributed Systems
Edward Curley, J. Anderson, B. Ravindran, E. Jensen. DOI: 10.1109/SRDS.2006.38
We consider the problem of recovering from failures of distributable threads with assured timeliness. When a node hosting a portion of a distributable thread fails, it creates orphans, i.e., thread segments that are disconnected from the thread's root. We consider a termination model for recovering from such failures, in which the orphans must be detected and aborted, and a failure-exception notification must be delivered to the farthest contiguous surviving thread segment so that thread execution can resume. We present a real-time scheduling algorithm called AUA and a distributable thread integrity protocol called TP-TR. We show that AUA and TP-TR bound the orphan cleanup and recovery time, thereby bounding thread starvation durations, and maximize the total thread accrued timeliness utility. We implement AUA and TP-TR in a real-time middleware that supports distributable threads. Our experimental studies with the implementation validate the algorithms' time-bounded recovery property and confirm their effectiveness.
{"title":"Recovering from Distributable Thread Failures with Assured Timeliness in Real-Time Distributed Systems","authors":"Edward Curley, J. Anderson, B. Ravindran, E. Jensen","doi":"10.1109/SRDS.2006.38","DOIUrl":"https://doi.org/10.1109/SRDS.2006.38","url":null,"abstract":"We consider the problem of recovering from failures of distributable threads with assured timeliness. When a node hosting a portion of a distributable thread fails, it causes orphans - i.e., thread segments that are disconnected from the thread's root. We consider a termination model for recovering from such failures, where the orphans must be detected and aborted, and failure-exception notification must be delivered to the farthest, contiguous surviving thread segment for resuming thread execution. We present a realtime scheduling algorithm called AUA, and a distributable thread integrity protocol called TP-TR. We show that AUA and TP-TR bound the orphan cleanup and recovery time, thereby bounding thread starvation durations, and maximize the total thread accrued timeliness utility. We implement AUA and TP-TR in a real-time middleware that supports distributable threads. Our experimental studies with the implementation validate the algorithm/protocol's time-bounded recovery property and confirm their effectiveness","PeriodicalId":164765,"journal":{"name":"2006 25th IEEE Symposium on Reliable Distributed Systems (SRDS'06)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130598724","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}