Pub Date : 2013-06-24 DOI: 10.1109/DSN.2013.6575325
Qiushi Wang, Huaming Wu, K. Wolter
Offloading is a useful approach to saving energy and time on mobile devices by migrating heavy computation to powerful remote servers. However, unreliable wireless networks constrain the implementation of offloading applications: execution continuity is often interrupted by network failures. One valid way to deal with this problem is to re-execute the pre-determined offloading task locally on the mobile device. The challenge lies in finding the best trade-off between the costs and benefits of Local Re-execution. In this paper, using a Stochastic Activity Network model, we define three metrics to investigate the performance of Local Re-execution when it is launched by different timeout values. By comprehensively comparing the simulation results, we further explore the optimal timeout value for activating Local Re-execution and conclude that the optimum is mainly controlled by the delay of network recovery.
Title : Model-based performance analysis of local re-execution scheme in offloading system. Venue : 2013 43rd Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN).
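The timeout trade-off described in this abstract can be illustrated with a small Monte Carlo sketch. This is not the paper's Stochastic Activity Network model: the failure model (at most one failure per task, exponentially distributed recovery delay), the rule that the remote result is still accepted if it arrives before local re-execution finishes, and all names and parameter values are assumptions made purely for illustration.

```python
import random


def completion_time(timeout, remote_time, local_time, fail_prob, mean_recovery, rng):
    """Completion time of one offloaded task under a toy failure model."""
    # Assumption: at most one network failure per task, with an
    # exponentially distributed recovery delay added to the remote path.
    failed = rng.random() < fail_prob
    recovery = rng.expovariate(1.0 / mean_recovery) if failed else 0.0
    remote_done = remote_time + recovery
    if remote_done <= timeout:
        # The remote result arrives before the timeout fires.
        return remote_done
    # The timeout fires and local re-execution starts; assume whichever
    # of the two paths finishes first determines the completion time.
    return min(remote_done, timeout + local_time)


def mean_completion_time(timeout, trials=10000, seed=0, **params):
    """Average completion time over many simulated tasks."""
    rng = random.Random(seed)
    total = sum(completion_time(timeout, rng=rng, **params) for _ in range(trials))
    return total / trials


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only to make the sweep visible.
    params = dict(remote_time=2.0, local_time=6.0, fail_prob=0.3, mean_recovery=10.0)
    for timeout in (2.0, 4.0, 8.0, 16.0):
        print(timeout, round(mean_completion_time(timeout, **params), 3))
```

Sweeping the timeout for several values of `mean_recovery` is one way to see how the best timeout shifts with the recovery delay in this toy setting. The sketch tracks completion time only; the energy cost of local re-execution is not modelled.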
Pub Date : 2013-06-24 DOI: 10.1109/DSN.2013.6575340
A. Avizienis
The resilience infrastructure is a physically and functionally separate add-on to a “Client” computing and/or communication system that provides resilience to the Client system. This short paper summarizes the main features of the architecture of a resilience infrastructure.
Title : The architecture of a resilience infrastructure for computing and communication systems. Venue : 2013 43rd Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN).
Pub Date : 2013-06-24 DOI: 10.1109/DSN.2013.6575321
Yang Liu, J. Muppala
Data center networks (DCNs) are inherently failure-prone owing to the large number of links, switches and servers they contain. Component failures are often correlated, resulting in a set of connected components failing together. This correlated failure behaviour can be captured through the use of fault regions [1]. This paper explores the effect of such failures in DCNs, using four topologies, viz., Fat Tree, DCell, FlatNet and BCube. We used two categories of metrics for evaluation: connection-oriented metrics, including aggregated bottleneck throughput (ABT), average path length (APL) and routing failure rate (RFR); and network size-oriented metrics, including Component Decomposition Number (CDN) and Smallest/Largest Component Size (SCS/LCS).
Title : Fault-tolerance characteristics of data center network topologies using fault regions. Venue : 2013 43rd Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN).
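The network size-oriented metrics named in this abstract can be sketched as follows. The abstract does not give exact definitions, so this assumes that CDN is the number of connected components left after a fault region is removed and that SCS/LCS are the sizes of the smallest and largest such components. The toy topology and all function names are hypothetical.

```python
from collections import deque


def component_sizes(adjacency, failed):
    """Sizes of the connected components among the surviving nodes."""
    alive = set(adjacency) - set(failed)
    seen = set()
    sizes = []
    for start in alive:
        if start in seen:
            continue
        # Breadth-first search over surviving nodes only.
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            node = queue.popleft()
            size += 1
            for neighbour in adjacency[node]:
                if neighbour in alive and neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        sizes.append(size)
    return sorted(sizes)


def size_metrics(adjacency, failed):
    """Assumed definitions: CDN = component count, SCS/LCS = min/max size."""
    sizes = component_sizes(adjacency, failed)
    if not sizes:
        return {"CDN": 0, "SCS": 0, "LCS": 0}
    return {"CDN": len(sizes), "SCS": sizes[0], "LCS": sizes[-1]}


# Hypothetical two-level tree: one core switch, two edge switches, four servers.
TOY_TOPOLOGY = {
    "core": ["edge1", "edge2"],
    "edge1": ["core", "s1", "s2"],
    "edge2": ["core", "s3", "s4"],
    "s1": ["edge1"],
    "s2": ["edge1"],
    "s3": ["edge2"],
    "s4": ["edge2"],
}

if __name__ == "__main__":
    # A fault region containing only edge1 isolates servers s1 and s2.
    print(size_metrics(TOY_TOPOLOGY, {"edge1"}))
```

In this toy tree, failing `edge1` leaves one four-node component and two isolated servers. A fault region covering several adjacent nodes is modelled simply by passing all of them in `failed`.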
Pub Date : 2013-06-24 DOI: 10.1109/DSN.2013.6575360
Daniele Sciascia, F. Pedone
Many current online services are deployed over geographically distributed sites (i.e., datacenters). Such distributed services call for geo-replicated storage, that is, storage distributed and replicated among many sites. Geographical distribution and replication can improve locality and availability of a service. Locality is achieved by moving data closer to the users. High availability is attained by replicating data in multiple servers and sites. This paper considers a class of scalable replicated storage systems based on deferred update replication with transactional properties. The paper discusses different ways to deploy scalable deferred update replication in geographically distributed systems, considers the implications of these deployments on user-perceived latency, and proposes solutions. Our results are substantiated by a series of microbenchmarks and a social network application.
Title : Geo-replicated storage with scalable deferred update replication. Venue : 2013 43rd Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN).
Pub Date : 2013-06-24 DOI: 10.1109/DSN.2013.6575312
Maxim Siniavine, Ashvin Goel
Kernel patches are released frequently to fix bugs and security vulnerabilities. However, users and system administrators often delay installing these updates because they require a system reboot, which results in disruption of service and the loss of application state. Unfortunately, the longer a system remains out-of-date, the higher the likelihood of system failure or a successful attack. Approaches such as dynamic patching and hot swapping have been proposed for updating the kernel, but all of them either limit the types of updates that are supported or require significant programming effort to manage. We have designed a system that checkpoints application-visible state, updates the kernel, and restores the application state, thus minimizing disruption of service. By checkpointing high-level state, our system no longer depends on the precise implementation of a patch and can apply all backward-compatible patches. Our results show that updates to major releases of the Linux kernel can be applied with minimal effort and no observable overhead.
Title : Seamless kernel updates. Venue : 2013 43rd Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN).
Pub Date : 2013-06-24 DOI: 10.1109/DSN.2013.6575357
Feng Tan, Yufei Wang, Qixin Wang, Lei Bu, Rong L. Zheng, N. Suri
Cyber-Physical Systems (CPS) integrate discrete-time computing and continuous-time physical-world entities, which are often wirelessly interlinked. The use of wireless safety-critical CPS (control, healthcare, etc.) requires safety guarantees despite communication faults. This paper focuses on one important set of such safety rules: Proper-Temporal-Embedding (PTE). Our solution introduces hybrid automata to formally describe and analyze CPS design patterns. We propose a novel lease-based design pattern, along with closed-form configuration constraints, to guarantee PTE safety rules under arbitrary wireless communication faults. We propose a formal methodology to transform the design-pattern hybrid automata into specific wireless CPS designs. This methodology can effectively prevent physical-world parameters from affecting the PTE safety of the resulting specific designs. We conduct a case study on a laser tracheotomy wireless CPS to show that the resulting system is safe and can withstand communication disruptions.
Title : Guaranteeing Proper-Temporal-Embedding safety rules in wireless CPS: A hybrid formal modeling approach. Venue : 2013 43rd Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN).
Pub Date : 2013-06-24 DOI: 10.1109/DSN.2013.6575306
M. Biely, Pamela Delgado, Zarko Milosevic, A. Schiper
We introduce Distal, a new framework that simplifies turning pseudocode of fault-tolerant distributed algorithms into efficient executable code. Without proper tool support, even small amounts of pseudocode normally end up as several thousand non-trivial lines of Java or C++. Distal is implemented as a library in Scala and consists of two main parts: a domain-specific language (DSL) in which algorithms are expressed and an efficient messaging layer that deals with low-level issues such as connection management, threading and (de)serialization. The DSL is designed such that implementations of distributed algorithms closely resemble the pseudocode found in research papers. By writing code that is close to the protocol description, one can be more confident that the implemented system really reflects the protocol specification on paper. Distal not only makes it simple and intuitive to implement distributed algorithms but also leads to efficient implementations.
Title : Distal: A framework for implementing fault-tolerant distributed algorithms. Venue : 2013 43rd Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN).
Pub Date : 2013-06-24 DOI: 10.1109/DSN.2013.6575361
Xu Wang, Hailong Sun, Ting Deng, J. Huai
Existing theories such as CAP and PACELC claim that there are tradeoffs between some pairs of performance measures in distributed replication systems, such as consistency and latency. However, current systems take a very vague view of how to balance those tradeoffs, e.g., eventual consistency. In this work, we provide a quantitative analysis of consistency and latency for widely used replicated state machines (RSMs). Based on a generic RSM model that we present, called RSM-d, probabilistic models are built to quantify consistency and latency. We show that both are affected by d, the number of ACKs received by the coordinator before committing a write request. We further define a payoff model by combining the consistency and latency models. Finally, with Monte Carlo based simulation, we validate the presented models and show the effectiveness of our solutions in obtaining an optimal tradeoff between consistency and latency.
Title : Consistency or latency? A quantitative analysis of replication systems based on replicated state machines. Venue : 2013 43rd Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN).
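The latency side of the role of d described in this abstract can be sketched with a small Monte Carlo simulation. If the coordinator commits a write once d of n replicas have acknowledged it, the commit latency is the d-th smallest ACK delay. The exponential delay distribution, the parameter values and the function names below are assumptions for illustration, not the paper's RSM-d model, and the consistency side is not modelled.

```python
import random


def commit_latency(n, d, mean_delay, rng):
    """Latency until the coordinator has received d of n ACKs."""
    # Assumption: ACK delays are independent and exponentially distributed.
    delays = sorted(rng.expovariate(1.0 / mean_delay) for _ in range(n))
    return delays[d - 1]  # the d-th smallest delay


def mean_commit_latency(n, d, mean_delay=1.0, trials=20000, seed=0):
    """Monte Carlo estimate of the mean commit latency for a given d."""
    rng = random.Random(seed)
    total = sum(commit_latency(n, d, mean_delay, rng) for _ in range(trials))
    return total / trials


if __name__ == "__main__":
    n = 5  # hypothetical number of replicas
    for d in range(1, n + 1):
        print(d, round(mean_commit_latency(n, d), 3))
```

Under the exponential assumption, the estimate can be checked against the closed form for order statistics, `mean_delay * sum(1 / (n - i) for i in range(d))`. Waiting for more ACKs raises the commit latency, which is the cost that a payoff model would weigh against the consistency gained.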
Pub Date : 2013-06-24 DOI: 10.1109/DSN.2013.6575335
Yung-Li Hu, Wei-Bing Su, Li-ying Wu, Yennun Huang, S. Kuo
OpenFlow (OF) Network is a novel network architecture that many well-known cloud service providers have applied to build their data center networks. OF Network differs from traditional network architecture in that it decouples the control plane from the data plane for network management. Intrusion detection is very important for improving system security in cloud computing. Because an OF network can improve the response time to an alert by efficiently configuring network flows, we design an event-based Intrusion Detection System (IDS) architecture on an OF network.
Title : Design of event-based Intrusion Detection System on OpenFlow Network. Venue : 2013 43rd Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN).
Pub Date : 2013-06-24 DOI: 10.1109/DSN.2013.6575346
G. Iacobelli, M. Tribastone
Fluid models have gained popularity in the performance modeling of computing systems and communication networks. When the model under study consists of many different types of agents, the size of the associated system of ordinary differential equations (ODEs) increases with the number of types, making the analysis more difficult. We study this problem for a class of models where heterogeneity is expressed as a perturbation of certain parameters of the ODE vector field. We provide an a priori bound that relates the solutions of the original, heterogeneous model to those of a smaller ODE system that arises from aggregating system variables concerning different types of agents. By showing that this bound grows linearly with the intensity of the perturbation, we provide a formal justification for the intuitive possibility of neglecting small differences in agents' behavior as a means of reducing the dimensionality of the original system.
Title : Lumpability of fluid models with heterogeneous agent types. Venue : 2013 43rd Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN).