This paper focuses on the fault tolerance of super-nodes in P2P-SIP systems. Large-scale environments such as P2P-SIP networks are characterized by high volatility (i.e., a high frequency of super-node failures). Most proposed fault-tolerant solutions address only physical faults; they do not take into account timing faults, which are very important for multimedia applications such as telephony. We propose HP2P-SIP, a hierarchical approach for P2P-SIP systems that tolerates both timing and physical faults. Using the OverSim simulator, we demonstrate the feasibility and efficiency of HP2P-SIP. The results show that our proposal significantly reduces the time needed to locate nodes and increases the probability of finding the called nodes. This optimization improves the efficiency of applications with strong time constraints, such as VoIP systems, in dynamic P2P networks.
{"title":"A Hierarchical DHT for Fault Tolerant Management in P2P-SIP Networks","authors":"Ibrahima Diane, I. Niang, B. Gueye","doi":"10.1109/ICPADS.2010.43","DOIUrl":"https://doi.org/10.1109/ICPADS.2010.43","url":null,"abstract":"This paper focuses on fault tolerance of super-nodes in P2P-SIP systems. The large-scale environments such as P2P-SIP networks are characterized by high volatility (i.e. a high frequency of failures of super-nodes). Most fault-tolerant proposed solutions are only for physical defects. They do not take into account the timing faults that are very important for multimedia applications such as telephony. We propose HP2P-SIP which is a timing and physical fault tolerant approach based on a hierarchical approach for P2P-SIP systems. Using the Oversim simulator, we demonstrate the feasibility and the efficiency of HP2P-SIP. The obtained results show that our proposition reduces significantly the localization time of nodes, and increases the probability to find the called nodes. This optimization allows to improve the efficiency of applications that have a strong time constraints such as VoIP systems in dynamic P2P networks.","PeriodicalId":365914,"journal":{"name":"2010 IEEE 16th International Conference on Parallel and Distributed Systems","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131038305","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Replication is a technique widely used in parallel and distributed systems to provide qualities such as performance, scalability, reliability and availability to their clients. These qualities comprise the non-functional requirements of the system. However, consistency, a functional requirement, may also be affected as a side effect of replication. Different replica control protocols provide different levels of consistency. In this paper we present McRep, a middleware-based replication protocol that supports multiple consistency models in a distributed system with replicated data. Both the correctness criteria and the divergence aspects of a consistency model can be specified in the McRep configuration. Supported correctness criteria include linearizability, sequential consistency, serializability, snapshot isolation and causal consistency. Bounds on divergence can be specified using either a version metric or a delay metric. Our approach allows the same middleware to be used for applications requiring different consistency guarantees, eliminating the need to master a new replication middleware or framework for every application. We carried out experiments to compare the performance of various consistency requirements in terms of response time, concurrency conflicts and bandwidth overhead. We demonstrate that, in McRep, workloads pay only for the consistency guarantees they actually need.
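To make the two divergence metrics in the McRep abstract concrete, the following is a minimal sketch of a freshness check on a replica. It is not McRep's configuration format or API, which the abstract does not describe; `DivergenceBound`, `replica_is_fresh_enough` and all parameter names are hypothetical, and allowing both bounds at once is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DivergenceBound:
    """Hypothetical bound on how far a replica may lag behind the primary."""
    max_versions: Optional[int] = None    # version metric: updates not yet applied
    max_delay_s: Optional[float] = None   # delay metric: seconds since last sync


def replica_is_fresh_enough(primary_version: int, replica_version: int,
                            now_s: float, last_sync_s: float,
                            bound: DivergenceBound) -> bool:
    """Return True if a read may be served from the replica under the bound."""
    # Version metric: reject if the replica is missing too many updates.
    if bound.max_versions is not None:
        if primary_version - replica_version > bound.max_versions:
            return False
    # Delay metric: reject if the replica has not synchronized recently enough.
    if bound.max_delay_s is not None:
        if now_s - last_sync_s > bound.max_delay_s:
            return False
    # No configured bound was violated (or no bound was configured at all).
    return True
```

A read that fails this check would have to be forwarded to an up-to-date replica or wait for synchronization; which of the two a real system does is a design choice outside this sketch.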
{"title":"Multi-consistency Data Replication","authors":"Raihan Al-Ekram, R. Holt","doi":"10.1109/ICPADS.2010.67","DOIUrl":"https://doi.org/10.1109/ICPADS.2010.67","url":null,"abstract":"Replication is a technique widely used in parallel and distributed systems to provide qualities such as performance, scalability, reliability and availability to their clients. These qualities comprise the non-functional requirements of the system. But the functional requirement consistency may also get affected as a side-effect of replication. Different replica control protocols provide different levels of consistency from the system. In this paper we present the middleware based McRep replication protocol that supports multiple consistency model in a distributed system with replicated data. Both correctness criteria and divergence aspects of a consistency model can be specified in the McRep configuration. Supported correctness criteria include linearizability, sequential consistency, serializability, snapshot isolation and causal consistency. Bounds on divergence can be specified in either version metric or delay metric. Our approach allows the same middleware to be used for applications requiring different consistency guarantees, eliminating the need for mastering a new replication middleware or framework for every application. We carried out experiments to compare the performance of various consistency requirements in terms of response time, concurrency conflict and bandwidth overhead. 
We demonstrate that in McRep workloads only pay for the consistency guarantees they actually need.","PeriodicalId":365914,"journal":{"name":"2010 IEEE 16th International Conference on Parallel and Distributed Systems","volume":"67 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127109236","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Recently, CUDA technology has been used to accelerate many computationally demanding tasks. For example, in our previous work we have shown how CUDA technology can be employed to accelerate Linear Temporal Logic (LTL) model checking. While the raw computing power of a CUDA-enabled device is tremendous, the applicability of the technology is quite often limited to small or medium-sized instances of the problems being solved. This is because the memory of a single device is simply not large enough to cope with large or realistic instances of the problem, which is also the case for our CUDA-aware LTL model checking solution. In this paper we suggest how to overcome this limitation by employing multiple (two in our case) CUDA devices to accelerate our fine-grained, communication-intensive parallel algorithm for LTL model checking.
{"title":"Employing Multiple CUDA Devices to Accelerate LTL Model Checking","authors":"J. Barnat, Petr Bauch, L. Brim, Milan Ceska","doi":"10.1109/ICPADS.2010.82","DOIUrl":"https://doi.org/10.1109/ICPADS.2010.82","url":null,"abstract":"Recently, the CUDA technology has been used to accelerate many computation demanding tasks. For example, in our previous work we have shown how CUDA technology can be employed to accelerate the process of Linear Temporal Logic (LTL) Model Checking. While the raw computing power of a CUDA enabled device is tremendous, the applicability of the technology is quite often limited to small or middle-sized instances of the problems being solved. This is because the memory that a single device is equipped with, is simply not large enough to cope with large or realistic instances of the problem, which is also the case of our CUDA-aware LTL Model Checking solution. In this paper we suggest how to overcome this limitations by employing multiple (two in our case) CUDA devices for acceleration of our fine-grained communication-intensive parallel algorithm for LTL Model Checking.","PeriodicalId":365914,"journal":{"name":"2010 IEEE 16th International Conference on Parallel and Distributed Systems","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123586226","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Participatory Sensing is an emerging application paradigm that leverages the growing ubiquity of sensor-capable smartphones to allow communities to carry out wide-area sensing tasks as a side effect of people's everyday lives and movements. This paper proposes a decentralized infrastructure for supporting Participatory Sensing applications. It describes an architecture and a domain-specific programming language for modeling, prototyping and developing the distributed processing of participatory sensing data, with the goal of allowing faster and easier development of these applications. A case-study application is also presented as the basis for an experimental evaluation.
{"title":"4Sensing -- Decentralized Processing for Participatory Sensing Data","authors":"Heitor Ferreira, S. Duarte, Nuno M. Preguiça","doi":"10.1109/ICPADS.2010.20","DOIUrl":"https://doi.org/10.1109/ICPADS.2010.20","url":null,"abstract":"Participatory Sensing is an emerging application paradigm that leverages the growing ubiquity of sensor-capable smart phones to allow communities carry out wide-area sensing tasks, as a side-effect of people's everyday lives and movements. This paper proposes a decentralized infrastructure for supporting Participatory Sensing applications. It describes an architecture and a domain specific programming language for modeling, prototyping and developing the distributed processing of participatory sensing data with the goal of allowing faster and easier development of these applications. Moreover, a case-study application is also presented as the basis for an experimental evaluation.","PeriodicalId":365914,"journal":{"name":"2010 IEEE 16th International Conference on Parallel and Distributed Systems","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128602658","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Environmental monitoring is an important application area for wireless sensor networks (WSNs). An important problem for environmental WSNs is the characterization of the dynamic behaviour of transient physical phenomena over space. In the case of mote-level WSNs, a solution that is computed inside the WSN is essential for energy efficiency. In this context, the main contributions of this paper to the literature on in-network processing in WSNs are threefold. First, the paper further develops an algebraic framework with which one can express and evaluate complex topological relationships over geometrical representations of permanent features (e.g., buildings, or geographical features such as lakes and rivers) and of transient phenomena (e.g., areas of mist over a cultivated field). Second, the paper describes distributed implementations of spatial-algebraic operations over the regions represented by that framework, thereby enabling identification of topological relationships between regions. Finally, the paper presents experimental evidence that the techniques described lead to efficient runtime behaviour. Taken together, these contributions constitute a further step towards enabling the high-level specification of expressive spatial analyses for efficient execution inside a WSN.
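As a rough illustration of what identifying a topological relationship between two regions can mean, the sketch below classifies a pair of regions that are each represented as a set of grid cells. This representation, the function name `topological_relation` and the five relation labels are assumptions made for the example; the abstract does not specify the paper's algebraic framework or its distributed implementation, and this centralized version is a simplification.

```python
from typing import FrozenSet, Tuple

Cell = Tuple[int, int]  # hypothetical (x, y) grid cell covered by a region


def topological_relation(a: FrozenSet[Cell], b: FrozenSet[Cell]) -> str:
    """Classify how region a relates to region b using plain set operations."""
    if not (a & b):
        return "disjoint"   # the regions share no cell
    if a == b:
        return "equal"      # the regions cover exactly the same cells
    if a < b:
        return "inside"     # a is a proper subset of b
    if a > b:
        return "contains"   # a is a proper superset of b
    return "overlaps"       # some shared cells, neither contains the other
```

For example, a permanent feature such as a lake and a transient area of mist that covers part of the lake and part of the adjacent field would be classified as overlapping.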
{"title":"Distributed Spatial Analysis in Wireless Sensor Networks","authors":"Farhana Jabeen, A. Fernandes","doi":"10.1109/ICPADS.2010.58","DOIUrl":"https://doi.org/10.1109/ICPADS.2010.58","url":null,"abstract":"Environmental monitoring is an important application area for wireless sensor networks (WSNs). An important problem for environmental WSNs is the characterization of the dynamic behaviour of transient physical phenomena over space. In the case of mote-level WSNs, a solution that is computed inside the WSN is essential for energy efficiency. In this context, the main contributions of this paper to the literature on in network processing in WSNs are threefold. The paper further develops an algebraic framework with which one can express and evaluate complex topological relationships over geometrical representations of permanent features (e.g., buildings, or geographical features such as lakes and rivers) and of transient phenomena (e.g., areas of mist over a cultivated field). The paper then describes distributed implementations of spatial-algebraic operations over the regions represented by that framework, thereby enabling identification of topological relationships between regions. Finally, the paper presents experimental evidence that the techniques described lead to efficient runtime behaviour. 
Taken together, these contributions constitute a further step towards enabling the high-level specification of expressive spatial analyses for efficient execution inside a WSN.","PeriodicalId":365914,"journal":{"name":"2010 IEEE 16th International Conference on Parallel and Distributed Systems","volume":"70 4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122691480","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Powerful wireless devices carried by humans can form human contact-based networks. Such networks often suffer from intermittent connectivity, so providing effective information dissemination in them is very important. In this paper, we explore a cooperative, user-centric information dissemination scheme that allows published data items to be delivered efficiently to interested nodes. Our scheme uses fewer relays and allows each node to operate in a distributed manner using locally gathered information. Our scheme is more effective than the epidemic scheme: it achieves a comparable success ratio with a 45-60% reduction in storage requirements and a 47-53% reduction in transmissions. We also compare our scheme with an ideal scheme that assumes contact traces can be analyzed a priori to determine their dominating sets, and show that our scheme can be more efficient than this ideal scheme.
{"title":"Cooperative User Centric Information Dissemination in Human Content-Based Networks","authors":"M. Chuah, P. Yang, Pan Hui","doi":"10.1109/ICPADS.2010.77","DOIUrl":"https://doi.org/10.1109/ICPADS.2010.77","url":null,"abstract":"Powerful wireless devices carried by humans can form human contact-based networks. Such networks often suffer from intermittent connectivity. Thus, providing an effective information dissemination feature in such networks is very important. In this paper, we explore a cooperative user centric information dissemination scheme which allows published data items to be delivered to interested nodes efficiently. Our scheme uses fewer relays and allows each node to operate distributedly using locally gathered information. Our scheme is more effective than the epidemic scheme since it achieves comparable success ratio with a 45-60% reduction in storage requirement and 47-53% reduction in transmissions. We also compare our scheme with an ideal scheme which assumes one can analyze contact traces apriori to determine their dominating sets, and show that our scheme can be more efficient than this ideal scheme.","PeriodicalId":365914,"journal":{"name":"2010 IEEE 16th International Conference on Parallel and Distributed Systems","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122185268","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
As one of the components of the iVCE software platform, iVCE/M is devoted to improving the performance of I/O-intensive and memory-intensive applications through efficient aggregation of distributed memory resources. Two elements are significant for the deployment of iVCE/M: a data-locating algorithm with balanced time and space costs, and a transparent interface that supports legacy applications without code modification. We propose a client-side metadata structure based on a logarithmic search tree, which accelerates data locating with moderate memory consumption; an implicit I/O redirection mechanism; and an implementation of an iVCE/M-based disk cache system. Experiments with cross-domain emulation show that the scheme can exploit distributed memory resources for applications with small-granularity I/O accesses.
{"title":"Design and Practice on iVCE for Memory System","authors":"Rui Chu, Tian Tian, Zhenli Lin","doi":"10.1109/ICPADS.2010.62","DOIUrl":"https://doi.org/10.1109/ICPADS.2010.62","url":null,"abstract":"As one of the components in iVCE software platform, iVCE/M devotes to the performance improvement of the I/O-intensive and memory-intensive applications with efficient aggregation of distributed memory resources. To facilitate the deployment of iVCE/M, the data locating algorithm with balanced time and space cost, as well as the transparent interface for the legacy applications without code modification, are both significant in the implementation of iVCE/M. We propose the logarithmic search tree based client-side metadata structure to accelerate the data locating using moderate memory consumption, the implicit I/O redirection mechanism, and the implementation of iVCE/M based disk cache system. The experiments with cross domain emulation prove that the scheme is applicable to exploit the distributed memory resources for applications with small granularity I/O accesses.","PeriodicalId":365914,"journal":{"name":"2010 IEEE 16th International Conference on Parallel and Distributed Systems","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122003844","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Given the attractive properties of NAND flash memory, such as small size, shock resistance, and low power consumption, large-capacity SSDs (solid-state disks) are expected to replace hard disks in high-end systems. However, NAND flash memory is still too expensive to replace hard disks entirely. Using hard disks and NAND flash memory together as secondary storage is an alternative that provides relatively low response time, large capacity, and reasonable cost. In this paper, we present a new buffer cache management scheme with data migration that is optimized for using both NAND flash memory and hard disks as secondary storage. The proposed scheme has three salient features. First, it detects the I/O access patterns of each storage device and allocates buffer cache space to each pattern by adaptively computing the marginal gain, taking the I/O cost of the storage device into account. Second, it prefetches data selectively according to their access patterns and storage devices. Third, when buffer space is reclaimed, it moves the evicted data from the buffer cache to the hard disk or to NAND flash memory according to the access patterns of the block references. Trace-driven simulations show that the proposed scheme improves I/O performance significantly. It improves the buffer cache hit ratio by up to 29.9% and reduces the total I/O elapsed time by up to 49.5% compared to the well-known UBM scheme.
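The first feature, allocating buffer space by marginal gain weighted by device I/O cost, can be sketched as a greedy loop. This is only an illustration of the general idea under stated assumptions, not the authors' algorithm: the function `allocate_buffers`, the per-pattern hit-ratio estimators and the miss-cost values are all hypothetical.

```python
from typing import Callable, Dict


def allocate_buffers(total_blocks: int,
                     hit_ratio: Dict[str, Callable[[int], float]],
                     miss_cost_ms: Dict[str, float]) -> Dict[str, int]:
    """Greedily give each buffer block to the pattern with the largest gain.

    hit_ratio[name](n) is an assumed estimate of the hit ratio that access
    pattern `name` achieves with n buffer blocks; miss_cost_ms[name] is the
    assumed cost of a miss on the device that serves that pattern.
    """
    alloc = {name: 0 for name in hit_ratio}

    def marginal_gain(name: str) -> float:
        # Expected I/O time saved by granting one more block to this pattern.
        n = alloc[name]
        return (hit_ratio[name](n + 1) - hit_ratio[name](n)) * miss_cost_ms[name]

    for _ in range(total_blocks):
        best = max(alloc, key=marginal_gain)
        alloc[best] += 1
    return alloc
```

With two looping patterns of the same length, one served by a hard disk (expensive misses) and one by flash (cheap misses), the loop fills the hard-disk pattern's working set first and only then gives blocks to the flash pattern, which is the behaviour that weighting by I/O cost is meant to produce.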
{"title":"Unifying Buffer Replacement and Prefetching with Data Migration for Heterogeneous Storage Devices","authors":"Sehwan Lee, K. Koh, H. Bahn","doi":"10.1109/ICPADS.2010.103","DOIUrl":"https://doi.org/10.1109/ICPADS.2010.103","url":null,"abstract":"With the good properties of NAND flash memory such as small size, shock resistance, and low-power consumption, large capacity SSD (Solid State Disk) is anticipated to replace hard disk in high-end systems. However, the cost of NAND flash memory is still high to substitute for hard disk entirely. Using hard disk and NAND flash memory together as secondary storage is an alternative solution to provide relatively low response time, large capacity, and reasonable cost. In this paper, we present a new buffer cache management scheme with data migration that is optimized to use both NAND flash memory and hard disk together as secondary storage. The proposed scheme has three salient features. First, it detects I/O access patterns from each storage, and allocates the buffer cache space for each pattern by computing marginal gain adaptively considering the I/O cost of storage. Second, it prefetches data selectively according to their access pattern and storage devices. Third, it moves the evicted data from the buffer cache to hard disk or NAND flash memory considering the access patterns of block references on the reclamation. Trace-driven simulations show that the proposed scheme improves the I/O performance significantly. 
It enhances the buffer cache hit ratio by up to 29.9% and reduces the total I/O elapsed time by up to 49.5% compared to the well-acknowledged UBM scheme.","PeriodicalId":365914,"journal":{"name":"2010 IEEE 16th International Conference on Parallel and Distributed Systems","volume":"226 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131445985","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Haifeng Fang, Yiqiang Zhao, Hongyong Zang, H. H. Huang, Ying Song, Yuzhong Sun, Zhiyong Liu
A cloud computing provider can dynamically allocate virtual machines (VMs) based on the needs of its customers, while maintaining privileged access to the management virtual machine that directly manages the hardware and supports the guest VMs. Customers must trust the cloud provider to protect the confidentiality and integrity of their applications and data. However, because VMs from different customers run on the same host, an attack on the management virtual machine can easily lead to the compromise of the guest VMs. It is therefore critical for a cloud computing system to ensure the trustworthiness of management VMs. To this end, we propose VMGuard, an integrity monitoring and detection system for management virtual machines in a distributed environment. VMGuard uses a special VM, the Guard Domain, which runs on each physical node to monitor the co-resident management VM. The integrity measurements collected by the Guard Domains are sent to the VMGuard server for safe storage and independent analysis. The experimental evaluation of a Xen-based prototype shows that VMGuard can quickly detect rootkit attacks with low performance overhead.
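The measure-then-compare pattern behind integrity monitoring can be sketched in a few lines. The example below hashes named components and reports any that differ from a trusted baseline; it is an assumption-laden illustration, not VMGuard's actual measurement mechanism. The function names, the choice of SHA-256 and the in-memory `bytes` inputs (standing in for kernel code or binaries of a management VM) are all hypothetical.

```python
import hashlib
from typing import Dict, List


def measure(components: Dict[str, bytes]) -> Dict[str, str]:
    """Compute a SHA-256 digest for each named component."""
    return {name: hashlib.sha256(data).hexdigest()
            for name, data in components.items()}


def find_violations(baseline: Dict[str, str],
                    current: Dict[str, str]) -> List[str]:
    """Return the sorted names of components that changed, vanished or appeared."""
    # A baseline component whose digest differs (or is missing) is a violation.
    changed = [name for name in baseline if current.get(name) != baseline[name]]
    # A component that was never measured in the baseline is also suspicious.
    unexpected = [name for name in current if name not in baseline]
    return sorted(changed + unexpected)
```

In a design like the one the abstract describes, the measuring side and the comparing side run in different places, so that a compromised management VM cannot simply rewrite its own baseline.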
{"title":"VMGuard: An Integrity Monitoring System for Management Virtual Machines","authors":"Haifeng Fang, Yiqiang Zhao, Hongyong Zang, H. H. Huang, Ying Song, Yuzhong Sun, Zhiyong Liu","doi":"10.1109/ICPADS.2010.44","DOIUrl":"https://doi.org/10.1109/ICPADS.2010.44","url":null,"abstract":"A cloud computing provider can dynamically allocate virtual machines (VM) based on the needs of the customers, while maintaining the privileged access to the Management Virtual Machine that directly manages the hardware and supports the guest VMs. The customers must trust the cloud providers to protect the confidentiality and integrity of their applications and data. However, as the VMs from different customers are running on the same host, an attack to the management virtual machine will easily lead to the compromise of the guest VMs. Therefore, it is critical for a cloud computing system to ensure the trustworthiness of management VMs. To this end, we propose VMGuard, an integrity monitoring and detecting system for management virtual machines in a distributed environment. VMGuard utilizes a special VM, Guard Domain, which runs on each physical node to monitor the co-resident management VMs. The integrity measurements collected by the Guard Domains are sent to the VMGuard server for safe store and independent analysis. 
The experimental evaluation of a Xen-based prototype shows that VMGuard can quickly detect the root kit attacks while the performance overhead is low.","PeriodicalId":365914,"journal":{"name":"2010 IEEE 16th International Conference on Parallel and Distributed Systems","volume":"99 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116162506","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A virtual cluster consists of a multitude of virtual machines and software components that are bound to fail eventually. In many environments, such failures can result in unanticipated, potentially devastating behavior and in service unavailability. The ability to fail over is essential to the virtual cluster’s availability, reliability, and manageability. Most existing methods share several disadvantages: they require modifications to the target processes or their operating systems, which is usually error-prone and sometimes impractical, and they checkpoint only processes rather than entire OS images, which limits their applicability. In this paper we present VirtCFT, an innovative and practical fault tolerance system for virtual clusters. VirtCFT is a system-level, coordinated, distributed checkpointing system. It coordinates the distributed VMs so that they periodically reach a globally consistent state, and it takes a checkpoint of the whole virtual cluster, including the CPU, memory and disk state of each VM as well as the network communications. When faults occur, VirtCFT automatically recovers the entire virtual cluster to the correct state within a few seconds and keeps it running. Compared with existing fault tolerance mechanisms, VirtCFT provides a simpler and fully transparent fault-tolerant platform that allows existing, unmodified software and operating systems (regardless of version) to be protected from the failure of the physical machine on which they run. We have implemented this system on the Xen virtualization platform. Our experiments with real-world benchmarks demonstrate the effectiveness and correctness of VirtCFT.
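The coordination step in a checkpointing scheme of this kind can be illustrated with a toy model: pause every VM, snapshot each one while nothing is running, then resume them all. This is a sketch under heavy assumptions, not VirtCFT's protocol or any Xen API. `ToyVM`, `coordinated_checkpoint` and `recover` are hypothetical, VM state is reduced to a dictionary, and in-flight network messages, which a real system must also capture or drain, are ignored.

```python
from typing import Dict, List


class ToyVM:
    """Hypothetical in-memory stand-in for a virtual machine."""

    def __init__(self, name: str, state: Dict[str, int]) -> None:
        self.name = name
        self.state = dict(state)
        self.paused = False

    def pause(self) -> None:
        self.paused = True

    def resume(self) -> None:
        self.paused = False

    def snapshot(self) -> Dict[str, int]:
        # Snapshotting a running VM could capture a state that is still changing.
        if not self.paused:
            raise RuntimeError(f"{self.name} must be paused before snapshot")
        return dict(self.state)

    def restore(self, snap: Dict[str, int]) -> None:
        self.state = dict(snap)


def coordinated_checkpoint(vms: List[ToyVM]) -> Dict[str, Dict[str, int]]:
    """Pause all VMs, snapshot each one, and resume them even on failure."""
    for vm in vms:
        vm.pause()
    try:
        return {vm.name: vm.snapshot() for vm in vms}
    finally:
        for vm in vms:
            vm.resume()


def recover(vms: List[ToyVM], checkpoint: Dict[str, Dict[str, int]]) -> None:
    """Roll every VM back to the same cluster-wide checkpoint."""
    for vm in vms:
        vm.restore(checkpoint[vm.name])
```

Rolling all VMs back to the same checkpoint, rather than only the one that failed, is what keeps the cluster state mutually consistent after recovery.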
{"title":"VirtCFT: A Transparent VM-Level Fault-Tolerant System for Virtual Clusters","authors":"Minjia Zhang, Hai Jin, Xuanhua Shi, Song Wu","doi":"10.1109/ICPADS.2010.125","DOIUrl":"https://doi.org/10.1109/ICPADS.2010.125","url":null,"abstract":"A virtual cluster consists of a multitude of virtual machines and software components that are doomed to fail eventually. In many environments, such failures can result in unanticipated, potentially devastating failure behavior and in service unavailability. The ability of failover is essential to the virtual cluster’s availability, reliability, and manageability. Most of the existing methods have several common disadvantages: requiring modifications to the target processes or their OSes, which is usually error prone and sometimes impractical; only targeting at taking checkpoints of processes, not whole entire OS images, which limits the areas to be applied. In this paper we present VirtCFT, an innovative and practical system of fault tolerance for virtual cluster. VirtCFT is a system-level, coordinated distributed checkpointing fault tolerant system. It coordinates the distributed VMs to periodically reach the globally consistent state and take the checkpoint of the whole virtual cluster including states of CPU, memory, disk of each VM as well as the network communications. When faults occur, VirtCFT will automatically recover the entire virtual cluster to the correct state within a few seconds and keep it running. Superior to all the existing fault tolerance mechanisms, VirtCFT provides a simpler and totally transparent fault tolerant platform that allows existing, unmodified software and operating system (version unawareness) to be protected from the failure of the physical machine on which it runs. We have implemented this system based on the Xen virtualization platform. 
Our experiments with real-world benchmarks demonstrate the effectiveness and correctness of VirtCFT.","PeriodicalId":365914,"journal":{"name":"2010 IEEE 16th International Conference on Parallel and Distributed Systems","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129623909","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}