"A Kautz-based Real-Time and Energy-Efficient Wireless Sensor and Actuator Network". Ze Li, Haiying Shen. 2012 IEEE 32nd International Conference on Distributed Computing Systems, pp. 62-71. doi:10.1109/ICDCS.2012.43

Wireless Sensor and Actuator Networks (WSANs) are composed of sensors and actuators that perform distributed sensing and actuation tasks. Most WSAN applications (e.g., fire detection) demand that actuators respond rapidly to events under observation. Real-time and fault-tolerant transmission is therefore a critical requirement in WSANs, so that sensed data reach actuators reliably and quickly. Due to limited power resources, energy efficiency is another crucial requirement. These requirements become formidably challenging in large-scale WSANs, and existing WSANs fall short of meeting them. To this end, we first theoretically study the Kautz graph for its applicability in WSANs to meet these requirements. We then propose a Kautz-based Real-time, Fault-tolerant and Energy-efficient WSAN (REFER). REFER has a protocol that embeds Kautz graphs into the physical topology of a WSAN for real-time communication and connects the graphs using a Distributed Hash Table (DHT) for high scalability. We also theoretically study routing paths in the Kautz graph, based on which we develop an efficient fault-tolerant routing protocol: upon routing failure, a relay node can quickly and efficiently identify the next shortest path from itself to the destination based only on node IDs. REFER is advantageous over previous Kautz-graph-based works in that it does not need an energy-consuming protocol to find the next shortest path and it maintains consistency between the overlay and the physical topology. Experimental results demonstrate the superior performance of REFER in comparison with existing systems in terms of real-time communication, energy efficiency, fault tolerance and scalability.
"The Power of Lights: Synchronizing Asynchronous Robots Using Visible Bits". S. Das, P. Flocchini, G. Prencipe, N. Santoro, M. Yamashita. 2012 IEEE 32nd International Conference on Distributed Computing Systems, pp. 506-515. doi:10.1109/ICDCS.2012.71

In this paper we study the power of using lights, i.e., visible external memory, for distributed computation by autonomous robots moving in Look-Compute-Move (LCM) cycles. With respect to the LCM cycles, the most common models studied in the literature are the fully-synchronous (FSYNC), the semi-synchronous (SSYNC), and the asynchronous (ASYNC). In this paper we introduce into the ASYNC model, the weakest of the three, the availability of visible external memory: each robot is equipped with a light bulb that is visible to all other robots and that can display a constant number of different colors; the colors are persistent, that is, they are not automatically reset at the end of each cycle. We first study the relationship between ASYNC with visible bits and SSYNC. We prove that asynchronous robots, when equipped with a constant number of colors, are strictly more powerful than traditional semi-synchronous robots. We also show that, when enhanced with visible lights, the difference between asynchrony and semi-synchrony disappears; this result must be contrasted with the strict dominance ASYNC < SSYNC between the models without lights. We then study the relationship between ASYNC with visible bits and FSYNC. We prove that asynchronous robots with a constant number of visible bits, if they can remember a single snapshot, are strictly more powerful than fully-synchronous robots. This is to be contrasted with the fact that, without lights, ASYNC robots are not even as powerful as SSYNC robots, even if they remember an unlimited number of previous snapshots. These results demonstrate the power of using visible external memory for distributed computation with autonomous robots. In particular, asynchrony can be overcome with the power of lights.
"Scaling Down Off-the-Shelf Data Compression: Backwards-Compatible Fine-Grain Mixing". Michael Gray, P. Peterson, P. Reiher. 2012 IEEE 32nd International Conference on Distributed Computing Systems, pp. 112-121. doi:10.1109/ICDCS.2012.21

Pu and Singaravelu presented Fine-Grain Mixing, an adaptive compression system that aimed to maximize CPU and network utilization simultaneously by splitting a network stream into a mixture of compressed and uncompressed blocks. Blocks were compressed opportunistically in a send buffer: as many blocks were compressed as possible without the compressor becoming a bottleneck. They successfully utilized all available CPU and network bandwidth even on high-speed connections. In addition, they noted much greater throughput than previous adaptive compression systems. Here, we take a different view of FG-Mixing than was taken by Pu and Singaravelu and give another explanation for its high performance: fine-grain mixing of compressed and uncompressed blocks enables off-the-shelf compressors to scale down their degree of compression linearly with decreasing CPU usage. Exploring this scaling behavior in depth allows us to make a variety of improvements to fine-grain mixed compression: better compression ratios for a given level of CPU consumption, a wider range of data reduction and CPU cost options, and parallelized compression to take advantage of multi-core CPUs. We make full compatibility with the ubiquitous deflate decompressor (as used in many network protocols directly, or as the back-end of the gzip and Zip formats) a primary goal, rather than using a special, incompatible protocol as in the original implementation of FG-Mixing. Moreover, we show that the benefits of fine-grain mixing are retained by our compatible version.
"Optimal Distributed Data Collection for Asynchronous Cognitive Radio Networks". Zhipeng Cai, S. Ji, Jing He, A. Bourgeois. 2012 IEEE 32nd International Conference on Distributed Computing Systems, pp. 245-254. doi:10.1109/ICDCS.2012.29

As a promising communication paradigm, Cognitive Radio Networks (CRNs) have paved the way for Secondary Users (SUs) to opportunistically exploit unused licensed spectrum without causing unacceptable interference to Primary Users (PUs). In this paper, we study the distributed data collection problem for asynchronous CRNs, which has not been addressed before. First, we study the Proper Carrier-sensing Range (PCR) for SUs. By working with this PCR, an SU can successfully conduct data transmission without disturbing the activities of PUs and other SUs. Subsequently, based on the PCR, we propose an Asynchronous Distributed Data Collection (ADDC) algorithm with fairness consideration for CRNs. ADDC collects the data of a snapshot at the base station in a distributed manner without any time synchronization requirement. The algorithm is scalable and more practical than centralized and synchronized algorithms. Through comprehensive theoretical analysis, we show that ADDC is order-optimal in terms of delay and capacity, as long as an SU has a positive probability of accessing the spectrum. Finally, extensive simulation results indicate that ADDC can effectively finish a data collection task and significantly reduce data collection delay.
"MOVE: A Large Scale Keyword-Based Content Filtering and Dissemination System". Weixiong Rao, Lei Chen, P. Hui, S. Tarkoma. 2012 IEEE 32nd International Conference on Distributed Computing Systems, pp. 445-454. doi:10.1109/ICDCS.2012.32

The Web 2.0 era is characterized by the emergence of a very large amount of live content. A real-time, fine-grained content filtering approach can keep users precisely up to date with the information they are interested in. The key to such an approach is a scalable matching algorithm. One might treat content matching as a special kind of content search and resort to the classic algorithm of [5]. However, because of its blind flooding, [5] cannot simply be adapted for scalable content matching. To increase the throughput of scalable matching, we propose an adaptive approach to allocate (i.e., replicate and partition) filters. The allocation is based on our observation of real datasets: most users prefer short queries, consisting of around 2-3 terms per query, while web content typically contains tens or even thousands of terms per article. Thus, by reducing the number of processed documents, we can reduce the latency of matching large articles against filters and have a chance to achieve higher throughput. We implement our approach on an open-source project, Apache Cassandra. Experiments with real datasets show that our approach achieves around twofold better throughput than two counterpart state-of-the-art solutions.
"Attributed-Based Access Control for Multi-authority Systems in Cloud Storage". Kan Yang, X. Jia. 2012 IEEE 32nd International Conference on Distributed Computing Systems, pp. 536-545. doi:10.1109/ICDCS.2012.42

Ciphertext-Policy Attribute-Based Encryption (CP-ABE) is regarded as one of the most suitable technologies for data access control in cloud storage. Almost all existing CP-ABE schemes assume that there is only one authority in the system responsible for issuing attributes to the users. However, in many applications multiple authorities co-exist in a system and each authority is able to issue attributes independently. In this paper, we design an access control framework for multi-authority systems and propose an efficient and secure multi-authority access control scheme for cloud storage. We first design an efficient multi-authority CP-ABE scheme that does not require a global authority and can support any LSSS access structure. Then, we prove its security in the random oracle model. We also propose a new technique to solve the attribute revocation problem in multi-authority CP-ABE systems. The analysis and simulation results show that our multi-authority access control scheme is scalable and efficient.
"Joint Optimization of Computing and Cooling Energy: Analytic Model and a Machine Room Case Study". Shen Li, H. Le, N. Pham, Jin Heo, T. Abdelzaher. 2012 IEEE 32nd International Conference on Distributed Computing Systems, pp. 396-405. doi:10.1109/ICDCS.2012.64

Total energy minimization in data centers (including both computing and cooling energy) requires modeling the interactions between computing decisions (such as load distribution) and heat transfer in the room, since the load acts as a set of heat sources whose spatial distribution affects cooling energy. This paper presents the first closed-form analytic optimal solution for load distribution in a machine rack that minimizes the sum of computing and cooling energy. We show that by considering actuation knobs on both the computing and cooling sides, it is possible to reduce energy cost compared to state-of-the-art solutions that do not offer holistic energy optimization. This can be achieved while meeting both throughput requirements and maximum CPU temperature constraints. Using a thorough evaluation on a real testbed of 20 machines, we demonstrate that our simple model adequately captures the thermal behavior and energy consumption of the system. We further show that our approach saves more energy than the state of the art in the field.
"Combining Partial Redundancy and Checkpointing for HPC". James Elliott, Kishor Kharbas, David Fiala, F. Mueller, Kurt B. Ferreira, C. Engelmann. 2012 IEEE 32nd International Conference on Distributed Computing Systems, pp. 615-626. doi:10.1109/ICDCS.2012.56

Today's largest High Performance Computing (HPC) systems exceed one petaflops (10^15 floating-point operations per second), and exascale systems are projected within seven years. But reliability is becoming one of the major challenges faced by exascale computing. With billion-core parallelism, the mean time to failure is projected to be in the range of minutes or hours instead of days. Failures are becoming the norm rather than the exception during execution of HPC applications. Current fault tolerance techniques in HPC focus on reactive ways to mitigate faults, namely via checkpoint and restart (C/R). Apart from storage overheads, C/R-based fault recovery comes at an additional cost in terms of application performance because normal execution is disrupted when checkpoints are taken. Studies have shown that applications running at a large scale spend more than 50% of their total time saving checkpoints, restarting and redoing lost work. Redundancy is another fault tolerance technique, which employs redundant processes performing the same task. If a process fails, a replica of it can take over its execution. Thus, redundant copies can decrease the overall failure rate. The downside of redundancy is that extra resources are required and there is an additional overhead on communication and synchronization. This work contributes a model and analyzes the benefit of C/R in coordination with redundancy at different degrees to minimize the total wallclock time and resource utilization of HPC applications. We further conduct experiments with an implementation of redundancy within the MPI layer on a cluster. Our experimental results confirm the benefit of dual and triple redundancy - but not of partial redundancy - and show a close fit to the model. At ≈ 80,000 processes, dual redundancy requires twice the number of processing resources for an application but allows two jobs of 128 hours wallclock time to finish within the time of just one job without redundancy. For narrow ranges of processor counts, partial redundancy results in the lowest time. Once the count exceeds ≈ 770,000, triple redundancy has the lowest overall cost. Thus, redundancy allows one to trade off additional resource requirements against wallclock time, which provides a tuning knob for users to adapt to resource availability.
"Total Order in Content-Based Publish/Subscribe Systems". Kaiwen Zhang, Vinod Muthusamy, H. Jacobsen. 2012 IEEE 32nd International Conference on Distributed Computing Systems, pp. 335-344. doi:10.1109/ICDCS.2012.17

Total ordering is a messaging guarantee increasingly required of content-based pub/sub systems, which are traditionally focused on performance. The main challenge is the uniform ordering of streams of publications from multiple publishers within an overlay broker network to be delivered to multiple subscribers. Our solution integrates total ordering into the pub/sub logic instead of offloading it as an external service. We show that our solution is fully distributed and relies only on local broker knowledge and overlay links. We can identify and isolate specific publications and subscribers where synchronization is required: the overhead is therefore contained to the affected subscribers. Our solution remains safe in the presence of failure, where we show total order to be impossible to maintain. Our experiments demonstrate that our solution scales with the number of subscriptions and has limited overhead for the non-conflicting cases. A holistic comparison with group communication systems is offered to evaluate their relative scalability.
"Provably-Efficient Job Scheduling for Energy and Fairness in Geographically Distributed Data Centers". Shaolei Ren, Yuxiong He, Fei Xu. 2012 IEEE 32nd International Conference on Distributed Computing Systems, pp. 22-31. doi:10.1109/ICDCS.2012.77

Decreasing the soaring energy cost is imperative in large data centers. Meanwhile, limited computational resources need to be fairly allocated among different organizations. Latency is another major concern for resource management. Nevertheless, energy cost, resource allocation fairness, and latency are important but often contradicting metrics on scheduling data center workloads. In this paper, we explore the benefit of electricity price variations across time and locations. We study the problem of scheduling batch jobs, which originate from multiple organizations/users and are scheduled to multiple geographically-distributed data centers. We propose a provably-efficient online scheduling algorithm -- GreFar -- which optimizes the energy cost and fairness among different organizations subject to queueing delay constraints. GreFar does not require any statistical information of workload arrivals or electricity prices. We prove that it can minimize the cost (in terms of an affine combination of energy cost and weighted fairness) arbitrarily close to that of the optimal offline algorithm with future information. Moreover, by appropriately setting the control parameters, GreFar achieves a desirable tradeoff among energy cost, fairness and latency.