Fast GPU parallel N-Body tree traversal with Simulated Wide-Warp
Pub Date: 2014-12-01 | DOI: 10.1109/PADSW.2014.7097874
Wagner M. Nunan Zola, L. C. E. Bona, Fabiano Silva
The Barnes-Hut algorithm is a widely used approximation method for the N-Body simulation problem. The irregular nature of its tree-walking code presents interesting challenges for parallel systems, and additional problems arise in effectively exploiting the processing capacity of GPU architectures. We propose and investigate the applicability of software Simulated Wide-Warps (SWW) in this context. To this end, we explicitly deal with dynamic irregular data-access patterns through data remapping and data transformation, while controlling the execution flow divergence of threads. We present a new compact data structure for the tree layout, along with GPU parallel algorithms for tree transformation and parallel walking using SWW. The benefit of our techniques lies in transforming the tree algorithm to execute regular access patterns that match the GPU model. Our experiments show significant performance improvements over the best known GPU solutions for this algorithm.
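The abstract does not detail the SWW traversal itself; as a rough illustration of the general wide-warp idea (a group of lanes sharing one traversal and voting on whether to open a cell), here is a minimal serial C++ sketch. The node layout, the opening criterion, and all names are assumptions made for illustration, not taken from the paper.

```cpp
// Illustrative, serial sketch of a "wide-warp" style Barnes-Hut walk:
// W logical lanes (bodies) share one traversal stack and move in lockstep.
#include <cmath>
#include <vector>

struct Node {
    float x, y, z, mass;      // centre of mass of the cell (or body position)
    float size;               // cell edge length (0 for a leaf body)
    int   firstChild, nChild; // children stored contiguously; nChild == 0 => leaf
};

struct Body { float x, y, z, ax, ay, az; };

constexpr int   W     = 32;    // lanes per simulated wide warp
constexpr float THETA = 0.5f;  // Barnes-Hut opening angle
constexpr float EPS2  = 1e-4f; // softening

static void accumulate(Body& b, const Node& n) {
    float dx = n.x - b.x, dy = n.y - b.y, dz = n.z - b.z;
    float r2  = dx * dx + dy * dy + dz * dz + EPS2;
    float inv = 1.0f / std::sqrt(r2);
    float f   = n.mass * inv * inv * inv;
    b.ax += f * dx; b.ay += f * dy; b.az += f * dz;
}

// All W lanes walk the tree together: a cell is opened if ANY lane needs it.
void wideWarpWalk(const std::vector<Node>& tree, Body* lanes) {
    std::vector<int> stack = {0};               // shared stack, root at index 0
    while (!stack.empty()) {
        const Node& n = tree[stack.back()];
        stack.pop_back();
        bool anyOpen = false;
        for (int l = 0; l < W; ++l) {           // "warp vote" emulated serially
            float dx = n.x - lanes[l].x, dy = n.y - lanes[l].y, dz = n.z - lanes[l].z;
            float d  = std::sqrt(dx * dx + dy * dy + dz * dz) + 1e-20f;
            if (n.size / d >= THETA) { anyOpen = true; break; }
        }
        if (!anyOpen || n.nChild == 0) {        // far enough for every lane, or a leaf
            for (int l = 0; l < W; ++l) accumulate(lanes[l], n);
        } else {
            for (int c = 0; c < n.nChild; ++c)  // descend into the cell
                stack.push_back(n.firstChild + c);
        }
    }
}

int main() {
    std::vector<Node> tree = {{0, 0, 0, 4.0f, 2.0f, 1, 2},    // root cell with two children
                              {-0.5f, 0, 0, 2.0f, 0, 0, 0},   // leaf body
                              { 0.5f, 0, 0, 2.0f, 0, 0, 0}};  // leaf body
    std::vector<Body> group(W, Body{5.0f, 0, 0, 0, 0, 0});    // distant bodies: root is accepted
    wideWarpWalk(tree, group.data());
}
```

On a real GPU the per-lane loop would be a warp of threads and the "any lane" test a warp vote, which is the regularity the SWW approach emulates in software.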
{"title":"Fast GPU parallel N-Body tree traversal with Simulated Wide-Warp","authors":"Wagner M. Nunan Zola, L. C. E. Bona, Fabiano Silva","doi":"10.1109/PADSW.2014.7097874","DOIUrl":"https://doi.org/10.1109/PADSW.2014.7097874","url":null,"abstract":"The Barnes-Hut algorithm is a widely used approximation method for the N-Body simulation problem. The irregular nature of this tree walking code presents interesting challenges for its computation on parallel systems. Additional problems arise in effectively exploiting the processing capacity of GPU architectures. We propose and investigate the applicability of software Simulated Wide-Warps (SWW) in this context. To this extent, we explicitly deal with dynamic irregular patterns in data accesses with data remapping and data transformation, by controlling execution flow divergence of threads. We present a new compact data-structure for the tree layout, GPU parallel algorithms for tree transformation and parallel walking using SWW. Benefits of our techniques are in transposing the tree algorithm to execute regular patterns to match the GPU model. Our experiments show significant performance improvement over the best known GPU solutions to this algorithm.","PeriodicalId":421740,"journal":{"name":"2014 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS)","volume":"170 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115177573","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A distributed real-time operating system built with aspect-oriented programming for distributed embedded control systems
Pub Date: 2014-12-01 | DOI: 10.1109/PADSW.2014.7097839
Nobuhiro Saito, Myungryun Yoo, T. Yokoyama
The paper presents a method for building a distributed real-time operating system for distributed embedded control systems using aspect-oriented programming. We define aspects that weave distributed computing mechanisms into an existing real-time operating system. By using these aspects, we can build a distributed operating system without modifying the original source code, which improves the maintainability of the source code of a real-time operating system family. We have applied the aspects to an OSEK OS and obtained a distributed operating system that provides location-transparent system calls for task management and inter-task synchronization. The evaluation results show that the overhead of aspect-oriented programming is small enough for practical use.
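As a rough illustration of what a location-transparent task-management call can look like, here is a minimal C++ sketch written as a plain wrapper rather than woven advice; the node table, message layer, and all names are hypothetical, and an actual implementation would instead weave equivalent code around the unmodified OSEK ActivateTask with an aspect compiler.

```cpp
// Sketch of location transparency: intercept ActivateTask and forward
// requests for tasks that live on another node. Names and the task-to-node
// map are assumptions for illustration only.
#include <cstdint>
#include <cstdio>

using TaskID = std::uint8_t;
enum StatusType { E_OK = 0, E_COM_FAILED = 1 };

StatusType LocalActivateTask(TaskID id);            // original, unmodified kernel call
void       SendRemoteRequest(int node, TaskID id);  // network layer of the distributed OS

static const int kTaskLocation[] = {0, 0, 1, 1, 2}; // task -> node map (assumed configuration)
static const int kLocalNode = 0;

// "Around advice" written as a wrapper: local tasks proceed, remote ones are forwarded.
StatusType ActivateTask(TaskID id) {
    int node = kTaskLocation[id];
    if (node == kLocalNode)
        return LocalActivateTask(id);  // proceed with the original system call
    SendRemoteRequest(node, id);       // marshal and ship the request instead
    return E_OK;
}

// Stubs so the sketch is self-contained.
StatusType LocalActivateTask(TaskID id) { std::printf("activate local task %u\n", id); return E_OK; }
void SendRemoteRequest(int node, TaskID id) { std::printf("send ActivateTask(%u) to node %d\n", id, node); }

int main() { ActivateTask(1); ActivateTask(3); }     // one local, one remote activation
```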
{"title":"A distributed real-time operating system built with aspect-oriented programming for distributed embedded control systems","authors":"Nobuhiro Saito, Myungryun Yoo, T. Yokoyama","doi":"10.1109/PADSW.2014.7097839","DOIUrl":"https://doi.org/10.1109/PADSW.2014.7097839","url":null,"abstract":"The paper presents a method to build a distributed real-time operating system for distributed embedded control systems using aspect-oriented programming. We define aspects to weave distributed computing mechanisms to an existing real-time operating system. By using the aspects, we can build a distributed operating system without modifying the original source code. This improves the maintainability of the source code of a real-time operating system family. We have applied the aspects to an OSEK OS and have got a distributed operating system that provides location-transparent system calls for task management and inter-task synchronization. The evaluation results show that the overhead of aspect-oriented programming is practically small enough.","PeriodicalId":421740,"journal":{"name":"2014 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130879558","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
M2M-enabled real-time Trip Planner
Pub Date: 2014-12-01 | DOI: 10.1109/PADSW.2014.7097902
Eduardo Cerritos, F. Lin
Uncertainty is a key factor that prevents commuters from using public transportation systems. More and more transportation agencies are incorporating real-time Trip Planners to empower commuters with timely information. However, such systems require continuous status updates from the vehicles and involve considerable communication cost. In this paper we propose an architecture that applies Machine-to-Machine communication concepts and provides a degree of intelligence to the vehicles, in order to reduce unnecessary communication between the vehicles and the Trip Planner.
{"title":"M2M-enabled real-time Trip Planner","authors":"Eduardo Cerritos, F. Lin","doi":"10.1109/PADSW.2014.7097902","DOIUrl":"https://doi.org/10.1109/PADSW.2014.7097902","url":null,"abstract":"Uncertainty is a key factor that prevents a commuter from using public transportation system. More and more transportation agencies are incorporating real-time Trip Planners to empower commuters with opportune information. However, such systems require continuous status updates from the vehicles and involves expensive communication cost. In this paper we propose an architecture that takes advantage of Machine-to-Machine Communication concepts and provides a degree of intelligence to the vehicles, to alleviate unnecessary communication between the vehicles and the Trip Planner.","PeriodicalId":421740,"journal":{"name":"2014 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS)","volume":"228 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122461149","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Be a good neighbour: Characterizing performance interference of virtual machines under xen virtualization environments
Pub Date: 2014-12-01 | DOI: 10.1109/PADSW.2014.7097816
Ruiqing Chi, Zhuzhong Qian, Sanglu Lu
With the rapid development of virtualization techniques, modern data centers have moved into the cloud era in recent years. Despite numerous advantages such as high resource utilization and rapid service scalability, current virtualization techniques do not guarantee perfect performance isolation among virtual machines sharing a physical machine, which may lead to unstable and unpredictable user-perceived application performance in clouds. Therefore, understanding and modeling performance interference among collocated applications is of utmost importance. However, the hypervisor and guest OSes usually run independent resource schedulers and are invisible to each other, making it non-trivial to characterize performance interference accurately. In this paper, we first present a comprehensive experimental study of performance interference for different combinations of benchmarks, observing that virtual CPU floating overhead across multiple physical CPUs, together with VMEXITs, i.e., the control transitions between the hypervisor and VMs, constitutes the key source of performance interference. To characterize the interference effects, we measure both application-level and VM-level characteristics of the collocated applications and then build a novel interference prediction framework based on kernel canonical correlation analysis (KCCA). Our evaluations first show the practicability of KCCA in finding reliable correlations, and further confirm the high accuracy and broad applicability of our interference model, with a prediction error of no more than 7.9%.
{"title":"Be a good neighbour: Characterizing performance interference of virtual machines under xen virtualization environments","authors":"Ruiqing Chi, Zhuzhong Qian, Sanglu Lu","doi":"10.1109/PADSW.2014.7097816","DOIUrl":"https://doi.org/10.1109/PADSW.2014.7097816","url":null,"abstract":"With the rapid development of virtualization techniques, modern data centers move into a new era of cloud in recent years. Despite numerous advantages such as high resource utilization and rapid service scalability, current virtualization techniques don't guarantee perfect performance isolation among virtual machines sharing the physical machine, which may lead to unstable and unpredictable user-perceived application performance in clouds. Therefore, understanding and modeling performance interference among collocated applications is of utmost importance. However, the hypervisor and guest OSes usually run independent resource schedulers and are invisible into each other, thereby making accurately characterizing performance interference a non-trivial work. In this paper, we first present a comprehensive experimental study on performance interference of different combinations of benchmarks, observing that virtual CPU floating overhead between multiple physical CPUs, and VMEXITs, i.e., the control transitions between the hypervisor and VMs, constitute the key source of performance interference. In order to characterize the performance interference effects, we measure both the application-level and VM-level characteristics from the collocated applications and then build a novel interference prediction framework based on kernel canonical correlation analysis. Our evaluations first show the practicability of KCCA in finding reliable correlation, and further confirm the high accuracy and great applicability of our interference model with a low prediction error of no more than 7.9%.","PeriodicalId":421740,"journal":{"name":"2014 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS)","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117175785","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
pbitMCE: A bit-based approach for maximal clique enumeration on multicore processors
Pub Date: 2014-12-01 | DOI: 10.1109/PADSW.2014.7097844
N. Dasari, D. Ranjan, M. Zubair
Maximal clique enumeration (MCE) is a fundamental and extensively studied problem in graph theory, and it plays a vital role in many network analysis applications and in computational biology. Recently, Eppstein et al. proposed a state-of-the-art sequential algorithm that uses a degeneracy-based ordering of vertices to improve efficiency. In this paper, we propose a new parallel implementation of the algorithm of Eppstein et al. using a new bit-based data structure. The new data structure not only reduces the working-set size significantly but also improves performance by enabling bit-parallelism. We illustrate the significance of degeneracy ordering for load balancing and experimentally evaluate the impact of scheduling on the performance of the algorithm. We present experimental results on several types of synthetic and real-world graphs with up to 50 million vertices and 100 million edges, and show that our approach outperforms that of Eppstein et al. by up to 4 times and scales up to 29 times when run on a multicore machine with 32 cores.
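To make the bit-based idea concrete, the sketch below shows the classical Bron-Kerbosch enumeration with pivoting over 64-bit neighbour bitsets, where set intersections become single word operations; it is a generic illustration for graphs of up to 64 vertices, not the pbitMCE data structure or its degeneracy ordering.

```cpp
// Bitset-based Bron-Kerbosch with pivoting: P, X and neighbourhoods are
// 64-bit masks, so intersections and membership tests are word operations.
#include <cstdint>
#include <iostream>
#include <vector>

using Set = std::uint64_t;  // one bit per vertex

void bronKerbosch(Set R, Set P, Set X, const std::vector<Set>& adj,
                  std::vector<Set>& cliques) {
    if (P == 0 && X == 0) { cliques.push_back(R); return; }  // R is maximal
    // Pick a pivot in P|X with the most neighbours inside P.
    int pivot = -1, best = -1;
    for (Set s = P | X; s; s &= s - 1) {
        int u   = __builtin_ctzll(s);
        int deg = __builtin_popcountll(P & adj[u]);
        if (deg > best) { best = deg; pivot = u; }
    }
    // Expand only candidates not adjacent to the pivot.
    for (Set cand = P & ~adj[pivot]; cand; cand &= cand - 1) {
        int v  = __builtin_ctzll(cand);
        Set vb = Set(1) << v;
        bronKerbosch(R | vb, P & adj[v], X & adj[v], adj, cliques);
        P &= ~vb;
        X |= vb;
    }
}

int main() {
    // Tiny example: a triangle 0-1-2 plus a pendant vertex 3 attached to 2.
    std::vector<Set> adj(4, 0);
    auto edge = [&](int a, int b) { adj[a] |= Set(1) << b; adj[b] |= Set(1) << a; };
    edge(0, 1); edge(1, 2); edge(0, 2); edge(2, 3);
    std::vector<Set> cliques;
    bronKerbosch(0, 0b1111, 0, adj, cliques);
    std::cout << "maximal cliques found: " << cliques.size() << "\n";  // expect 2
}
```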
{"title":"pbitMCE: A bit-based approach for maximal clique enumeration on multicore processors","authors":"N. Dasari, D. Ranjan, M. Zubair","doi":"10.1109/PADSW.2014.7097844","DOIUrl":"https://doi.org/10.1109/PADSW.2014.7097844","url":null,"abstract":"Maximal clique enumeration (MCE) is a fundamental problem in graph theory. It plays a vital role in many network analysis applications and in computational biology. MCE is an extensively studied problem. Recently, Eppstein et al. proposed a state-of-the-art sequential algorithm that uses degeneracy based ordering of vertices to improve the efficiency. In this paper, we propose a new parallel implementation of the algorithm of Eppstein et al. using a new bit-based data structure. The new data structure not only reduces the working set size significantly but also by enabling the use of bit-parallelism improves the performance of the algorithm. We illustrate the significance of degeneracy ordering in load balancing and experimentally evaluate the impact of scheduling on the performance of the algorithm. We present experimental results on several types of synthetic and real-world graphs with up to 50 million vertices and 100 million edges. We show that our approach outperforms Eppstein et al.'s approach by up to 4 times and also scales up to 29 times when run on a multicore machine with 32 cores.","PeriodicalId":421740,"journal":{"name":"2014 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125288849","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Improving utilization through dynamic VM resource allocation in hybrid cloud environment
Pub Date: 2014-12-01 | DOI: 10.1109/PADSW.2014.7097814
Yuda Wang, Renyu Yang, Tianyu Wo, Wenbo Jiang, Chunming Hu
Virtualization is an appealing technique because it facilitates infrastructure management and provides isolated execution for running workloads. Despite the benefits gained from virtualization and resource sharing, resource utilization remains far from ideal due to dynamic resource requirements and the widely used over-provisioning strategy for guaranteed QoS. Additionally, with the emerging demands of big data analytics, hybrid workloads such as traditional batch tasks and long-running virtual machine (VM) services must be managed effectively. In this paper, we propose a system that combines long-running VM services with typical batch workloads such as MapReduce. The objective is to improve holistic cluster utilization through a dynamic resource adjustment mechanism for VMs without disrupting the execution of batch workloads. Furthermore, VM migration is utilized to ensure high availability and avoid potential performance degradation. The experimental results reveal that the dynamically allocated memory is close to the real usage with only a 10% estimation margin, and that the performance impact on VMs and MapReduce jobs is within 1% in both cases. Additionally, an increase of up to 50% in resource utilization can be achieved. We believe these findings are a step toward solving workload consolidation issues in hybrid computing environments.
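A minimal sketch of the kind of dynamic memory adjustment loop described here (estimate a VM's working set and keep its allocation slightly above it) is shown below; the 10% headroom echoes the estimation margin quoted above, while the VM structure and hypervisor calls are hypothetical stand-ins rather than the paper's interfaces.

```cpp
// Periodically shrink or grow each VM's memory toward its measured working
// set, returning idle memory to the host so batch jobs can use it.
#include <algorithm>
#include <cstdint>
#include <vector>

struct VM {
    std::uint64_t allocatedMB;  // memory currently granted to the guest
    std::uint64_t minMB;        // floor the service is never squeezed below
};

std::uint64_t estimateWorkingSetMB(const VM& vm);           // e.g. from guest statistics (stub below)
void          setBalloonTargetMB(VM& vm, std::uint64_t t);  // hypervisor balloon control (stub below)

void adjustMemory(std::vector<VM>& vms, double headroom = 0.10) {
    for (VM& vm : vms) {
        std::uint64_t used   = estimateWorkingSetMB(vm);
        std::uint64_t target = static_cast<std::uint64_t>(used * (1.0 + headroom));
        target = std::max(target, vm.minMB);                // keep the QoS floor
        if (target != vm.allocatedMB) {
            setBalloonTargetMB(vm, target);                 // reclaim or grant the difference
            vm.allocatedMB = target;
        }
    }
}

// Placeholders so the sketch compiles; a real system would query the guest and hypervisor.
std::uint64_t estimateWorkingSetMB(const VM& vm) { return vm.allocatedMB * 7 / 10; }
void setBalloonTargetMB(VM&, std::uint64_t) {}

int main() {
    std::vector<VM> vms = {{4096, 1024}, {8192, 2048}};
    adjustMemory(vms);
}
```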
{"title":"Improving utilization through dynamic VM resource allocation in hybrid cloud environment","authors":"Yuda Wang, Renyu Yang, Tianyu Wo, Wenbo Jiang, Chunming Hu","doi":"10.1109/PADSW.2014.7097814","DOIUrl":"https://doi.org/10.1109/PADSW.2014.7097814","url":null,"abstract":"Virtualization is one of the most fascinating techniques because it can facilitate the infrastructure management and provide isolated execution for running workloads. Despite the benefits gained from virtualization and resource sharing, improved resource utilization is still far from settled due to the dynamic resource requirements and the widely-used over-provision strategy for guaranteed QoS. Additionally, with the emerging demands for big data analytic, how to effectively manage hybrid workloads such as traditional batch task and long-running virtual machine (VM) service needs to be dealt with. In this paper, we propose a system to combine long-running VM service with typical batch workload like MapReduce. The objectives are to improve the holistic cluster utilization through dynamic resource adjustment mechanism for VM without violating other batch workload executions. Furthermore, VM migration is utilized to ensure high availability and avoid potential performance degradation. The experimental results reveal that the dynamically allocated memory is close to the real usage with only 10% estimation margin, and the performance impact on VM and MapReduce jobs are both within 1%. Additionally, at most 50% increment of resource utilization could be achieved. We believe that these findings are in the right direction to solving workload consolidation issues in hybrid computing environments.","PeriodicalId":421740,"journal":{"name":"2014 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116904691","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Scaling and analyzing the stencil performance on multi-core and many-core architectures
Pub Date: 2014-12-01 | DOI: 10.1109/PADSW.2014.7097797
L. Gan, H. Fu, Wei Xue, Yangtong Xu, Chao Yang, Xinliang Wang, Zihong Lv, Yang You, Guangwen Yang, Kaijian Ou
Stencils are among the most important and time-consuming kernels in many applications. While stencil optimization has been well studied on CPU platforms, achieving high performance and efficiency for evolving numerical stencils on more recent multi-core and many-core architectures remains an important issue. In this paper, we explore a number of different stencils, ranging from a basic 7-point Jacobi stencil to more complex high-order stencils used in finer numerical simulations. By optimizing and analyzing these stencils on the latest multi-core and many-core architectures (the Intel Sandy Bridge processor, the Intel Xeon Phi coprocessor, and the NVIDIA Fermi C2070 and Kepler K20x GPUs), we investigate the algorithmic and architectural factors that determine the performance and efficiency of the resulting designs. While multi-threading, vectorization, and optimization for caches and other fast buffers remain the most important performance techniques, we observe that differences in the memory hierarchy and in the mechanisms for issuing and executing parallel instructions lead to different performance behaviors on CPU, MIC, and GPU. With vector-like processing units becoming the major provider of computing power on almost all architectures, the compiler's inability to align all computing and memory operations becomes the major obstacle to achieving high efficiency on current and future platforms. Our optimization of the complex WNAD stencil on the GPU provides a good example of what the compiler could do to help.
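For reference, the basic 7-point Jacobi stencil mentioned above can be written as the straightforward C++ loop nest below, which makes the memory access pattern explicit; the coefficients and grid size are illustrative, and an optimized version would add the multi-threading, vectorization, and cache blocking discussed in the paper.

```cpp
// Naive 7-point Jacobi stencil over a 3D grid stored in a flat array.
#include <cstddef>
#include <vector>

void jacobi7(const std::vector<float>& in, std::vector<float>& out,
             std::size_t nx, std::size_t ny, std::size_t nz) {
    auto idx = [=](std::size_t i, std::size_t j, std::size_t k) {
        return (k * ny + j) * nx + i;              // x fastest, then y, then z
    };
    const float c0 = 0.4f, c1 = 0.1f;              // example coefficients
    for (std::size_t k = 1; k + 1 < nz; ++k)
        for (std::size_t j = 1; j + 1 < ny; ++j)
            for (std::size_t i = 1; i + 1 < nx; ++i)
                out[idx(i, j, k)] =
                    c0 * in[idx(i, j, k)] +
                    c1 * (in[idx(i - 1, j, k)] + in[idx(i + 1, j, k)] +
                          in[idx(i, j - 1, k)] + in[idx(i, j + 1, k)] +
                          in[idx(i, j, k - 1)] + in[idx(i, j, k + 1)]);
}

int main() {
    const std::size_t n = 16;
    std::vector<float> a(n * n * n, 1.0f), b(a);
    jacobi7(a, b, n, n, n);
}
```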
{"title":"Scaling and analyzing the stencil performance on multi-core and many-core architectures","authors":"L. Gan, H. Fu, Wei Xue, Yangtong Xu, Chao Yang, Xinliang Wang, Zihong Lv, Yang You, Guangwen Yang, Kaijian Ou","doi":"10.1109/PADSW.2014.7097797","DOIUrl":"https://doi.org/10.1109/PADSW.2014.7097797","url":null,"abstract":"Stencils are among the most important and time-consuming kernels in many applications. While stencil optimization has been a well-studied topic on CPU platforms, achieving higher performance and efficiency for the evolving numerical stencils on the more recent multi-core and many-core architectures is still an important issue. In this paper, we explore a number of different stencils, ranging from a basic 7-point Jacobi stencil to more complex high-order stencils used in finer numerical simulations. By optimizing and analyzing those stencils on the latest multi-core and many-core architectures (the Intel Sandy Bridge processor, the Intel Xeon Phi coprocessor, and the NVIDIA Fermi C2070 and Kepler K20x GPUs), we investigate the algorithmic and architectural factors that determine the performance and efficiency of the resulting designs. While multi-threading, vectorization, and optimization on cache and other fast buffers are still the most important techniques that provide performance, we observe that the different memory hierarchy and the different mechanism for issuing and executing parallel instructions lead to the different performance behaviors on CPU, MIC and GPU. With vector-like processing units becoming the major provider of computing power on almost all architectures, the compiler's inability to align all the computing and memory operations would become the major bottleneck from getting a high efficiency on current and future platforms. Our specific optimization of the complex WNAD stencil on GPU provides a good example of what the compiler could do to help.","PeriodicalId":421740,"journal":{"name":"2014 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS)","volume":"596 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134542936","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Optimizing Seam Carving on multi-GPU systems for real-time image resizing
Pub Date: 2014-12-01 | DOI: 10.1109/PADSW.2014.7097861
I. Kim, Jidong Zhai, Yan Li, Wenguang Chen
Image resizing is increasingly important for sharing and exchanging pictures among various personal electronic devices. Seam Carving is a state-of-the-art approach to effective image resizing because of its content-aware characteristic. However, its complex computation and memory access patterns make it time-consuming and prevent its wide use in real-time image processing. To address these problems, we propose a novel algorithm, called Non-Cumulative Seam Carving (NCSC), which removes the main computational bottleneck. We also propose an adaptive multi-seam algorithm for better parallelism on GPU platforms. Finally, we implement our algorithm on a multi-GPU platform. Results show that our approach achieves a maximum 140× speedup on a two-GPU system over the sequential version: it takes only 0.11 seconds to halve the width of a 1024×640 image, compared to 15.5 seconds with traditional seam carving.
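For contrast, the traditional cumulative seam computation that NCSC is designed to avoid is sketched below; the row-by-row dependence in the cumulative energy table is what limits parallelism. Energy values are taken as given and all names are illustrative.

```cpp
// Classical vertical-seam computation via dynamic programming: each row of
// the cumulative table M depends on the whole previous row.
#include <algorithm>
#include <iostream>
#include <vector>

// Returns, for each row, the column of the minimum-energy vertical seam.
std::vector<int> minSeam(const std::vector<std::vector<float>>& energy) {
    const int h = static_cast<int>(energy.size());
    const int w = static_cast<int>(energy[0].size());
    std::vector<std::vector<float>> M = energy;      // cumulative energy table
    for (int y = 1; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            float best = M[y - 1][x];
            if (x > 0)     best = std::min(best, M[y - 1][x - 1]);
            if (x + 1 < w) best = std::min(best, M[y - 1][x + 1]);
            M[y][x] += best;                         // row y depends on row y-1
        }
    std::vector<int> seam(h);
    seam[h - 1] = static_cast<int>(
        std::min_element(M[h - 1].begin(), M[h - 1].end()) - M[h - 1].begin());
    for (int y = h - 2; y >= 0; --y) {               // backtrack upwards
        int x = seam[y + 1], bestX = x;
        for (int dx = -1; dx <= 1; ++dx) {
            int nx = x + dx;
            if (nx >= 0 && nx < w && M[y][nx] < M[y][bestX]) bestX = nx;
        }
        seam[y] = bestX;
    }
    return seam;
}

int main() {
    std::vector<std::vector<float>> e = {{1, 3, 2}, {4, 1, 5}, {2, 6, 1}};
    for (int c : minSeam(e)) std::cout << c << ' ';  // prints "0 1 2" for this tiny example
    std::cout << '\n';
}
```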
{"title":"Optimizing Seam Carving on multi-GPU systems for real-time image resizing","authors":"I. Kim, Jidong Zhai, Yan Li, Wenguang Chen","doi":"10.1109/PADSW.2014.7097861","DOIUrl":"https://doi.org/10.1109/PADSW.2014.7097861","url":null,"abstract":"Image resizing is increasingly important for picture sharing and exchanging between various personal electronic equipments. Seam Carving is a state-of-the-art approach for effective image resizing because of its content-aware characteristic. However, complex computation and memory access patterns make it time-consuming and prevent its wide usage in real-time image processing. To address these problems, we propose a novel algorithm, called Non-Cumulative Seam Carving (NCSC), which removes main computation bottleneck. Furthermore, we also propose an adaptive multi-seam algorithm for better parallelism on GPU platforms. Finally, we implement our algorithm on a multi-GPU platform. Results show that our approach achieves a maximum 140× speedup on a two-GPU system over the sequential version. It only takes 0.11 second to resize a 1024×640 image by half in width compared to 15.5 seconds with the traditional seam carving.","PeriodicalId":421740,"journal":{"name":"2014 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS)","volume":"357 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132884077","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
GlobLease: A globally consistent and elastic storage system using leases
Pub Date: 2014-12-01 | DOI: 10.1109/PADSW.2014.7097872
Y. Liu, Xiaxi Li, Vladimir Vlassov
Nowadays, more and more IT companies are expanding their businesses and services to a global scale, serving users in several countries, and globally distributed storage systems are employed to reduce data access latency for clients all over the world. We present GlobLease, an elastic, globally distributed and consistent key-value store. It is organised as multiple distributed hash tables (DHTs) storing replicated data and namespace. Across DHTs, data lookups and accesses are processed with respect to the locality of the DHT deployments. We explore the use of leases in GlobLease to maintain data consistency across DHTs. The leases enable GlobLease to provide fast and consistent read access at a global scale with reduced global communication. Write accesses are optimized by migrating the master copy to the locations where most of the writes take place. The elasticity of GlobLease is provided in a fine-grained manner in order to handle spiky and skewed read workloads precisely and efficiently. In our evaluation, GlobLease demonstrates optimized global performance in comparison with Cassandra, with read and write latency of less than 10 ms in most cases. Furthermore, our evaluation shows that GlobLease is able to bring down request latency under an instantaneous 4.5× workload increase with a skewed key distribution (a Zipfian distribution with an exponent of 4) in less than 20 seconds.
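A minimal sketch of the lease idea, under the assumption that a replica may serve reads locally while it holds an unexpired lease and must revalidate with the master afterwards, is shown below; the structures and the renewal call are illustrative stand-ins, not GlobLease's actual protocol.

```cpp
// Lease-guarded reads: a valid lease means a local, fast read; an expired
// lease costs one cross-site round trip to the master replica.
#include <chrono>
#include <string>
#include <unordered_map>

using Clock = std::chrono::steady_clock;

struct Entry {
    std::string       value;
    Clock::time_point leaseExpiry;  // until when this replica may answer reads
};

std::string fetchAndRenewFromMaster(const std::string& key, Entry& e);  // cross-DHT RPC (stub below)

class Replica {
public:
    std::string read(const std::string& key) {
        Entry& e = store_[key];
        if (Clock::now() < e.leaseExpiry)        // lease still valid: serve locally
            return e.value;
        return fetchAndRenewFromMaster(key, e);  // expired: revalidate with the master
    }
private:
    std::unordered_map<std::string, Entry> store_;
};

// Stub standing in for the remote call; a real system would ship the value and lease over the network.
std::string fetchAndRenewFromMaster(const std::string&, Entry& e) {
    e.leaseExpiry = Clock::now() + std::chrono::seconds(5);
    return e.value;
}

int main() {
    Replica r;
    r.read("user:42");  // first read: no lease yet, goes to the master
    r.read("user:42");  // second read: served locally under the fresh lease
}
```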
{"title":"GlobLease: A globally consistent and elastic storage system using leases","authors":"Y. Liu, Xiaxi Li, Vladimir Vlassov","doi":"10.1109/PADSW.2014.7097872","DOIUrl":"https://doi.org/10.1109/PADSW.2014.7097872","url":null,"abstract":"Nowadays, more and more IT companies are expanding their businesses and services to a global scale, serving users in several countries. Globally distributed storage systems are employed to reduce data access latency for clients all over the world. We present GlobLease, an elastic, globally-distributed and consistent key-value store. It is organised as multiple distributed hash tables (DHTs) storing replicated data and namespace. Across DHTs, data lookups and accesses are processed with respect to the locality of DHT deployments. We explore the use of leases in GlobLease to maintain data consistency across DHTs. The leases enable GlobLease to provide fast and consistent read access in a global scale with reduced global communications. The write accesses are optimized by migrating the master copy to the locations, where most of the writes take place. The elasticity of GlobLease is provided in a fine-grained manner in order to precisely and efficiently handle spiky and skewed read workloads. In our evaluation, GlobLease has demonstrated its optimized global performance, in comparison with Cassandra, with read and write latency less than 10 ms in most of the cases. Furthermore, our evaluation shows that GlobLease is able to bring down the request latency under an instant 4.5 times workload increase with skewed key distribution (a zipfian distribution with an exponent factor of 4) in less than 20 seconds.","PeriodicalId":421740,"journal":{"name":"2014 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS)","volume":"79 ","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134260216","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Wireless transmission modeling for Vehicular Ad-hoc Networks
Pub Date: 2014-12-01 | DOI: 10.1109/PADSW.2014.7097834
S. Rehman, M. A. Khan, T. Zia
Modeling wireless transmission in demanding networks such as VANETs is a challenging task, as it requires mathematically incorporating all the environmental effects present in such a dynamic environment. The key attributes for modeling the wireless channel are the physical constraints inherent to such networks, such as the lack of permanent infrastructure, limited knowledge of vehicle positions, and interference that affects the received signal strength at each vehicle position. The selection of an appropriate transmission model plays a key role in routing decisions for VANETs. This paper investigates wireless transmission models for vehicular communication and identifies the situations where a particular model can be beneficial. The paper also provides insight into the use of practical parameters in theoretical transmission models. An analysis of the proposed transmission model is presented, along with the performance of different transmission models in terms of received signal strength (RSS). These results help in selecting the transmission model that best suits a particular VANET communication scenario.
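As one example of the kind of analytical model being compared, the widely used log-distance path-loss model (with an optional shadowing term) relates RSS to distance as sketched below; the parameter values are illustrative and this is not necessarily the model proposed in the paper.

```cpp
// Log-distance path loss: PL(d) = PL(d0) + 10*n*log10(d/d0) + X_sigma,
// and the received power is RSS = Pt - PL(d).
#include <cmath>
#include <iostream>

// Received power (dBm) at distance d (m), given transmit power txDbm,
// reference path loss pl0Db at d0 metres, path-loss exponent n,
// and a shadowing term (dB) drawn externally, e.g. from N(0, sigma^2).
double rssDbm(double txDbm, double d, double d0, double pl0Db,
              double n, double shadowingDb = 0.0) {
    double pathLoss = pl0Db + 10.0 * n * std::log10(d / d0) + shadowingDb;
    return txDbm - pathLoss;
}

int main() {
    // 20 dBm transmitter, 47 dB loss at 1 m, exponent 2.7 (illustrative urban-street guess).
    const double dists[] = {10.0, 50.0, 100.0, 300.0};
    for (double d : dists)
        std::cout << d << " m -> " << rssDbm(20.0, d, 1.0, 47.0, 2.7) << " dBm\n";
}
```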
{"title":"Wireless transmission modeling for Vehicular Ad-hoc Networks","authors":"S. Rehman, M. A. Khan, T. Zia","doi":"10.1109/PADSW.2014.7097834","DOIUrl":"https://doi.org/10.1109/PADSW.2014.7097834","url":null,"abstract":"Modeling wireless transmission in stringent networks such as VANETs is a challenging task. This requires mathematically incorporating all the environmental effects present within such a dynamics atmosphere. The key attributes to model the wireless channel are physical constraints inherent to such networks such as lack of permanent infrastructure, limited knowledge in relation to the position of vehicles as well as interference that effects the strength of receive signal at each position of vehicles. The selection of an appropriate transmission model plays a key role in the routing decisions for VANET. This paper investigates such wireless transmission models for vehicular communication. It identifies the situations where a particular model can be beneficial. The paper also provides an insight into the use of practical parameters in theoretical transmission models. An analysis of the proposed transmission model is presented. The performance of different transmission models in terms of receive signal strength (RSS) is also presented. These results help to select a transmission model that suits best to a particular VANET communication scenario.","PeriodicalId":421740,"journal":{"name":"2014 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124767120","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}