Many real-world problems can be represented as graphs and solved by graph traversal algorithms. Single-Source Shortest Path (SSSP) is a fundamental graph algorithm. Today, large-scale graphs involve millions or even billions of vertices, making efficient parallel graph processing challenging. In this paper, we propose a single-FPGA design to accelerate SSSP for massive graphs. We adopt the well-known Bellman-Ford algorithm. In the proposed design, the graph is stored in external memory, which is more realistic for processing large-scale graphs. Using the available external memory bandwidth, our design achieves maximum data parallelism by concurrently processing multiple edges in each clock cycle, regardless of data dependencies. The performance of our design is also independent of the graph structure. We propose an optimized data layout to enable efficient utilization of external memory bandwidth. We prototype our design on a state-of-the-art FPGA. Experimental results show that our design is capable of processing 1.6 billion edges per second (1.6 GTEPS) on a single FPGA while sustaining a clock rate of over 200 MHz. This would place us in the 131st position of the Graph 500 benchmark list of supercomputing systems for data-intensive applications. Our solution therefore provides performance comparable to state-of-the-art systems.
{"title":"Accelerating Large-Scale Single-Source Shortest Path on FPGA","authors":"Shijie Zhou, C. Chelmis, V. Prasanna","doi":"10.1109/IPDPSW.2015.130","DOIUrl":"https://doi.org/10.1109/IPDPSW.2015.130","url":null,"abstract":"Many real-world problems can be represented as graphs and solved by graph traversal algorithms. Single-Source Shortest Path (SSSP) is a fundamental graph algorithm. Today, large-scale graphs involve millions or even billions of vertices, making efficient parallel graph processing challenging. In this paper, we propose a single-FPGA based design to accelerate SSSP for massive graphs. We adopt the well-known Bellman-Ford algorithm. In the proposed design, graph is stored in external memory, which is more realistic for processing large scale graphs. Using the available external memory bandwidth, our design achieves the maximum data parallelism to concurrently process multiple edges in each clock cycle, regardless of data dependencies. The performance of our design is independent of the graph structure as well. We propose a optimized data layout to enable efficient utilization of external memory bandwidth. We prototype our design using a state-of-the-art FPGA. Experimental results show that our design is capable of processing 1.6 billion edges per second (GTEPS) using a single FPGA, while simultaneously achieving high clock rate of over 200 MHz. This would place us in the 131st position of the Graph 500 benchmark list of supercomputing systems for data intensive applications. Our solution therefore provides comparable performance to state-of-the-art systems.","PeriodicalId":340697,"journal":{"name":"2015 IEEE International Parallel and Distributed Processing Symposium Workshop","volume":"54 4","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-05-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"113956531","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This section includes the articles presented at the 18th International Workshop on Nature Inspired Distributed Computing (NIDISC 2015), held in conjunction with the 29th IEEE/ACM International Parallel and Distributed Processing Symposium (IPDPS 2015), May 25-29, 2015, Hyderabad, India. The NIDISC workshop is an opportunity for researchers to explore the connections between biology, nature-inspired techniques, metaheuristics, and the development of solutions to problems that arise in parallel and distributed processing, communications, and other application areas.
{"title":"NIDISC Introduction and Committees","authors":"P. Bouvry, F. Seredyński, E. Talbi","doi":"10.1109/IPDPSW.2014.211","DOIUrl":"https://doi.org/10.1109/IPDPSW.2014.211","url":null,"abstract":"This section includes the articles presented at the 18th International Workshop on Nature Inspired Distributed Computing (NIDISC 2015) held in conjunction with the 29th IEEE/ACM International Parallel and Distributed Processing Symposium (IPDPS 2015), May 25-29 2015, Hyderabad, India. The NIDISC workshop is an opportunity for researchers to explore the connections between biology, nature-inspired techniques, metaheuristics and the development of solutions to problems that arise in parallel and distributed processing, communications and other application areas.","PeriodicalId":340697,"journal":{"name":"2015 IEEE International Parallel and Distributed Processing Symposium Workshop","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-05-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125642672","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We address highly dynamic distributed systems modelled by time-varying graphs (TVGs). We are interested in proofs of impossibility results, which often rely on informal arguments about convergence. First, we provide a topological distance metric over sets of TVGs to correctly define the convergence of TVG sequences in such sets. Next, we provide a general framework that formally proves the convergence of the sequence of executions of any deterministic algorithm over the TVGs of any convergent sequence of TVGs. Finally, we illustrate the relevance of the above result by proving that no deterministic algorithm exists to compute the underlying graph of any connected-over-time TVG, i.e., any TVG of the weakest class of long-lived TVGs.
{"title":"A Generic Framework for Impossibility Results in Time-Varying Graphs","authors":"Nicolas Braud-Santoni, S. Dubois, Mohamed-Hamza Kaaouachi, F. Petit","doi":"10.1109/IPDPSW.2015.59","DOIUrl":"https://doi.org/10.1109/IPDPSW.2015.59","url":null,"abstract":"We address highly dynamic distributed systems modelled by time-varying graphs (TVGs). We are interested in proof of impossibility results that often use informal arguments about convergence. First, we provide a topological distance metric over sets of TVGs to correctly define the convergence of TVG sequences in such sets. Next, we provide a general framework that formally proves the convergence of the sequence of executions of any deterministic algorithm over TVGs of any convergent sequence of TVGs. Finally, we illustrate the relevance of the above result by proving that no deterministic algorithm exists to compute the underlying graph of any connected-over-time TVG, i.e. Any TVG of the weakest class of long-lived TVGs.","PeriodicalId":340697,"journal":{"name":"2015 IEEE International Parallel and Distributed Processing Symposium Workshop","volume":"260 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-05-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122679351","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The emergence of multi-clouds makes it difficult for application providers to offer reliable applications to end users. The different levels of infrastructure reliability offered by various cloud providers need to be abstracted at the application level through application-aware algorithms for high availability. This task is challenging due to the closed-world approach taken by the various cloud providers. In the face of differing access and management policies, orchestrated distributed management algorithms are needed instead of centralized solutions. In this paper we present a decentralized autonomic algorithm for achieving application high availability by harnessing the properties of scalable component-based applications and the advantages of overlay networks for communication between peers. In a multi-cloud environment the algorithm maintains cloud-provider independence while achieving global application availability. The algorithm was tested in a simulator, and the results show that it performs similarly to a centralized approach without inducing much communication overhead.
{"title":"Distributed Scheduling Algorithm for Highly Available Component Based Applications","authors":"M. Frîncu","doi":"10.1109/IPDPSW.2015.114","DOIUrl":"https://doi.org/10.1109/IPDPSW.2015.114","url":null,"abstract":"The emergence of multi-clouds makes it difficult for application providers to offer reliable applications to end users. The different levels of infrastructure reliability offered by various cloud providers need to be abstracted at application level through application-aware algorithms for high availability. This task is challenging due to the closed world approach taken by the various cloud providers. In the face of different access and management policies orchestrated distributed management algorithms are needed instead of centralized solutions. In this paper we present a decentralized autonomic algorithm for achieving application high availability by harnessing the properties of scalable component-based applications and the advantage of overlay networks to communicate between peers. In a multi-cloud environment the algorithm maintains cloud provider independence while achieving global application availability. The algorithm was tested on a simulator and results show that it gives similar results to a centralized approach without inducing much communication overhead.","PeriodicalId":340697,"journal":{"name":"2015 IEEE International Parallel and Distributed Processing Symposium Workshop","volume":"439 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-05-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122885804","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Molecular Dynamics simulations are widely used to obtain a deeper understanding of chemical reactions, fluid flows, phase transitions, and other physical phenomena arising from molecular interactions. The main problem with this method is that it is computationally demanding, because of its O(N²) cost and the need for prolonged simulations. The use of Graphics Processing Units (GPUs) is an attractive solution and has been applied to this problem thus far. However, such heterogeneous approaches occasionally cause load imbalances between CPUs and GPUs and do not utilize all computational resources. We propose and implement a method of balancing the workload between CPUs and GPUs. Our method is based on formulating and observing workloads, and it statically distributes work according to a spatial decomposition. We succeeded in utilizing processors more efficiently and in accelerating simulations by up to 20.7% compared to the original GPU-optimized code.
{"title":"GPU Accelerated Molecular Dynamics with Method of Heterogeneous Load Balancing","authors":"T. Udagawa, M. Sekijima","doi":"10.1109/IPDPSW.2015.41","DOIUrl":"https://doi.org/10.1109/IPDPSW.2015.41","url":null,"abstract":"Molecular Dynamics simulations are widely used to obtain a deeper understanding of chemical reactions, fluid flows, phase transitions, and other physical phenomena due to molecular interactions. The main problem with this method is that it is computationally demanding because of its amount of O (N2) and requirements for prolonged simulations. The use of Graphics Processing Units (GPUs) is an attractive solution and has been applied to this problem thus far. However, such heterogeneous approaches occasionally cause load imbalances between CPUs and GPUs and they don't utilize all computational resources. We propose a method of balancing the workload between CPUs and GPUs, which we implemented. Our method is based on formulating and observing workloads and it statically distributes work according to spatial decomposition. We succeeded in utilizing processors more efficiently and accelerating simulations by 20.7 % at most compared to the original GPU optimized code.","PeriodicalId":340697,"journal":{"name":"2015 IEEE International Parallel and Distributed Processing Symposium Workshop","volume":"68 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-05-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132657339","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Frequency scaling and precision-reduction optimizations of an FPGA-accelerated SPICE circuit simulator can enhance performance by 1.5x while lowering implementation cost by 15-20%. This is possible due to the inherent fault tolerance of SPICE, which can naturally drive simulator convergence even in the presence of arithmetic errors caused by frequency scaling and precision reduction. We quantify the impact of these transformations on SPICE by analyzing the resulting convergence residue and runtime. To explain the impact of our optimizations, we develop an empirical error model derived from in-situ frequency scaling experiments and build analytical models of rounding and truncation errors using Gappa-based numerical analysis. Across a range of benchmark SPICE circuits, we are able to tolerate bit-level fault rates of 10^-4 (frequency scaling) and up to an 8-bit loss in the least-significant digits (precision reduction) without compromising SPICE convergence quality while delivering speedups.
{"title":"Enhancing Speedups for FPGA Accelerated SPICE through Frequency Scaling and Precision Reduction","authors":"L. Hui, Nachiket Kapre","doi":"10.1109/IPDPSW.2015.100","DOIUrl":"https://doi.org/10.1109/IPDPSW.2015.100","url":null,"abstract":"Frequency scaling and precision reduction optimization of an FPGA accelerated SPICE circuit simulator can enhance performance by 1.5x while lowering implementation cost by 15 -- 20%. This is possible due the inherent fault tolerant capabilities of SPICE that can naturally drive simulator convergence even in presence of arithmetic errors due to frequency scaling and precision reduction. We quantify the impact of these transformations on SPICE by analyzing the resulting convergence residue and runtime. To explain the impact of our optimizations, we develop an empirical error model derived from in-situ frequency scaling experiments and build analytical models of rounding and truncation errors using Gappa-based numerical analysis. Across a range of benchmark SPICE circuits, we are able to tolerate to bit-level fault rates of 10--4 (frequency scaling) and manage up to 8-bit loss in least-significant digits (precision reduction) without compromising SPICE convergence quality while delivering speedups.","PeriodicalId":340697,"journal":{"name":"2015 IEEE International Parallel and Distributed Processing Symposium Workshop","volume":"64 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-05-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134404026","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Distributed virtual simulations are prone to load oscillations, as well as to load imbalances at run-time. Detecting such imbalances and responding to them through load redistribution can be of great utility in keeping execution performance close to the optimum. A dynamic balancing scheme provides a reactive approach, but a predictive scheme can prevent imbalances before they occur. Several models can be employed for predicting load, but given how the load is collected and presented, time series offer reasonable load forecasting in a short time. However, Holt's model, a well-known model for time-series representation, shows limitations in forecasting load. To correct this issue, a genetic algorithm approach is introduced to dynamically adjust the model based on recent changes in the load behaviour. The convergence of the algorithm can substantially influence the response time of the predictive balancing system, so an analysis is conducted to identify the minimum number of iterations needed to generate a reasonable adjustment.
{"title":"A Genetic Algorithm Approach for Adjusting Time Series Based Load Prediction","authors":"Raed Alkharboush, R. E. Grande, A. Boukerche","doi":"10.1109/IPDPSW.2015.96","DOIUrl":"https://doi.org/10.1109/IPDPSW.2015.96","url":null,"abstract":"Distributed virtual simulation are prone to load oscillations, as well as load imbalances during run-time. Detecting such imbalances and responding accordingly using load redistribution can be of great utility in keeping execution performance close to the aimed optimal. A dynamic balancing scheme can introduce a reactive approach, but a predictive scheme can prevent imbalances before they occur. Several models can be employed for predicting load, but due to the characteristics in which the load is collected and presented, time series offer reasonable load forecasting in a short time. However, the Holt's model, well known model for time series representation, shows limitations on the forecasting of load. In order to correct this issue, a genetic algorithm approach is introduced to dynamically adjust the model based on the recent modifications on the load behaviour. The convergence of the algorithm can substantially influence the response time of the predictive balancing system, so an analysis is conducted to identify the minimum number of iterations for generating a reasonable adjustment.","PeriodicalId":340697,"journal":{"name":"2015 IEEE International Parallel and Distributed Processing Symposium Workshop","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-05-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134589198","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Recent work on graph analytics has sought to leverage the high performance offered by GPU devices, but challenges remain due to the inherent irregularity of graph algorithms and the limited GPU-resident memory available for storing large graphs. The GraphReduce methods presented in this paper permit a GPU-based accelerator to operate on graphs that exceed its internal memory capacity. GraphReduce combines edge- and vertex-centric implementations of the Gather-Apply-Scatter programming model to achieve high degrees of parallelism, supported by methods that partition graphs across GPU and host memories and efficiently move graph data between the two. GraphReduce-based programming is performed via device functions, including gather map, gather reduce, apply, and scatter, implemented by programmers for the graph algorithms they wish to realize. Experimental evaluations for a wide variety of graph inputs, algorithms, and system configurations demonstrate that GraphReduce outperforms other competing approaches.
{"title":"GraphReduce: Large-Scale Graph Analytics on Accelerator-Based HPC Systems","authors":"D. Sengupta, K. Agarwal, S. Song, K. Schwan","doi":"10.1109/IPDPSW.2015.16","DOIUrl":"https://doi.org/10.1109/IPDPSW.2015.16","url":null,"abstract":"Recent work on graph analytics has sought to leverage the high performance offered by GPU devices, but challenges remain due to the inherent irregularity of graph algorithm and limitations in GPU-resident memory for storing large graphs. The Graph Reduce methods presented in this paper permit a GPU-based accelerator to operate on graphs that exceed its internal memory capacity. Graph Reduce operates with a combination of both edge- and vertex-centric implementations of the Gather-Apply-Scatter programming model, to achieve high degrees of parallelism supported by methods that partition graphs across GPU and host memories and efficiently move graph data between both. Graph Reduce-based programming is performed via device functions that include gather map, gather reduce, apply, and scatter, implemented by programmers for the graph algorithms they wish to realize. Experimental evaluations for a wide variety of graph inputs, algorithms, and system configuration demonstrate that Graph Reduce outperforms other competing approaches.","PeriodicalId":340697,"journal":{"name":"2015 IEEE International Parallel and Distributed Processing Symposium Workshop","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-05-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131837528","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Memory management units that use low-level AXI descriptor chains to hold irregular graph-oriented access sequences can improve the DRAM memory throughput of graph algorithms by almost an order of magnitude. For the Xilinx Zed board, we explore and compare the memory throughputs achievable when using (1) cache-enabled CPUs with an OS, (2) cache-enabled CPUs running bare-metal code, (3) CPU-based control of FPGA-based AXI DMAs, and finally (4) local FPGA-based control of AXI DMA transfers. For the short-burst irregular traffic generated by sparse graph access patterns, we observe a performance penalty of almost 10× due to DRAM row activations when compared to cache-friendly sequential access. When using an AXI DMA engine configured in FPGA logic and programmed in AXI register mode from the CPU, we can improve DRAM performance by as much as 2.4× over naïve random access on the CPU. In this mode, we use the host CPU to trigger each DMA transfer by writing the appropriate control information into the internal registers of the DMA engine. We also encode the sparse graph access patterns as locally stored, BRAM-hosted AXI descriptor chains that drive the AXI DMA engines with minimal CPU involvement in Scatter-Gather mode. In this configuration, we deliver an additional 3× speedup, for a cumulative throughput improvement of 7× over a CPU-based approach that uses caches while running an OS to manage the irregular accesses.
{"title":"GraphMMU: Memory Management Unit for Sparse Graph Accelerators","authors":"Nachiket Kapre, Han Jianglei, Andrew Bean, P. Moorthy, Siddhartha","doi":"10.1109/IPDPSW.2015.101","DOIUrl":"https://doi.org/10.1109/IPDPSW.2015.101","url":null,"abstract":"Memory management units that use low-level AXI descriptor chains to hold irregular graph-oriented access sequences can help improve DRAM memory throughput of graph algorithms by almost an order of magnitude. For the Xilinx Zed board, we explore and compare the memory throughputs achievable when using (1) cache-enabled CPUs with an OS, (2) cache-enabled CPUs running bare metal code, (2) CPU-based control of FPGA-based AXI DMAs, and finally (3) local FPGA-based control of AXI DMA transfers. For short-burst irregular traffic generated from sparse graph access patterns, we observe a performance penalty of almost 10× due to DRAM row activations when compared to cache-friendly sequential access. When using an AXI DMA engine configured in FPGA logic and programmed in AXI register mode from the CPU, we can improve DRAM performance by as much as 2.4× over naïve random access on the CPU. In this mode, we use the host CPU to trigger DMA transfer by writing appropriate control information in the internal register of the DMA engine. We also encode the sparse graph access patterns as locally-stored BRAM-hosted AXI descriptor chains to drive the AXI DMA engines with minimal CPU involvement under Scatter Gather mode. In this configuration, we deliver an additional 3× speedup, for a cumulative throughput improvement of 7× over a CPU-based approach using caches while running an OS to manage irregular access.","PeriodicalId":340697,"journal":{"name":"2015 IEEE International Parallel and Distributed Processing Symposium Workshop","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-05-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134287147","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Contention-based protocols are commonly used to provide channel access to nodes wishing to communicate. Binary Exponential Backoff (BEB) is a well-known contention protocol implemented by the IEEE 802.11 standard. Despite its widespread use, Medium Access Control (MAC) protocols employing BEB struggle to concede channel access when the number of contending nodes increases. The main contribution of this work is a randomized contention protocol for the case where the contending stations have no-collision detection (NCD) capabilities. The proposed protocol, termed RNCD, explores the use of tone signaling to provide fair selection of a transmitter. We show that the task of selecting a single transmitter among n ≥ 2 NCD stations can be accomplished within 48n time slots with probability at least 1 - 2^(-1.5n). Furthermore, RNCD works without prior knowledge of the number of contending nodes. For comparison purposes, RNCD and BEB were implemented in the OMNeT++ simulator. For n = 256, the simulation results show that RNCD delivers twice as many transmissions per second, while channel access resolution takes less than 1% of the time needed by the BEB protocol. In contrast to the exponential growth observed in the channel access time of the BEB implementation, RNCD shows a logarithmic tendency, allowing it to better comply with the QoS demands of real-time applications.
{"title":"A Fair Randomized Contention Resolution Protocol for Wireless Nodes without Collision Detection Capabilities","authors":"Marcos F. Caetano, J. Bordim","doi":"10.1109/IPDPSW.2015.86","DOIUrl":"https://doi.org/10.1109/IPDPSW.2015.86","url":null,"abstract":"Contention-based protocols are commonly used for providing channel access to the nodes wishing to communicate. The Binary Exponential Back off (BEB) is a well-known contention protocol implemented by the IEEE 802.11 standard. Despite its widespread use, Medium Access Control (MAC) protocols employing BEB struggle to concede channel access when the number of contending nodes increases. The main contribution of this work is to propose a randomized contention protocol to the case where the contending stations have no-collision detection (NCD) capabilities. The proposed protocol, termed RNCD, explores the use of tone signaling to provide fair selection of a transmitter. We show that the task of selecting a single transmitter, among n ≥ 2 NCD-stations, can be accomplished in 48n time slots with probability of at least 1 - 2-1.5n. Furthermore, RNCD works without previous knowledge on the number of contending nodes. For comparison purpose, RNCD and BEB were implemented in OMNeT++ Simulator. For n = 256, the simulation results show that RNCD can deliver twice as much transmissions per second while channel access resolution takes less than 1% of the time needed by the BEB protocol. Different from the exponential growth tendency observed in the channel access time of the BEB implementation, the RNCD has a logarithmic tendency allowing it to better comply with QoS demands of real-time applications.","PeriodicalId":340697,"journal":{"name":"2015 IEEE International Parallel and Distributed Processing Symposium Workshop","volume":"163 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-05-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131926484","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}