Many real-world problems can be represented as graphs and solved by graph traversal algorithms. Single-Source Shortest Path (SSSP) is a fundamental graph algorithm. Today, large-scale graphs involve millions or even billions of vertices, making efficient parallel graph processing challenging. In this paper, we propose a single-FPGA design to accelerate SSSP for massive graphs. We adopt the well-known Bellman-Ford algorithm. In the proposed design, the graph is stored in external memory, which is more realistic for processing large-scale graphs. Using the available external memory bandwidth, our design achieves maximum data parallelism by concurrently processing multiple edges in each clock cycle, regardless of data dependencies. The performance of our design is also independent of the graph structure. We propose an optimized data layout to enable efficient utilization of external memory bandwidth. We prototype our design using a state-of-the-art FPGA. Experimental results show that our design is capable of processing 1.6 billion traversed edges per second (1.6 GTEPS) using a single FPGA, while simultaneously achieving a high clock rate of over 200 MHz. This would place us in the 131st position on the Graph 500 benchmark list of supercomputing systems for data-intensive applications. Our solution therefore provides performance comparable to state-of-the-art systems.
{"title":"Accelerating Large-Scale Single-Source Shortest Path on FPGA","authors":"Shijie Zhou, C. Chelmis, V. Prasanna","doi":"10.1109/IPDPSW.2015.130","DOIUrl":"https://doi.org/10.1109/IPDPSW.2015.130","url":null,"abstract":"Many real-world problems can be represented as graphs and solved by graph traversal algorithms. Single-Source Shortest Path (SSSP) is a fundamental graph algorithm. Today, large-scale graphs involve millions or even billions of vertices, making efficient parallel graph processing challenging. In this paper, we propose a single-FPGA design to accelerate SSSP for massive graphs. We adopt the well-known Bellman-Ford algorithm. In the proposed design, the graph is stored in external memory, which is more realistic for processing large-scale graphs. Using the available external memory bandwidth, our design achieves maximum data parallelism by concurrently processing multiple edges in each clock cycle, regardless of data dependencies. The performance of our design is also independent of the graph structure. We propose an optimized data layout to enable efficient utilization of external memory bandwidth. We prototype our design using a state-of-the-art FPGA. Experimental results show that our design is capable of processing 1.6 billion traversed edges per second (1.6 GTEPS) using a single FPGA, while simultaneously achieving a high clock rate of over 200 MHz. This would place us in the 131st position on the Graph 500 benchmark list of supercomputing systems for data-intensive applications. Our solution therefore provides performance comparable to state-of-the-art systems.","PeriodicalId":340697,"journal":{"name":"2015 IEEE International Parallel and Distributed Processing Symposium Workshop","volume":"54 4","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-05-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"113956531","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
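The relaxation loop at the core of the Bellman-Ford algorithm the paper adopts can be sketched in software as follows. This is a minimal serial reference, not the paper's design: the FPGA pipeline streams edges from external memory and relaxes several of them per clock cycle.

```python
def bellman_ford(num_vertices, edges, source):
    """Serial Bellman-Ford: repeatedly relax every edge until no
    distance changes. `edges` is a list of (u, v, weight) tuples;
    returns the list of shortest distances from `source`."""
    INF = float("inf")
    dist = [INF] * num_vertices
    dist[source] = 0
    for _ in range(num_vertices - 1):      # at most |V|-1 rounds suffice
        changed = False
        for u, v, w in edges:              # the FPGA streams these edges
            if dist[u] + w < dist[v]:      # relaxation step
                dist[v] = dist[u] + w
                changed = True
        if not changed:                    # early exit once converged
            break
    return dist
```

Because relaxations are monotone (distances only decrease), the order in which edges are processed affects only the number of rounds, not the final result, which is what lets the hardware process edges concurrently regardless of data dependencies.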
This section includes the articles presented at the 18th International Workshop on Nature Inspired Distributed Computing (NIDISC 2015) held in conjunction with the 29th IEEE/ACM International Parallel and Distributed Processing Symposium (IPDPS 2015), May 25-29 2015, Hyderabad, India. The NIDISC workshop is an opportunity for researchers to explore the connections between biology, nature-inspired techniques, metaheuristics and the development of solutions to problems that arise in parallel and distributed processing, communications and other application areas.
{"title":"NIDISC Introduction and Committees","authors":"P. Bouvry, F. Seredyński, E. Talbi","doi":"10.1109/IPDPSW.2014.211","DOIUrl":"https://doi.org/10.1109/IPDPSW.2014.211","url":null,"abstract":"This section includes the articles presented at the 18th International Workshop on Nature Inspired Distributed Computing (NIDISC 2015) held in conjunction with the 29th IEEE/ACM International Parallel and Distributed Processing Symposium (IPDPS 2015), May 25-29 2015, Hyderabad, India. The NIDISC workshop is an opportunity for researchers to explore the connections between biology, nature-inspired techniques, metaheuristics and the development of solutions to problems that arise in parallel and distributed processing, communications and other application areas.","PeriodicalId":340697,"journal":{"name":"2015 IEEE International Parallel and Distributed Processing Symposium Workshop","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-05-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125642672","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Nicolas Braud-Santoni, S. Dubois, Mohamed-Hamza Kaaouachi, F. Petit
We address highly dynamic distributed systems modelled by time-varying graphs (TVGs). We are interested in proofs of impossibility results, which often use informal arguments about convergence. First, we provide a topological distance metric over sets of TVGs to correctly define the convergence of TVG sequences in such sets. Next, we provide a general framework that formally proves the convergence of the sequence of executions of any deterministic algorithm over the TVGs of any convergent sequence of TVGs. Finally, we illustrate the relevance of the above result by proving that no deterministic algorithm exists to compute the underlying graph of any connected-over-time TVG, i.e., any TVG of the weakest class of long-lived TVGs.
{"title":"A Generic Framework for Impossibility Results in Time-Varying Graphs","authors":"Nicolas Braud-Santoni, S. Dubois, Mohamed-Hamza Kaaouachi, F. Petit","doi":"10.1109/IPDPSW.2015.59","DOIUrl":"https://doi.org/10.1109/IPDPSW.2015.59","url":null,"abstract":"We address highly dynamic distributed systems modelled by time-varying graphs (TVGs). We are interested in proofs of impossibility results, which often use informal arguments about convergence. First, we provide a topological distance metric over sets of TVGs to correctly define the convergence of TVG sequences in such sets. Next, we provide a general framework that formally proves the convergence of the sequence of executions of any deterministic algorithm over the TVGs of any convergent sequence of TVGs. Finally, we illustrate the relevance of the above result by proving that no deterministic algorithm exists to compute the underlying graph of any connected-over-time TVG, i.e., any TVG of the weakest class of long-lived TVGs.","PeriodicalId":340697,"journal":{"name":"2015 IEEE International Parallel and Distributed Processing Symposium Workshop","volume":"260 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-05-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122679351","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The emergence of multi-clouds makes it difficult for application providers to offer reliable applications to end users. The different levels of infrastructure reliability offered by various cloud providers need to be abstracted at the application level through application-aware algorithms for high availability. This task is challenging due to the closed-world approach taken by the various cloud providers. In the face of different access and management policies, orchestrated distributed management algorithms are needed instead of centralized solutions. In this paper we present a decentralized autonomic algorithm for achieving application high availability by harnessing the properties of scalable component-based applications and the advantages of overlay networks for communication between peers. In a multi-cloud environment, the algorithm maintains cloud provider independence while achieving global application availability. The algorithm was tested in a simulator, and the results show that it performs similarly to a centralized approach without inducing much communication overhead.
{"title":"Distributed Scheduling Algorithm for Highly Available Component Based Applications","authors":"M. Frîncu","doi":"10.1109/IPDPSW.2015.114","DOIUrl":"https://doi.org/10.1109/IPDPSW.2015.114","url":null,"abstract":"The emergence of multi-clouds makes it difficult for application providers to offer reliable applications to end users. The different levels of infrastructure reliability offered by various cloud providers need to be abstracted at the application level through application-aware algorithms for high availability. This task is challenging due to the closed-world approach taken by the various cloud providers. In the face of different access and management policies, orchestrated distributed management algorithms are needed instead of centralized solutions. In this paper we present a decentralized autonomic algorithm for achieving application high availability by harnessing the properties of scalable component-based applications and the advantages of overlay networks for communication between peers. In a multi-cloud environment, the algorithm maintains cloud provider independence while achieving global application availability. The algorithm was tested in a simulator, and the results show that it performs similarly to a centralized approach without inducing much communication overhead.","PeriodicalId":340697,"journal":{"name":"2015 IEEE International Parallel and Distributed Processing Symposium Workshop","volume":"439 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-05-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122885804","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Molecular Dynamics simulations are widely used to obtain a deeper understanding of chemical reactions, fluid flows, phase transitions, and other physical phenomena arising from molecular interactions. The main problem with this method is that it is computationally demanding because of its O(N^2) complexity and its requirement for prolonged simulations. The use of Graphics Processing Units (GPUs) is an attractive solution and has been applied to this problem thus far. However, such heterogeneous approaches occasionally cause load imbalances between CPUs and GPUs and do not utilize all computational resources. We propose and implement a method of balancing the workload between CPUs and GPUs. Our method is based on formulating and observing workloads, and it statically distributes work according to spatial decomposition. We succeeded in utilizing processors more efficiently and accelerating simulations by up to 20.7% compared to the original GPU-optimized code.
{"title":"GPU Accelerated Molecular Dynamics with Method of Heterogeneous Load Balancing","authors":"T. Udagawa, M. Sekijima","doi":"10.1109/IPDPSW.2015.41","DOIUrl":"https://doi.org/10.1109/IPDPSW.2015.41","url":null,"abstract":"Molecular Dynamics simulations are widely used to obtain a deeper understanding of chemical reactions, fluid flows, phase transitions, and other physical phenomena arising from molecular interactions. The main problem with this method is that it is computationally demanding because of its O(N^2) complexity and its requirement for prolonged simulations. The use of Graphics Processing Units (GPUs) is an attractive solution and has been applied to this problem thus far. However, such heterogeneous approaches occasionally cause load imbalances between CPUs and GPUs and do not utilize all computational resources. We propose and implement a method of balancing the workload between CPUs and GPUs. Our method is based on formulating and observing workloads, and it statically distributes work according to spatial decomposition. We succeeded in utilizing processors more efficiently and accelerating simulations by up to 20.7% compared to the original GPU-optimized code.","PeriodicalId":340697,"journal":{"name":"2015 IEEE International Parallel and Distributed Processing Symposium Workshop","volume":"68 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-05-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132657339","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
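The static distribution step can be illustrated with a small sketch. The helper below and its throughput parameters are illustrative assumptions, not the authors' formulation: cells of the spatial decomposition are assigned to CPU and GPU in proportion to their measured processing rates.

```python
def split_cells(num_cells, cpu_rate, gpu_rate):
    """Statically partition spatial-decomposition cells between a CPU
    and a GPU in proportion to measured throughputs (cells/second).
    Returns (cpu_cells, gpu_cells)."""
    gpu_cells = round(num_cells * gpu_rate / (cpu_rate + gpu_rate))
    return num_cells - gpu_cells, gpu_cells
```

For example, if profiling shows the GPU processes cells four times faster than the CPU, `split_cells(100, 20, 80)` assigns 20 cells to the CPU and 80 to the GPU, so both finish a step at roughly the same time.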
Frequency scaling and precision reduction optimization of an FPGA-accelerated SPICE circuit simulator can enhance performance by 1.5x while lowering implementation cost by 15-20%. This is possible due to the inherent fault-tolerant capabilities of SPICE, which can naturally drive simulator convergence even in the presence of arithmetic errors caused by frequency scaling and precision reduction. We quantify the impact of these transformations on SPICE by analyzing the resulting convergence residue and runtime. To explain the impact of our optimizations, we develop an empirical error model derived from in-situ frequency scaling experiments and build analytical models of rounding and truncation errors using Gappa-based numerical analysis. Across a range of benchmark SPICE circuits, we are able to tolerate bit-level fault rates of 10^-4 (frequency scaling) and manage up to an 8-bit loss in the least-significant digits (precision reduction) without compromising SPICE convergence quality while delivering speedups.
{"title":"Enhancing Speedups for FPGA Accelerated SPICE through Frequency Scaling and Precision Reduction","authors":"L. Hui, Nachiket Kapre","doi":"10.1109/IPDPSW.2015.100","DOIUrl":"https://doi.org/10.1109/IPDPSW.2015.100","url":null,"abstract":"Frequency scaling and precision reduction optimization of an FPGA-accelerated SPICE circuit simulator can enhance performance by 1.5x while lowering implementation cost by 15-20%. This is possible due to the inherent fault-tolerant capabilities of SPICE, which can naturally drive simulator convergence even in the presence of arithmetic errors caused by frequency scaling and precision reduction. We quantify the impact of these transformations on SPICE by analyzing the resulting convergence residue and runtime. To explain the impact of our optimizations, we develop an empirical error model derived from in-situ frequency scaling experiments and build analytical models of rounding and truncation errors using Gappa-based numerical analysis. Across a range of benchmark SPICE circuits, we are able to tolerate bit-level fault rates of 10^-4 (frequency scaling) and manage up to an 8-bit loss in the least-significant digits (precision reduction) without compromising SPICE convergence quality while delivering speedups.","PeriodicalId":340697,"journal":{"name":"2015 IEEE International Parallel and Distributed Processing Symposium Workshop","volume":"64 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-05-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134404026","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
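Precision reduction of this kind can be emulated in software by zeroing low-order mantissa bits of a floating-point value. The sketch below is an illustrative emulation, not the paper's hardware mechanism: dropping k of a double's 52 mantissa bits bounds the relative error by roughly 2^(k-52).

```python
import struct

def truncate_mantissa(x, drop_bits):
    """Zero the `drop_bits` least-significant bits of an IEEE-754
    double's 52-bit mantissa, emulating reduced-precision arithmetic."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]  # raw 64-bit pattern
    bits &= ~((1 << drop_bits) - 1)                      # clear low mantissa bits
    return struct.unpack("<d", struct.pack("<Q", bits))[0]
```

Running a solver with every intermediate value passed through such a truncation is one way to estimate, before committing to hardware, how many least-significant bits an iterative method like SPICE's Newton-Raphson loop can absorb.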
Distributed virtual simulations are prone to load oscillations, as well as load imbalances during run-time. Detecting such imbalances and responding accordingly through load redistribution can be of great utility in keeping execution performance close to the targeted optimum. A dynamic balancing scheme offers a reactive approach, but a predictive scheme can prevent imbalances before they occur. Several models can be employed for predicting load, but due to the way the load is collected and presented, time series offer reasonable load forecasting in a short time. However, Holt's model, a well-known model for time series representation, shows limitations in forecasting load. To correct this issue, a genetic algorithm approach is introduced to dynamically adjust the model based on recent changes in load behaviour. The convergence of the algorithm can substantially influence the response time of the predictive balancing system, so an analysis is conducted to identify the minimum number of iterations needed to generate a reasonable adjustment.
{"title":"A Genetic Algorithm Approach for Adjusting Time Series Based Load Prediction","authors":"Raed Alkharboush, R. E. Grande, A. Boukerche","doi":"10.1109/IPDPSW.2015.96","DOIUrl":"https://doi.org/10.1109/IPDPSW.2015.96","url":null,"abstract":"Distributed virtual simulations are prone to load oscillations, as well as load imbalances during run-time. Detecting such imbalances and responding accordingly through load redistribution can be of great utility in keeping execution performance close to the targeted optimum. A dynamic balancing scheme offers a reactive approach, but a predictive scheme can prevent imbalances before they occur. Several models can be employed for predicting load, but due to the way the load is collected and presented, time series offer reasonable load forecasting in a short time. However, Holt's model, a well-known model for time series representation, shows limitations in forecasting load. To correct this issue, a genetic algorithm approach is introduced to dynamically adjust the model based on recent changes in load behaviour. The convergence of the algorithm can substantially influence the response time of the predictive balancing system, so an analysis is conducted to identify the minimum number of iterations needed to generate a reasonable adjustment.","PeriodicalId":340697,"journal":{"name":"2015 IEEE International Parallel and Distributed Processing Symposium Workshop","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-05-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134589198","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
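Holt's model itself is standard double exponential smoothing: a level and a trend, each updated with its own smoothing parameter. A minimal sketch follows; in the paper's approach a genetic algorithm would search over the smoothing parameters (`alpha`, `beta`), which are simply fixed arguments here.

```python
def holt_forecast(series, alpha, beta, horizon=1):
    """Holt's linear exponential smoothing: maintain a smoothed level
    and trend over `series`, then extrapolate `horizon` steps ahead."""
    level, trend = series[0], series[1] - series[0]  # initialize from first two points
    for x in series[1:]:
        prev_level = level
        level = alpha * x + (1 - alpha) * (level + trend)   # level update
        trend = beta * (level - prev_level) + (1 - beta) * trend  # trend update
    return level + horizon * trend
```

On a perfectly linear load trace the model tracks the trend exactly, e.g. `holt_forecast([1.0, 2.0, 3.0, 4.0, 5.0], 0.5, 0.5)` yields `6.0`; a GA-based adjuster would re-tune `alpha` and `beta` as the observed load behaviour drifts.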
Sanem Arslan, H. Topcuoglu, M. Kandemir, Oguz Tosun
Modern architectures are increasingly susceptible to transient and permanent faults due to continuously decreasing transistor sizes and faster operating frequencies. The probability of soft error occurrence is relatively high in cache structures due to their large area compared to other parts of the chip. Applying fault tolerance unselectively to all caches incurs a significant overhead in performance and energy. In this study, we propose asymmetrically reliable caches that aim to provide the required reliability using just enough extra hardware under performance and energy constraints. In our framework, a chip multiprocessor consists of one reliability-aware core, which has ECC protection on its data cache for critical data, and a set of less reliable cores with unprotected data caches onto which noncritical data is mapped. The experimental results for selected applications show that our proposed technique provides 21% better reliability for only 6% more energy consumption compared to traditional caches.
{"title":"Performance and Energy Efficient Asymmetrically Reliable Caches for Multicore Architectures","authors":"Sanem Arslan, H. Topcuoglu, M. Kandemir, Oguz Tosun","doi":"10.1109/IPDPSW.2015.113","DOIUrl":"https://doi.org/10.1109/IPDPSW.2015.113","url":null,"abstract":"Modern architectures are increasingly susceptible to transient and permanent faults due to continuously decreasing transistor sizes and faster operating frequencies. The probability of soft error occurrence is relatively high in cache structures due to their large area compared to other parts of the chip. Applying fault tolerance unselectively to all caches incurs a significant overhead in performance and energy. In this study, we propose asymmetrically reliable caches that aim to provide the required reliability using just enough extra hardware under performance and energy constraints. In our framework, a chip multiprocessor consists of one reliability-aware core, which has ECC protection on its data cache for critical data, and a set of less reliable cores with unprotected data caches onto which noncritical data is mapped. The experimental results for selected applications show that our proposed technique provides 21% better reliability for only 6% more energy consumption compared to traditional caches.","PeriodicalId":340697,"journal":{"name":"2015 IEEE International Parallel and Distributed Processing Symposium Workshop","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-05-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115715127","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multi-tiered transactional web applications are frequently used in enterprise systems. Due to their inherently distributed nature, pre-deployment testing for high availability and varying concurrency is important for post-deployment performance. Accurate performance modeling of such applications can help estimate values for future deployment variations as well as validate experimental results. To theoretically model the performance of multi-tiered applications, we use queuing networks and Mean Value Analysis (MVA) models. While MVA has been shown to work well with closed queuing networks, it has particular limitations in cases where the service demands vary with concurrency. This is further complicated by the use of multi-server queues in multi-core CPUs, which are not traditionally captured in MVA. We compare the performance of a multi-server MVA model against actual performance testing measurements and demonstrate this deviation. Using spline interpolation of collected service demands, we show that a modified version of the MVA algorithm (called MVASD) that accepts an array of service demands can provide superior estimates of maximum throughput and response time. Results are demonstrated on multi-tier vehicle insurance registration and e-commerce web applications. The mean deviations of predicted throughput and response time are shown to be less than 3% and 9%, respectively. Additionally, we analyze the effect of spline interpolation of service demands as a function of throughput on the prediction results.
{"title":"Performance Modeling of Multi-tiered Web Applications with Varying Service Demands","authors":"A. Kattepur, M. Nambiar","doi":"10.1109/IPDPSW.2015.28","DOIUrl":"https://doi.org/10.1109/IPDPSW.2015.28","url":null,"abstract":"Multi-tiered transactional web applications are frequently used in enterprise systems. Due to their inherently distributed nature, pre-deployment testing for high availability and varying concurrency is important for post-deployment performance. Accurate performance modeling of such applications can help estimate values for future deployment variations as well as validate experimental results. To theoretically model the performance of multi-tiered applications, we use queuing networks and Mean Value Analysis (MVA) models. While MVA has been shown to work well with closed queuing networks, it has particular limitations in cases where the service demands vary with concurrency. This is further complicated by the use of multi-server queues in multi-core CPUs, which are not traditionally captured in MVA. We compare the performance of a multi-server MVA model against actual performance testing measurements and demonstrate this deviation. Using spline interpolation of collected service demands, we show that a modified version of the MVA algorithm (called MVASD) that accepts an array of service demands can provide superior estimates of maximum throughput and response time. Results are demonstrated on multi-tier vehicle insurance registration and e-commerce web applications. The mean deviations of predicted throughput and response time are shown to be less than 3% and 9%, respectively. Additionally, we analyze the effect of spline interpolation of service demands as a function of throughput on the prediction results.","PeriodicalId":340697,"journal":{"name":"2015 IEEE International Parallel and Distributed Processing Symposium Workshop","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-05-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115285543","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
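For reference, the baseline the paper builds on is textbook exact MVA for a closed network of single-server queues; the sketch below shows that recursion only. MVASD's multi-server queues and concurrency-dependent (spline-interpolated) service demands are not modeled here.

```python
def mva(demands, think_time, users):
    """Exact Mean Value Analysis for a closed queuing network of
    single-server FCFS queues. demands[k] is service demand D_k in
    seconds; returns (throughput X, response time R) at `users`."""
    q = [0.0] * len(demands)          # mean queue lengths at n-1 users
    X = R = 0.0
    for n in range(1, users + 1):
        # residence time at each queue: demand inflated by queueing
        r = [d * (1 + qk) for d, qk in zip(demands, q)]
        R = sum(r)
        X = n / (R + think_time)      # Little's law over the whole loop
        q = [X * rk for rk in r]      # queue lengths for the next step
    return X, R
```

At one user the response time is just the sum of demands; as users grow, throughput saturates at the bottleneck bound 1/max(D_k), which is exactly the regime where measured demands start varying with concurrency and the fixed-demand assumption breaks down.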
Device-to-Device (D2D) communication underlaying cellular technology not only increases system capacity but also exploits the physical proximity of communicating devices to support proximity services, offload traffic from the Base Station (BS), etc. However, efficient proximity discovery and synchronization among devices pose new research challenges for cellular networks. Inspired by the synchronization behaviour of fireflies found in nature, the reported algorithms based on bio-inspired firefly heuristics for synchronization among devices, as well as for discovering shared service interests, have the drawbacks of long convergence times and large numbers of message exchanges. Therefore, we propose an improved O(n log n) distributed firefly algorithm for large-scale D2D networks using a tree-based topological mechanism with an RSSI-based ranging scheme.
{"title":"Firefly Inspired Improved Distributed Proximity Algorithm for D2D Communication","authors":"A. Pratap, R. Misra","doi":"10.1109/IPDPSW.2015.64","DOIUrl":"https://doi.org/10.1109/IPDPSW.2015.64","url":null,"abstract":"Device-to-Device (D2D) communication underlaying cellular technology not only increases system capacity but also exploits the physical proximity of communicating devices to support proximity services, offload traffic from the Base Station (BS), etc. However, efficient proximity discovery and synchronization among devices pose new research challenges for cellular networks. Inspired by the synchronization behaviour of fireflies found in nature, the reported algorithms based on bio-inspired firefly heuristics for synchronization among devices, as well as for discovering shared service interests, have the drawbacks of long convergence times and large numbers of message exchanges. Therefore, we propose an improved O(n log n) distributed firefly algorithm for large-scale D2D networks using a tree-based topological mechanism with an RSSI-based ranging scheme.","PeriodicalId":340697,"journal":{"name":"2015 IEEE International Parallel and Distributed Processing Symposium Workshop","volume":"162 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-05-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121742933","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
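The flavor of firefly synchronization can be illustrated with a toy pulse-coupled oscillator simulation. The all-to-all coupling and the `epsilon` nudge below are illustrative assumptions only; the paper's contribution is precisely to replace such all-to-all flashing with a tree-based O(n log n) protocol, which is not reproduced here.

```python
def simulate(phases, steps, dt=0.01, period=1.0, epsilon=0.3):
    """Pulse-coupled 'firefly' oscillators with all-to-all coupling.
    Each tick every phase advances by dt; when an oscillator reaches
    the period it flashes, every listener jumps forward by a fraction
    epsilon of its own phase, and all oscillators at or past the
    period reset to zero. Identical oscillators eventually flash in
    unison."""
    phases = list(phases)
    for _ in range(steps):
        phases = [p + dt for p in phases]
        fired = {i for i, p in enumerate(phases) if p >= period}
        if fired:
            phases = [p if i in fired else min(period, p * (1 + epsilon))
                      for i, p in enumerate(phases)]
            phases = [0.0 if p >= period else p for p in phases]
    return phases
```

Starting two oscillators half a period apart, the flashes drag the laggard forward a little on each cycle until both reset together, after which they stay locked; the convergence time of exactly this kind of repeated nudging is what the reported heuristics suffer from at scale.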