Title: Self-Stabilizing Weak Leader Election in Anonymous Trees Using Constant Memory per Edge
Authors: A. Datta, Stéphane Devismes, L. Larmore, V. Villain
Parallel Process. Lett. Published: 2017-06-01. DOI: 10.1142/S0129626417500025
Abstract: We propose a deterministic silent self-stabilizing algorithm for the weak leader election problem in anonymous trees. Our algorithm is designed in the message-passing model, and requires only O(1) ...
Title: A O(m) Self-Stabilizing Algorithm for Maximal Triangle Partition of General Graphs
Authors: Brahim Neggazi, V. Turau, Mohammed Haddad, H. Kheddouci
Parallel Process. Lett. Published: 2017-06-01. DOI: 10.1142/S0129626417500049
Abstract: The triangle partition problem is a generalization of the well-known graph matching problem, which consists of finding the maximum number of independent edges in a given graph, i.e., edges with no commo...
Title: Strong Fault-Hamiltonicity for the Crossed Cube and Its Extensions
Authors: Chun-Nan Hung, Cheng-Kuan Lin, Lih-Hsing Hsu, E. Cheng, László Lipták
Parallel Process. Lett. Published: 2017-06-01. DOI: 10.1142/S0129626417500050
Abstract: Fault-Hamiltonicity is an important measure of robustness for interconnection networks. Given a graph G = (V, E), the goal is to ensure that G − F remains Hamiltonian for every F ⊆ V ∪ E such that |...
Title: A Power-Aware, Self-Adaptive Macro Data Flow Framework
Authors: M. Danelutto, D. D. Sensi, M. Torquati
Parallel Process. Lett. Published: 2017-03-09. DOI: 10.1142/S0129626417400047
Abstract: The dataflow programming model has been extensively used as an effective solution to implement efficient parallel programming frameworks. However, the amount of resources allocated to the runtime s...
Title: SPar: A DSL for High-Level and Productive Stream Parallelism
Authors: Dalvan Griebler, M. Danelutto, M. Torquati, L. G. Fernandes
Parallel Process. Lett. Published: 2017-03-09. DOI: 10.1142/S0129626417400059
Abstract: This paper introduces SPar, an internal C++ Domain-Specific Language (DSL) that supports the development of classic stream parallel applications. The DSL uses standard C++ attributes to introduce a...
Title: Mock BSPlib for Testing and Debugging Bulk Synchronous Parallel Software
Authors: Wijnand Suijlen
Parallel Process. Lett. Published: 2017-03-09. DOI: 10.1142/S0129626417400011
Abstract: Testing parallel applications on a large number of processors is often impractical. Not only does it require access to scarce compute resources, but tracking down defects with the available debuggi...
Title: A Self-Stabilizing Algorithm for Maximal Matching in Anonymous Networks
Authors: Johanne Cohen, Jonas Lefèvre, Khaled Maâmra, Laurence Pilard, D. Sohier
Parallel Process. Lett. Published: 2016-12-21. DOI: 10.1142/S012962641650016X
Abstract: We propose a self-stabilizing algorithm for computing a maximal matching in an anonymous network. The complexity is O(n²) moves with high probability, under the adversarial distributed daemon. Under this daemon and the anonymity assumption, our algorithm achieves the best known complexity. Moreover, the previous best known algorithm working under the same daemon, but using identifiers, has O(m) move complexity, the same order of growth as our anonymous algorithm. Finally, we do not make the common assumption that a node can determine whether one of its neighbors points to it or to another node, and yet we present a solution with the same asymptotic behavior.
Title: Evaluating Multiple Streams on Heterogeneous Platforms
Authors: Jianbin Fang, Peng Zhang, Zhaokui Li, T. Tang, Xuhao Chen, Cheng Chen, Canqun Yang
Parallel Process. Lett. Published: 2016-12-21. DOI: 10.1142/S0129626416400028
Abstract: Using multiple streams can improve overall system performance by mitigating the data-transfer overhead on heterogeneous systems. Prior work focuses largely on GPUs, and little is known about the performance impact on the (Intel Xeon) Phi. In this work, we apply multiple streams to six real-world applications on the Phi and systematically evaluate the performance benefits. The evaluation is performed at two levels: the microbenchmark level and the real-world application level. Our microbenchmark results show that data transfers and kernel execution can be overlapped on the Phi, while data transfers in the two directions are performed serially. At the application level, we show that both overlappable and non-overlappable applications can benefit from using multiple streams (with a performance improvement of up to 24%). We also quantify how task granularity and resource granularity impact overall performance. Finally, we present a...
Title: A Novel Multi-GPU Parallel Optimization Model for The Sparse Matrix-Vector Multiplication
Authors: Jiaquan Gao, Yuanshen Zhou, Kesong Wu
Parallel Process. Lett. Published: 2016-12-21. DOI: 10.1142/S0129626416400016
Abstract: Accelerating sparse matrix-vector multiplication (SpMV) on graphics processing units (GPUs) has attracted considerable attention recently. We observe that on a given multi-GPU platform, SpMV performance can usually be greatly improved when a matrix is partitioned into several blocks according to a predetermined rule and each block is assigned to a GPU with an appropriate storage format. This motivates us to propose a novel multi-GPU parallel SpMV optimization model. Our model involves two stages. In the first stage, a simple rule is defined to divide any given matrix among multiple GPUs, and a performance model, independent of the problem and dependent on the resources of the devices, is proposed to accurately predict the execution time of SpMV kernels. Using these models, in the second stage we construct an optimal multi-GPU parallel SpMV algorithm that is automatically and rapidly generated for the platform for any problem. Given that our model for SpMV is general, indepen...
Title: Optimizing Weather Model Radiative Transfer Physics for Intel's Many Integrated Core (MIC) Architecture
Authors: J. Michalakes, M. Iacono, E. Jessup
Parallel Process. Lett. Published: 2016-12-21. DOI: 10.1142/S0129626416500195
Abstract: Large numerical weather prediction (NWP) codes such as the Weather Research and Forecast (WRF) model and the NOAA Nonhydrostatic Multiscale Model (NMM-B) port easily to Intel's Many Integrated Core...