Pub Date: 2018-04-01 DOI: 10.1142/S0129626418500020
Y. Sudo, A. Datta, L. Larmore, T. Masuzawa
Self-stabilizing, but non-silent, distributed algorithms, for center finding in chain and tree networks are presented in the link register model. We assume that there exists a designated root in a ...
{"title":"Constant Space Self-stabilizing Center Finding Algorithms in Chains and Trees","authors":"Y. Sudo, A. Datta, L. Larmore, T. Masuzawa","doi":"10.1142/S0129626418500020","DOIUrl":"https://doi.org/10.1142/S0129626418500020","url":null,"abstract":"Self-stabilizing, but non-silent, distributed algorithms, for center finding in chain and tree networks are presented in the link register model. We assume that there exists a designated root in a ...","PeriodicalId":422436,"journal":{"name":"Parallel Process. Lett.","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121952287","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2018-04-01 DOI: 10.1142/S0129626418500044
Weijian Zheng, Fengguang Song, Lan Lin, Zizhong Chen
Implementing parallel software for QR factorizations to achieve scalable performance on massively parallel manycore systems requires a comprehensive design that includes algorithm redesign, efficie...
{"title":"Scaling Up Parallel Computation of Tiled QR Factorizations by a Distributed Scheduling Runtime System and Analytical Modeling","authors":"Weijian Zheng, Fengguang Song, Lan Lin, Zizhong Chen","doi":"10.1142/S0129626418500044","DOIUrl":"https://doi.org/10.1142/S0129626418500044","url":null,"abstract":"Implementing parallel software for QR factorizations to achieve scalable performance on massively parallel manycore systems requires a comprehensive design that includes algorithm redesign, efficie...","PeriodicalId":422436,"journal":{"name":"Parallel Process. Lett.","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124720914","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2018-02-02 DOI: 10.1142/s0129626418500068
A. Mostéfaoui, Matthieu Perrin, M. Raynal
This paper presents a simple generalization of the basic atomic read/write register object, whose genericity parameter spans the whole set of integers and is such that its k-parameterized instance has exactly consensus number k. This object, whose definition is natural, is a sliding window register of size k. Its interest lies in its simplicity and its genericity dimension which provides a global view capturing the whole consensus hierarchy. Hence, this short article should be seen as a simple pedagogical introduction to Herlihy’s consensus hierarchy. The paper also shows that the consensus number of a ledger object is [Formula: see text].
{"title":"A Simple Object that Spans the Whole Consensus Hierarchy","authors":"A. Mostéfaoui, Matthieu Perrin, M. Raynal","doi":"10.1142/s0129626418500068","DOIUrl":"https://doi.org/10.1142/s0129626418500068","url":null,"abstract":"This paper presents a simple generalization of the basic atomic read/write register object, whose genericity parameter spans the whole set of integers and is such that its k-parameterized instance has exactly consensus number k. This object, whose definition is natural, is a sliding window register of size k. Its interest lies in its simplicity and its genericity dimension which provides a global view capturing the whole consensus hierarchy. Hence, this short article should be seen as a simple pedagogical introduction to Herlihy’s consensus hierarchy. The paper also shows that the consensus number of a ledger object is [Formula: see text].","PeriodicalId":422436,"journal":{"name":"Parallel Process. Lett.","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133751171","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
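The sliding window register described in the abstract above can be illustrated with a small sketch. It assumes the usual reading of such an object: a write appends a value, and a read returns the last k written values, padded with an initial default. The class name `SlidingWindowRegister`, the `propose` helper, and the use of a lock to stand in for atomicity are all hypothetical choices made for illustration, not the paper's notation.

```python
import threading
from collections import deque


class SlidingWindowRegister:
    """Hypothetical model of a sliding window register of size k.

    A lock stands in for the atomicity of the object; None plays the
    role of the initial default value.
    """

    def __init__(self, k):
        self._window = deque([None] * k, maxlen=k)
        self._lock = threading.Lock()

    def write(self, value):
        # Appending to a full deque with maxlen=k drops the oldest entry.
        with self._lock:
            self._window.append(value)

    def read(self):
        # Returns the last k written values, oldest first.
        with self._lock:
            return tuple(self._window)


def propose(register, value):
    """One-shot consensus attempt: write, read, decide the oldest value seen.

    Assumes at most k processes call propose, each once, with a
    non-None value, so the first written value is never pushed out.
    """
    register.write(value)
    return next(v for v in register.read() if v is not None)


if __name__ == "__main__":
    reg = SlidingWindowRegister(3)
    print([propose(reg, v) for v in ("a", "b", "c")])
```

With k processes each writing once, the window never overflows, which is why every caller sees the same oldest value. A (k+1)-th writer could evict that value, which matches the intuition that the size-k instance does not solve consensus for more than k processes.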
Pub Date: 2017-12-05 DOI: 10.1142/S0129626417500074
Gaetano Coccimiglio, Salimur Choudhury
Clustering is an effective technique that can be used to analyze and extract useful information from large biological networks. Popular clustering solutions often require user input for several algorithm options that can seem very arbitrary without experimentation. These algorithms can provide good results in a reasonable time period but they are not above improvements. We present a local search based clustering algorithm free of such required input that can be used to improve the cluster quality of a set of given clusters taken from any existing algorithm or clusters produced via any arbitrary assignment. We implement this local search using a modern GPU based approach to allow for efficient runtime. The proposed algorithm shows promising results for improving the quality of clusters. With already high quality input clusters we can achieve cluster rating improvements up to 33%.
{"title":"A Parallel Local Search Algorithm for Clustering Large Biological Networks","authors":"Gaetano Coccimiglio, Salimur Choudhury","doi":"10.1142/S0129626417500074","DOIUrl":"https://doi.org/10.1142/S0129626417500074","url":null,"abstract":"Clustering is an effective technique that can be used to analyze and extract useful information from large biological networks. Popular clustering solutions often require user input for several algorithm options that can seem very arbitrary without experimentation. These algorithms can provide good results in a reasonable time period but they are not above improvements. We present a local search based clustering algorithm free of such required input that can be used to improve the cluster quality of a set of given clusters taken from any existing algorithm or clusters produced via any arbitrary assignment. We implement this local search using a modern GPU based approach to allow for efficient runtime. The proposed algorithm shows promising results for improving the quality of clusters. With already high quality input clusters we can achieve cluster rating improvements up to 33%.","PeriodicalId":422436,"journal":{"name":"Parallel Process. Lett.","volume":"701 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124338406","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
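The idea of refining a given clustering by local search, as in the abstract above, can be sketched as follows. This is a sequential toy under stated assumptions: the abstract does not define its cluster rating, so Newman modularity is swapped in as a stand-in objective, and the GPU parallelism is not modeled. The function names `modularity` and `local_search` are hypothetical.

```python
def modularity(edges, cluster_of):
    """Modularity of an undirected, unweighted graph under a clustering.

    Used here only as an assumed stand-in for the paper's cluster rating.
    """
    m = len(edges)
    intra = {}    # edges with both endpoints in the same cluster
    degree = {}   # sum of node degrees per cluster
    for u, v in edges:
        cu, cv = cluster_of[u], cluster_of[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            intra[cu] = intra.get(cu, 0) + 1
    return sum(intra.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree.items())


def local_search(edges, cluster_of):
    """Move single nodes to a neighbor's cluster while the score improves."""
    cluster_of = dict(cluster_of)  # do not mutate the caller's clustering
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    best = modularity(edges, cluster_of)
    improved = True
    while improved:
        improved = False
        for node in sorted(neighbors):
            keep = cluster_of[node]
            candidates = {cluster_of[n] for n in neighbors[node]} - {keep}
            for candidate in sorted(candidates):
                cluster_of[node] = candidate
                score = modularity(edges, cluster_of)
                if score > best + 1e-12:
                    best, keep, improved = score, candidate, True
            cluster_of[node] = keep  # settle on the best label found
    return cluster_of, best


if __name__ == "__main__":
    # Two triangles joined by one edge, starting from an arbitrary assignment.
    edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
    start = {0: 0, 1: 1, 2: 0, 3: 1, 4: 0, 5: 1}
    refined, score = local_search(edges, start)
    print(modularity(edges, start), score, refined)
```

The sketch takes no tuning parameters beyond the input clustering, which mirrors the abstract's point about avoiding user-supplied options. Recomputing the full score for every candidate move is deliberately naive; an efficient version would evaluate the score change incrementally.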
Pub Date: 2017-12-05 DOI: 10.1142/S0129626417500098
Amit Datta, M. De, B. Sinha
A parallel algorithm for prefix computation on N data elements mapped on a Multi Mesh (MM) network of $N = n^4$ processing elements is presented here. The time required by the proposed algorithm is significantly less than that by any of the existing algorithms for prefix computation on mesh-like architectures due to the specific interconnection pattern used in the MM network. The proposed technique requires $O(N^{1/4})$ time for data communication and $O(\log N^{1/4})$ time for computation, when mapped on a MM network constituted by $N^{1/2}$ meshes, each of size $N^{1/4} \times N^{1/4}$. The data communication time in the proposed algorithm is less than the prefix sum algorithm proposed in extended Multi Mesh. To be precise, instead of $(13N^{1/4} - 5)\tau$ communication time the proposed algorithm requires a data communication time of $7.5N^{1/4}\tau$ only. Moreover, the proposed parallel algorithm does not need any extra inter block links as used in the extended Multi Mesh.
{"title":"Fast Parallel Algorithm for Prefix Computation in Multi-Mesh Architecture","authors":"Amit Datta, M. De, B. Sinha","doi":"10.1142/S0129626417500098","DOIUrl":"https://doi.org/10.1142/S0129626417500098","url":null,"abstract":"A parallel algorithm for prefix computation on N data elements mapped on a Multi Mesh (MM) network of N = n4 processing elements is presented here. The time required by the proposed algorithm is significantly less than that by any of the existing algorithms for prefix computation on mesh-like architectures due to the specific interconnection pattern used in the MM network. The proposed technique requires O(N1/4) time for data communication and O(log N1/4) time for computation, when mapped on a MM network constituted by N1/2 meshes, each of size N1/4 × N1/4. The data communication time in the proposed algorithm is less than the prefix sum algorithm proposed in extended Multi Mesh. To be precise, instead of (13N1/4 − 5)τ communication time the proposed algorithm requires a data communication time of 7.5N1/4τ only. Moreover, the proposed parallel algorithm does not need any extra inter block links as used in the extended Multi Mesh.","PeriodicalId":422436,"journal":{"name":"Parallel Process. Lett.","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127892994","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
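The two communication-time expressions quoted in the abstract above can be compared with a short worked calculation. The sketch below only evaluates those two formulas, in units of τ, using n as the fourth root of N (since N is the fourth power of n); it does not implement the prefix algorithm itself, and the function names are hypothetical.

```python
def comm_time_extended_mm(n: int) -> float:
    """Communication time, in units of tau, quoted for the extended Multi Mesh."""
    return 13 * n - 5


def comm_time_proposed(n: int) -> float:
    """Communication time, in units of tau, quoted for the proposed algorithm."""
    return 7.5 * n


if __name__ == "__main__":
    for n in (2, 4, 8, 16):
        total = n ** 4  # number of processing elements N
        old, new = comm_time_extended_mm(n), comm_time_proposed(n)
        print(f"N={total}: extended MM {old} tau, proposed {new} tau, "
              f"ratio {new / old:.3f}")
```

As n grows, the ratio of the two expressions approaches 7.5/13, roughly 0.58, so the quoted saving in communication time tends to a little over 40 percent.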
Pub Date: 2017-12-05 DOI: 10.1142/S0129626417500104
Amedeo Sapio, M. Baldi, Fulvio Risso, Narendra Anand, A. Nucci
Traffic capture and analysis is key to many domains including network management, security and network forensics. Traditionally, it is performed by a dedicated device accessing traffic at a specific point within the network through a link tap or a port of a node mirroring packets. This approach is problematic because the dedicated device must be equipped with a large amount of computation and storage resources to store and analyze packets. Alternatively, in order to achieve scalability, analysis can be performed by a cluster of hosts. However, this is normally located at a remote location with respect to the observation point, hence requiring to move across the network a large volume of captured traffic. To address this problem, this paper presents an algorithm to distribute the task of capturing, processing and storing packets traversing a network across multiple packet forwarding nodes (e.g., IP routers). Essentially, our solution allows individual nodes on the path of a flow to operate on subsets of pa...
{"title":"Packet Capture and Analysis on MEDINA, A Massively Distributed Network Data Caching Platform","authors":"Amedeo Sapio, M. Baldi, Fulvio Risso, Narendra Anand, A. Nucci","doi":"10.1142/S0129626417500104","DOIUrl":"https://doi.org/10.1142/S0129626417500104","url":null,"abstract":"Traffic capture and analysis is key to many domains including network management, security and network forensics. Traditionally, it is performed by a dedicated device accessing traffic at a specific point within the network through a link tap or a port of a node mirroring packets. This approach is problematic because the dedicated device must be equipped with a large amount of computation and storage resources to store and analyze packets. Alternatively, in order to achieve scalability, analysis can be performed by a cluster of hosts. However, this is normally located at a remote location with respect to the observation point, hence requiring to move across the network a large volume of captured traffic. To address this problem, this paper presents an algorithm to distribute the task of capturing, processing and storing packets traversing a network across multiple packet forwarding nodes (e.g., IP routers). Essentially, our solution allows individual nodes on the path of a flow to operate on subsets of pa...","PeriodicalId":422436,"journal":{"name":"Parallel Process. Lett.","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128725738","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2017-12-04 DOI: 10.1142/S0129626421500031
L. Boxer
We analyze the running time of the Saukas-Song algorithm for selection on a coarse grained multicomputer without expressing the running time in terms of communication rounds. This shows that while in the best case the Saukas-Song algorithm runs in asymptotically optimal time, in general it does not. We propose other algorithms for coarse grained selection that have optimal expected running time.
{"title":"Coarse Grained Parallel Selection","authors":"L. Boxer","doi":"10.1142/S0129626421500031","DOIUrl":"https://doi.org/10.1142/S0129626421500031","url":null,"abstract":"We analyze the running time of the Saukas-Song algorithm for selection on a coarse grained multicomputer without expressing the running time in terms of communication rounds. This shows that while in the best case the Saukas-Song algorithm runs in asymptotically optimal time, in general it does not. We propose other algorithms for coarse grained selection that have optimal expected running time.","PeriodicalId":422436,"journal":{"name":"Parallel Process. Lett.","volume":"116 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128100455","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
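Coarse grained selection, the topic of the abstract above, can be illustrated with a sequential simulation. The sketch assumes a weighted-median-of-local-medians strategy as the general flavor of such algorithms. It is not a transcription of the Saukas-Song algorithm, nor of the algorithms the paper proposes, and it models no communication costs. The names `weighted_median` and `cgm_select`, and the gather threshold, are hypothetical.

```python
def weighted_median(pairs):
    """Smallest value whose cumulative weight reaches half the total weight.

    pairs is a list of (value, weight) tuples.
    """
    total = sum(weight for _, weight in pairs)
    acc = 0
    for value, weight in sorted(pairs):
        acc += weight
        if 2 * acc >= total:
            return value


def cgm_select(parts, k):
    """Return the k-th smallest element (1-indexed) across all lists in parts.

    Each inner list stands for the data held by one simulated processor.
    """
    parts = [list(p) for p in parts]
    while True:
        total = sum(len(p) for p in parts)
        if total <= max(1, len(parts)):
            # Few elements remain: gather them and finish sequentially.
            return sorted(x for p in parts for x in p)[k - 1]
        # Each processor reports its local median and its element count.
        pairs = [(sorted(p)[(len(p) - 1) // 2], len(p)) for p in parts if p]
        pivot = weighted_median(pairs)
        less = sum(1 for p in parts for x in p if x < pivot)
        equal = sum(1 for p in parts for x in p if x == pivot)
        if k <= less:
            parts = [[x for x in p if x < pivot] for p in parts]
        elif k <= less + equal:
            return pivot
        else:
            parts = [[x for x in p if x > pivot] for p in parts]
            k -= less + equal


if __name__ == "__main__":
    parts = [[9, 1, 5], [7, 3], [8, 2, 6, 4, 0]]
    print([cgm_select(parts, k) for k in range(1, 11)])
```

Because the pivot is always an element that is actually present, every round discards at least one element, so the loop terminates. How many elements a round discards, and hence the total running time, depends on how the data is distributed across the parts.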
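Pub Date: 2017-12-01 DOI: 10.1142/S0129626417500086
Anirban Ghose, Lokesh Dokara, Soumyajit Dey, Pabitra Mitra
We present an intelligent scheduling framework which takes as input a set of OpenCL kernels and distributes the workload across multiple CPUs and GPUs in a heterogeneous multicore platform. The fra...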
{"title":"A Framework for OpenCL Task Scheduling on Heterogeneous Multicores","authors":"Anirban Ghose, Lokesh Dokara, Soumyajit Dey, Pabitra Mitra","doi":"10.1142/S0129626417500086","DOIUrl":"https://doi.org/10.1142/S0129626417500086","url":null,"abstract":"We present an intelligent scheduling framework which takes as input a set of OpenCL kernels and distributes the workload across multiple CPUs and GPUs in a heterogeneous multicore platform. The fra...","PeriodicalId":422436,"journal":{"name":"Parallel Process. Lett.","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121470592","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2017-06-20 DOI: 10.1142/S0129626417500013
P. Fatourou, Y. Nikolakopoulos, M. Papatriantafilou
Shared data object implementations that allow non-blocking concurrent operations are useful for in-memory data-processing, especially when they support consistent bulk operations like iterations. We propose an algorithmic implementation for concurrent iterators on shared double-ended queues (deques), building on and complementing a known lock-free deque implementation by M. Michael. The proposed construction is linearizable and wait-free. Moreover, it is read-only, so it does not execute expensive synchronization primitives and it does not interfere with update operations.
{"title":"Linearizable Wait-Free Iteration Operations in Shared Double-Ended Queues","authors":"P. Fatourou, Y. Nikolakopoulos, M. Papatriantafilou","doi":"10.1142/S0129626417500013","DOIUrl":"https://doi.org/10.1142/S0129626417500013","url":null,"abstract":"Shared data object implementations that allow non-blocking concurrent operations are useful for in-memory data-processing, especially when they support consistent bulk operations like iterations. We propose an algorithmic implementation for concurrent iterators on shared double-ended queues (deques), building on and complementing a known lock-free deque implementation by M. Michael. The proposed construction is linearizable and wait-free. Moreover, it is read-only, so it does not execute expensive synchronization primitives and it does not interfere with update operations.","PeriodicalId":422436,"journal":{"name":"Parallel Process. Lett.","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127359185","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2017-06-01 DOI: 10.1142/S0129626417500037
Toni Mancini, A. Massini, E. Tronci
Verification of digital circuits by Cycle-based simulation can be performed in parallel. The parallel implementation requires two phases: the compilation phase, that sets up the data needed for the...
{"title":"Parallelization of Cycle-Based Logic Simulation","authors":"Toni Mancini, A. Massini, E. Tronci","doi":"10.1142/S0129626417500037","DOIUrl":"https://doi.org/10.1142/S0129626417500037","url":null,"abstract":"Verification of digital circuits by Cycle-based simulation can be performed in parallel. The parallel implementation requires two phases: the compilation phase, that sets up the data needed for the...","PeriodicalId":422436,"journal":{"name":"Parallel Process. Lett.","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130840496","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}