Pub Date : 2004-09-01 DOI: 10.1142/S0129626404001921
M. Eshaghian-Wilner, Russ Miller
In this paper, we introduce the Systolic Reconfigurable Mesh (SRM), which combines aspects of the reconfigurable mesh with those of systolic arrays. Every processor controls a local switch that can be reconfigured during every clock cycle to control the physical connections between its four bi-directional bus lines. Data is input on one side of the systolic reconfigurable mesh and output from another side, one row/column per unit time. Efficient algorithms are presented for intermediate-level vision tasks, including histogramming, connectivity, convexity, and proximity.
{"title":"The systolic reconfigurable mesh","authors":"M. Eshaghian-Wilner, Russ Miller","doi":"10.1142/S0129626404001921","DOIUrl":"https://doi.org/10.1142/S0129626404001921","PeriodicalId":44742,"journal":{"name":"Parallel Processing Letters","volume":"33 1","pages":"146-150"},"PeriodicalIF":0.4,"publicationDate":"2004-09-01","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"87501301"}
Pub Date : 2003-08-26 DOI: 10.1142/S0129626403001574
L. Garcés-Erice, E. Biersack, K. Ross, P. Felber, G. Urvoy-Keller
Structured peer-to-peer (P2P) lookup services organize peers into a flat overlay network and offer distributed hash table (DHT) functionality. Data is associated with keys and each peer is responsible for a subset of the keys. In hierarchical DHTs, peers are organized into groups, and each group has its autonomous intra-group overlay network and lookup service. Groups are organized in a top-level overlay network. To find a peer that is responsible for a key, the top-level overlay first determines the group responsible for the key; the responsible group then uses its intra-group overlay to determine the specific peer that is responsible for the key. We provide a general framework for hierarchical DHTs with scalable overlay management. We specifically study a two-tier hierarchy that uses Chord for the top level. Our analysis shows that by using the most reliable peers in the top level, the hierarchical design significantly reduces the expected number of hops. We also present a method to construct hierarchical DHTs that map well to the Internet topology and achieve short intra-group communication delay. The results demonstrate the feasibility of locality-based peer groups, which allow P2P systems to take full advantage of the hierarchical design.
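The two-step lookup described in the abstract above can be illustrated with a minimal sketch. It assumes a Chord-style successor rule on a SHA-1 identifier ring at both levels; the abstract only states that Chord is used for the top level, so the intra-group rule here is an assumption. The class `HierarchicalDHT` and the helpers `key_id` and `successor` are hypothetical names, and the sketch resolves keys centrally instead of routing hop by hop between peers.

```python
import hashlib
from bisect import bisect_left


def key_id(name: str) -> int:
    """Map a string onto the identifier ring (160-bit SHA-1 value)."""
    return int.from_bytes(hashlib.sha1(name.encode("utf-8")).digest(), "big")


def successor(sorted_ids, ident):
    """Return the first identifier >= ident, wrapping around the ring."""
    i = bisect_left(sorted_ids, ident)
    return sorted_ids[i % len(sorted_ids)]


class HierarchicalDHT:
    """Hypothetical two-tier lookup: groups on top, peers inside each group."""

    def __init__(self, groups):
        # groups: dict mapping a group name to the list of its peer names
        self.group_by_id = {key_id(g): g for g in groups}
        self.group_ids = sorted(self.group_by_id)
        self.peers = {}
        for group, members in groups.items():
            by_id = {key_id(p): p for p in members}
            self.peers[group] = (sorted(by_id), by_id)

    def lookup(self, key):
        k = key_id(key)
        # Step 1: the top-level overlay picks the responsible group.
        group = self.group_by_id[successor(self.group_ids, k)]
        # Step 2: the intra-group overlay picks the responsible peer.
        ids, by_id = self.peers[group]
        return group, by_id[successor(ids, k)]


if __name__ == "__main__":
    dht = HierarchicalDHT({"g1": ["a", "b"], "g2": ["c", "d", "e"]})
    print(dht.lookup("some-key"))
```

Because the top level only needs to know about groups, the top-level ring in this sketch has one entry per group instead of one per peer, which is the intuition behind the reduced hop count claimed in the abstract.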
{"title":"Hierarchical Peer-To-Peer Systems","authors":"L. Garcés-Erice, E. Biersack, K. Ross, P. Felber, G. Urvoy-Keller","doi":"10.1142/S0129626403001574","DOIUrl":"https://doi.org/10.1142/S0129626403001574","PeriodicalId":44742,"journal":{"name":"Parallel Processing Letters","volume":"11 1","pages":"1230-1239"},"PeriodicalIF":0.4,"publicationDate":"2003-08-26","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"81311228"}
Pub Date : 2003-08-26 DOI: 10.1142/S0129626403001525
H. Bischof, S. Gorlatch, E. Kitzelmann
Skeletons are reusable, parameterized components with well-defined semantics and a pre-packaged efficient parallel implementation. This paper develops a new, provably cost-optimal implementation of the DS (double-scan) skeleton for the divide-and-conquer paradigm. Our implementation is based on a novel data structure called plist (pointed list); the implementation's performance is estimated using an analytical model. We demonstrate the use of the DS skeleton for parallelizing a tridiagonal system solver and report experimental results for its MPI implementation on a Cray T3E and a Linux cluster: they confirm the performance improvement achieved by the cost-optimal implementation and demonstrate that it is well predicted by our performance model.
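For orientation, the sequential baseline for a tridiagonal system can be sketched with the standard Thomas algorithm. This is not the paper's DS skeleton or its plist-based implementation, neither of which is specified in the abstract. It is only meant to show that the sequential solver consists of one forward sweep followed by one backward sweep, which is one plausible reading of why a two-scan pattern suits this problem. The function name `solve_tridiagonal` and its argument layout are assumptions.

```python
def solve_tridiagonal(a, b, c, d):
    """Solve a tridiagonal system with the Thomas algorithm.

    a: sub-diagonal (a[0] is unused), b: main diagonal,
    c: super-diagonal (c[-1] is unused), d: right-hand side.
    Assumes no pivoting is needed (e.g. a diagonally dominant matrix).
    """
    n = len(b)
    cp = [0.0] * n  # modified super-diagonal
    dp = [0.0] * n  # modified right-hand side

    # Forward sweep: eliminate the sub-diagonal, left to right.
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    for i in range(1, n):
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom

    # Backward sweep: back-substitute, right to left.
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


if __name__ == "__main__":
    # 3x3 system with diagonals (1, 2, 1); the exact solution is [1, 2, 3].
    print(solve_tridiagonal([0, 1, 1], [2, 2, 2], [1, 1, 0], [4, 8, 8]))
```

Each sweep carries a dependency from one element to the next, so a parallel version has to restructure both sweeps; a cost-optimal one does so while keeping the total work within a constant factor of this O(n) baseline.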
{"title":"Cost Optimality And Predictability Of Parallel Programming With Skeletons","authors":"H. Bischof, S. Gorlatch, E. Kitzelmann","doi":"10.1142/S0129626403001525","DOIUrl":"https://doi.org/10.1142/S0129626403001525","PeriodicalId":44742,"journal":{"name":"Parallel Processing Letters","volume":"93 1","pages":"682-693"},"PeriodicalIF":0.4,"publicationDate":"2003-08-26","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"78598008"}
Pub Date : 2003-03-25 DOI: 10.1142/S0129626404001933
S. Agostino
We show nearly work-optimal parallel decoding algorithms which run on the PRAM EREW in O(log n) time with O(n/(log n)^{1/2}) processors for text compressed with LZ1 and LZ2 methods, where n is the length of the output string. We also present pseudo work-optimal PRAM EREW decoders for finite window compression and LZ2 compression requiring logarithmic time with O(dn) work, where d is the window size and the alphabet size respectively. Finally, we observe that PRAM EREW decoders requiring O(log n) time and O(n/log n) processors are possible with the non-conservative assumption that the computer word length is O(log^2 n) bits.
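The "nearly work-optimal" wording can be checked with a short calculation. It takes work to be the product of time and processor count and assumes the usual linear-time sequential LZ decoder as the baseline; the symbols W, T and P are introduced here for illustration and do not appear in the abstract.

```latex
% Work of the first decoders: time T times processors P
W = T \cdot P
  = O(\log n) \cdot O\!\left(\frac{n}{(\log n)^{1/2}}\right)
  = O\!\left(n\,(\log n)^{1/2}\right)

% Overhead relative to an O(n) sequential decoder
\frac{W}{O(n)} = O\!\left((\log n)^{1/2}\right)

% Under the O(\log^2 n)-bit word assumption
W' = O(\log n) \cdot O\!\left(\frac{n}{\log n}\right) = O(n)
```

The first bound is off from linear work by a factor of (log n)^{1/2}, which is the sense of "nearly"; the last line shows that the stronger word-length assumption closes that gap.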
{"title":"Almost work-optimal PRAM EREW decoders of LZ compressed text","authors":"S. Agostino","doi":"10.1142/S0129626404001933","DOIUrl":"https://doi.org/10.1142/S0129626404001933","PeriodicalId":44742,"journal":{"name":"Parallel Processing Letters","volume":"74 1","pages":"422"},"PeriodicalIF":0.4,"publicationDate":"2003-03-25","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"85061595"}
Pub Date : 2001-06-01 DOI: 10.1016/S0129-6264(01)00064-6
S. Bruda, S. Akl
{"title":"ON THE NECESSITY OF FORMAL MODELS FOR REAL-TIME PARALLEL COMPUTATIONS*","authors":"S. Bruda, S. Akl","doi":"10.1016/S0129-6264(01)00064-6","DOIUrl":"https://doi.org/10.1016/S0129-6264(01)00064-6","PeriodicalId":44742,"journal":{"name":"Parallel Processing Letters","volume":"11 1","pages":"353-361"},"PeriodicalIF":0.4,"publicationDate":"2001-06-01","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"56429612"}
Pub Date : 2001-06-01 DOI: 10.1016/S0129-6264(01)00065-8
M. Guo
{"title":"DENOTATIONAL SEMANTICS OF AN HPF-LIKE DATA-PARALLEL LANGUAGE MODEL","authors":"M. Guo","doi":"10.1016/S0129-6264(01)00065-8","DOIUrl":"https://doi.org/10.1016/S0129-6264(01)00065-8","PeriodicalId":44742,"journal":{"name":"Parallel Processing Letters","volume":"11 1","pages":"363-374"},"PeriodicalIF":0.4,"publicationDate":"2001-06-01","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"56429713"}
Author’s name in Latin alphabet, using the author’s own transliteration, repeated in brackets in ISO 9:1995 transliteration, date, title of the paper translated into English, followed by: [in Russian], title of the journal in ISO 9:1995 transliteration. The International Standard ISO 9:1995, a univocal system of one-character-for-one-character equivalents for the Cyrillic and Latin alphabets, does not require any knowledge of the Russian language.
{"title":"Editorial Note","authors":"Michel Cosnard, Ulrich Finger, Jean-Luc Gaudiot","doi":"10.4202/app.2008.0214","DOIUrl":"https://doi.org/10.4202/app.2008.0214","PeriodicalId":44742,"journal":{"name":"Parallel Processing Letters","volume":"36 1","pages":"356 - 356"},"PeriodicalIF":0.4,"publicationDate":"2001-01-01","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"81536500"}
Pub Date : 2000-06-01 DOI: 10.1142/S0129626400000202
Gabriel Antoniu, L. Bougé, R. Namyst, Christian Pérez
The compilation of data-parallel languages is traditionally targeted to low-level runtime environments: abstract processors are mapped onto static system processes, which directly address the low-level communication library. Alternatively, we propose to map each HPF abstract processor onto a "lightweight process" (thread) which can be dynamically migrated between nodes together with the data it manages, under the supervision of some external scheduler. We discuss the pros and cons of such an approach and the facilities which must be provided by the multithreaded runtime. We describe a prototype HPF compiling system built along these lines, based on the Adaptor HPF compiler and using the PM2 multithreaded runtime environment.
{"title":"Compiling Data-Parallel Programs to a Distributed Runtime Environment with Thread Isomigration","authors":"Gabriel Antoniu, L. Bougé, R. Namyst, Christian Pérez","doi":"10.1142/S0129626400000202","DOIUrl":"https://doi.org/10.1142/S0129626400000202","PeriodicalId":44742,"journal":{"name":"Parallel Processing Letters","volume":"56 1","pages":"1756-1762"},"PeriodicalIF":0.4,"publicationDate":"2000-06-01","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"80423172"}
Pub Date : 1999-12-01 DOI: 10.1142/S0129626499000451
Berna L. Massingill
Parallel programming continues to be difficult and error-prone, whether starting from specifications or from an existing sequential program. This paper presents (1) a methodology for parallelizing sequential applications and (2) experiments in applying the methodology. The methodology is based on the use of stepwise refinement together with what we call parallel programming archetypes (briefly, abstractions that capture common features of classes of programs), in which most of the work of parallelization is done using familiar sequential tools and techniques, and those parts of the process that cannot be addressed with sequential tools and techniques are addressed with formally justified transformations. The experiments consist of applying the methodology to sequential application programs, and they provide evidence that the methodology produces correct and reasonably efficient programs at reasonable human-effort cost. Of particular interest is the fact that the aspect of the methodology that is most completely formally justified is the aspect that in practice was the most trouble-free.
{"title":"Experiments with Program Parallelization Using Archetypes and Stepwise Refinement","authors":"Berna L. Massingill","doi":"10.1142/S0129626499000451","DOIUrl":"https://doi.org/10.1142/S0129626499000451","PeriodicalId":44742,"journal":{"name":"Parallel Processing Letters","volume":"1982 1","pages":"844-856"},"PeriodicalIF":0.4,"publicationDate":"1999-12-01","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"82208338"}