Pub Date: 1993-04-13 DOI: 10.1109/IPPS.1993.262898
Xiaojun Shen, W. Liang
The authors present a parallel algorithm for the multiple edge update problem on a minimum spanning tree. The problem is defined as follows: given a minimum spanning tree $T(V, E_T)$ of an undirected graph $G(V, E)$, where $|V| = n$ and $E_T$ is the set of tree edges, recompute a minimum spanning tree after (1) adding $K$ new edges, (2) changing the weights of $K$ existing edges, or (3) deleting a vertex of degree $K$ in the tree, where $1 \le K < n$. The algorithm requires $O(\log K \log n)$ time and $O(n^2 / (\log n \log K))$ processors on a SIMD CREW PRAM model. If the weights of the current tree edges are not allowed to increase, the algorithm runs within the same time bound using only $O(\max(n, nK / (\log n \log K)))$ processors. The algorithm is optimal for dense graphs if no intermediate results are available from computing the original MST.
{"title":"A parallel algorithm for multiple edge updates of minimum spanning trees","authors":"Xiaojun Shen, W. Liang","doi":"10.1109/IPPS.1993.262898","DOIUrl":"https://doi.org/10.1109/IPPS.1993.262898","journal":{"name":"[1993] Proceedings Seventh International Parallel Processing Symposium"},"publicationDate":"1993-04-13"}
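To make the first update case concrete, the following is a minimal sequential sketch in Python of inserting a single new edge into an existing minimum spanning tree: the new edge closes a cycle with the tree path between its endpoints, and the heaviest edge on that cycle is dropped. This is the textbook cycle-property update, not the authors' PRAM algorithm, which handles K edges in parallel. The function name `insert_edge` and the `(u, v, weight)` edge tuples are assumptions made for illustration.

```python
from collections import defaultdict


def insert_edge(tree_edges, new_edge):
    """Return the MST edge list after adding one new graph edge (u, v, w).

    tree_edges: list of (a, b, weight) tuples forming a spanning tree.
    Illustrative sequential sketch only (hypothetical helper).
    """
    u, v, w = new_edge
    adj = defaultdict(list)
    for a, b, wt in tree_edges:
        adj[a].append((b, wt))
        adj[b].append((a, wt))

    # Depth-first search from u, remembering the tree edge used to reach each vertex.
    parent = {u: None}
    stack = [u]
    while stack:
        x = stack.pop()
        for y, wt in adj[x]:
            if y not in parent:
                parent[y] = (x, wt)
                stack.append(y)

    # Walk back from v to u to find the heaviest edge on the tree path.
    heaviest = None
    x = v
    while parent[x] is not None:
        p, wt = parent[x]
        if heaviest is None or wt > heaviest[2]:
            heaviest = (p, x, wt)
        x = p

    # Keep the old tree unless the new edge is strictly lighter than that edge.
    if heaviest is None or w >= heaviest[2]:
        return list(tree_edges)
    a, b, _ = heaviest
    kept = [e for e in tree_edges if {e[0], e[1]} != {a, b}]
    return kept + [new_edge]
```

For example, with tree edges `(1, 2, 5)`, `(2, 3, 7)`, `(3, 4, 1)`, inserting `(1, 3, 2)` replaces the weight-7 edge, while inserting `(1, 4, 9)` leaves the tree unchanged.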
Pub Date: 1993-04-13 DOI: 10.1109/IPPS.1993.262771
K. Asakura, Toyohide Watanabe, N. Sugie
Local-network-based computer systems, in which several workstations are connected, are coming into practical use. Software development for such systems is, however, very difficult for end users because of complications such as load balancing and communication among processes on different workstations. The authors propose a C-specific parallelizing compiler to address these problems. The compiler adopts function-call parallelization, which incurs less communication overhead than DO-loop parallelization.
{"title":"C parallelizing compiler on local-network-based computer environment","authors":"K. Asakura, Toyohide Watanabe, N. Sugie","doi":"10.1109/IPPS.1993.262771","DOIUrl":"https://doi.org/10.1109/IPPS.1993.262771","journal":{"name":"[1993] Proceedings Seventh International Parallel Processing Symposium"},"publicationDate":"1993-04-13"}
Pub Date: 1993-04-13 DOI: 10.1109/IPPS.1993.262850
Yung-Syau Chen, M. Dubois
The authors introduce hardware cache protocols in which invalidations affect only part of a cached block, so that the processor can keep reading the valid part. On a cache miss, the entire block is fetched into the cache. The proposed protocols take advantage of the prefetching effects associated with large block sizes while reducing the false-sharing miss rate. They do not rely on synchronization, as earlier proposals do, and are therefore applicable to systems under any memory consistency model, including sequential consistency. Simulation results show that protocols with partial block invalidations may provide significant reductions in miss rate and memory traffic over protocols that invalidate entire blocks. The hardware cost is low and the protocol complexity is only marginally increased.
{"title":"Cache protocols with partial block invalidations","authors":"Yung-Syau Chen, M. Dubois","doi":"10.1109/IPPS.1993.262850","DOIUrl":"https://doi.org/10.1109/IPPS.1993.262850","journal":{"name":"[1993] Proceedings Seventh International Parallel Processing Symposium"},"publicationDate":"1993-04-13"}
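The idea of invalidating only part of a block can be illustrated with a toy Python model. It assumes one valid bit per word, which is an illustrative choice: the abstract does not state the invalidation granularity, and the class name `CachedBlock` and its methods are hypothetical, not the authors' protocol.

```python
class CachedBlock:
    """One cached block with a valid bit per word (an illustrative assumption)."""

    def __init__(self, words):
        self.words = list(words)
        self.valid = [True] * len(self.words)

    def invalidate(self, index):
        # A remote write invalidates only the affected word, not the whole block.
        self.valid[index] = False

    def read(self, index, memory_block):
        """Return (value, hit). A miss refetches the entire block from memory."""
        if self.valid[index]:
            return self.words[index], True
        self.words = list(memory_block)
        self.valid = [True] * len(self.words)
        return self.words[index], False
```

In this model, after `invalidate(2)` a read of word 0 still hits, whereas a whole-block invalidation would have turned it into a false-sharing miss. Only a read of word 2 misses and refetches the block.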
Pub Date: 1993-04-13 DOI: 10.1109/IPPS.1993.262820
J. Antonio, R. C. Metzger
A nonlinear programming approach is introduced for solving the hypercube embedding problem. The basic idea is to approximate the discrete space of an n-dimensional hypercube, i.e. $\{z : z \in \{0,1\}^n\}$, with the continuous space of an n-dimensional hypersphere, i.e. $\{x : x \in \mathbb{R}^n \text{ and } \|x\|^2 = 1\}$. The mapping problem is first solved in the continuous domain by applying the gradient projection technique to a continuously differentiable objective function. The optimal process 'locations' from the solution of the continuous hypersphere mapping problem are then discretized onto the n-dimensional hypercube. The proposed approach can directly solve the problem of mapping P processes onto N nodes in the general case where P > N. In contrast, competing embedding heuristics from the literature can produce only one-to-one mappings and therefore cannot be applied directly when P > N.
{"title":"Hypersphere Mapper: a nonlinear programming approach to the hypercube embedding problem","authors":"J. Antonio, R. C. Metzger","doi":"10.1109/IPPS.1993.262820","DOIUrl":"https://doi.org/10.1109/IPPS.1993.262820","journal":{"name":"[1993] Proceedings Seventh International Parallel Processing Symposium"},"publicationDate":"1993-04-13"}
Pub Date: 1993-04-13 DOI: 10.1109/IPPS.1993.262821
V. Chaudhary, B. Sabata, J. Aggarwal
The authors show the universality of the VEDIC network in simulating other well-known interconnection networks by generating the parameters of the VEDIC network automatically. Algorithms are given to represent chordal rings, toroidal meshes, binary hypercubes, k-ary n-cubes, and Cayley graphs (the star graph and the pancake graph) as VEDIC networks. With these parameters, the VEDIC network can be used as a tool for generating both currently known and new interconnection networks.
{"title":"Mapping interconnection networks into VEDIC networks","authors":"V. Chaudhary, B. Sabata, J. Aggarwal","doi":"10.1109/IPPS.1993.262821","DOIUrl":"https://doi.org/10.1109/IPPS.1993.262821","journal":{"name":"[1993] Proceedings Seventh International Parallel Processing Symposium"},"publicationDate":"1993-04-13"}
Pub Date: 1993-04-13 DOI: 10.1109/IPPS.1993.262818
S. Guha
The systolic screen is a very natural parallel architecture for image processing. A $\sqrt{n} \times \sqrt{n}$ systolic screen consists of a $\sqrt{n} \times \sqrt{n}$ mesh of processors, with each processor representing a pixel in a grid. The author studies computational geometry problems for polygonal images on such a screen. The algorithms are analog in that they simulate 'physical' processes based on the homeomorphic representation of polygons on the systolic screen. For various problems on polygons, the author obtains simple optimal parallel analog algorithms that run in $O(\sqrt{n})$ time on a $\sqrt{n} \times \sqrt{n}$ systolic screen, subject to some restrictions due to the limited resolution of the screen.
{"title":"Parallel analog algorithms for processing polygonal images on a systolic screen","authors":"S. Guha","doi":"10.1109/IPPS.1993.262818","DOIUrl":"https://doi.org/10.1109/IPPS.1993.262818","journal":{"name":"[1993] Proceedings Seventh International Parallel Processing Symposium"},"publicationDate":"1993-04-13"}
Pub Date: 1993-04-13 DOI: 10.1109/IPPS.1993.262792
R. Vetter, D. Du, A. Klietz
The CM-2's natural data layout is not conducive to exchanging data with other machines. Before CM-2 data is sent to a remote machine, a bitwise transpose must be performed on the data. Each bit in an n-bit value must be transmitted to a different processor, requiring n send operations through the CM-2's global router network. The time required to transpose the data limits the effective throughput of the I/O channel to a small fraction of its peak theoretical bandwidth. For example, when sending data to a remote supercomputer over a 100 MB/s HIPPI channel, an effective throughput of only 4.9 MB/s can be achieved. The authors describe the CM-2 transpose problem and study ways to improve the performance of transposed data transmissions.
{"title":"The CM-2 data transposition problem","authors":"R. Vetter, D. Du, A. Klietz","doi":"10.1109/IPPS.1993.262792","DOIUrl":"https://doi.org/10.1109/IPPS.1993.262792","journal":{"name":"[1993] Proceedings Seventh International Parallel Processing Symposium"},"publicationDate":"1993-04-13"}
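As a rough illustration of what a bitwise transpose does, the Python sketch below turns a list of n-bit values into n bit-slices, where slice b collects bit b of every value. The function name `bit_transpose` and the packing order (value i goes to bit position i of each slice) are assumptions for illustration and do not describe the CM-2's actual memory layout or router operations. The throughput figures quoted in the abstract correspond to 4.9 / 100 = 4.9% of the channel's peak bandwidth.

```python
def bit_transpose(values, n_bits):
    """Return n_bits bit-slices; slice b packs bit b of every input value.

    Value i contributes to bit position i of each slice (assumed ordering).
    """
    slices = []
    for b in range(n_bits):
        packed = 0
        for i, v in enumerate(values):
            # Extract bit b of value i and place it at position i of the slice.
            packed |= ((v >> b) & 1) << i
        slices.append(packed)
    return slices
```

Applying the function twice, with the roles of value count and bit width swapped, recovers the original values, which is a quick sanity check that it is a true transpose.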
Pub Date: 1993-04-13 DOI: 10.1109/IPPS.1993.262815
E. Dekel, Jie Hu
Given an arbitrary st-numbering of a biconnected graph with n vertices and m edges, parallel algorithms are presented for dynamically finding another st-numbering for any specified pair of vertices. The algorithms run in $O(\log n)$ time using $m / \log n$ processors on an EREW PRAM. The dynamic st-numbering is applied to network problems such as bipartition and the construction of centroid trees and centered trees.
{"title":"Parallel dynamic st-numbering and applications","authors":"E. Dekel, Jie Hu","doi":"10.1109/IPPS.1993.262815","DOIUrl":"https://doi.org/10.1109/IPPS.1993.262815","journal":{"name":"[1993] Proceedings Seventh International Parallel Processing Symposium"},"publicationDate":"1993-04-13"}
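For readers unfamiliar with the term, an st-numbering assigns the numbers 1..n to the vertices so that s receives 1, t receives n, and every other vertex has both a lower-numbered and a higher-numbered neighbour. The Python sketch below only checks this standard definition. It is not the parallel algorithm from the paper, it does not test whether s and t are adjacent (some definitions require that), and the name `is_st_numbering` and the adjacency-dictionary input are assumptions.

```python
def is_st_numbering(adj, number, s, t):
    """Check the st-numbering conditions for an undirected graph.

    adj: dict mapping each vertex to a list of its neighbours.
    number: dict mapping each vertex to its assigned number.
    """
    n = len(adj)
    # The numbers must be exactly 1..n, each used once.
    if sorted(number.values()) != list(range(1, n + 1)):
        return False
    if number[s] != 1 or number[t] != n:
        return False
    for v, neighbours in adj.items():
        if v in (s, t):
            continue
        has_lower = any(number[u] < number[v] for u in neighbours)
        has_higher = any(number[u] > number[v] for u in neighbours)
        if not (has_lower and has_higher):
            return False
    return True
```

On the 4-cycle a-b-c-d-a with s = a and t = d, the numbering a=1, b=2, c=3, d=4 passes, while a=1, c=2, b=3, d=4 fails because c has no lower-numbered neighbour.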
Pub Date: 1993-04-13 DOI: 10.1109/IPPS.1993.262881
O. Ibarra, Q. Zheng
The authors show that the single-source shortest path problem for permutation graphs can be solved in $O(\log n)$ time using $O(n / \log n)$ processors on an EREW PRAM. As an application, they show that a minimum connected dominating set of a permutation graph can be found in $O(\log n)$ time using $O(n / \log n)$ processors. The algorithms are optimal with respect to the time-processor product.
{"title":"On the shortest path problem for permutation graphs","authors":"O. Ibarra, Q. Zheng","doi":"10.1109/IPPS.1993.262881","DOIUrl":"https://doi.org/10.1109/IPPS.1993.262881","journal":{"name":"[1993] Proceedings Seventh International Parallel Processing Symposium"},"publicationDate":"1993-04-13"}
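As background, the permutation graph of a permutation of 1..n has an edge between i and j exactly when the permutation reverses their relative order. The Python sketch below computes single-source distances in such a graph with a plain sequential breadth-first search. It assumes unweighted edges (hop counts) and is only a reference baseline, not the authors' EREW PRAM algorithm; the name `permutation_graph_distances` is hypothetical.

```python
from collections import deque


def permutation_graph_distances(perm, source):
    """Hop distances from source in the permutation graph of perm (values 1..n)."""
    n = len(perm)
    # pos[value] is the index at which the value appears in the permutation.
    pos = {value: index for index, value in enumerate(perm)}
    dist = {source: 0}
    queue = deque([source])
    while queue:
        i = queue.popleft()
        for j in range(1, n + 1):
            # i and j are adjacent exactly when perm reverses their order.
            if j not in dist and (i - j) * (pos[i] - pos[j]) < 0:
                dist[j] = dist[i] + 1
                queue.append(j)
    return dist
```

For the permutation `[2, 3, 1]`, vertex 1 is adjacent to both 2 and 3, while 2 and 3 are not adjacent, so the distance from 2 to 3 is 2. Vertices unreachable from the source are simply absent from the returned dictionary.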
Pub Date: 1993-04-13 DOI: 10.1109/IPPS.1993.262917
H. K. Dai
The optimal bound of $(n-m+c)c + (m-c)$ and an $\Omega(k(n-m+c)c^{1/k})$ lower bound are proved on the size of synchronous strictly non-blocking c-limited (n,m)-concentrators with depth 1 and depth k, respectively. A consequence of the lower-bound result is a $\Theta(n^{1+1/k})$ bound on the size of synchronous strictly non-blocking fixed-ratio $\gamma n$-limited $(\alpha n, \beta n)$-concentrators ($\gamma < \beta$) with constant depth k. For synchronous strictly non-blocking (c,r)-limited (n,m)-generalized-concentrators, the optimal size of $nc - \frac{m-c}{r}(c-r)$ for depth 1 and a lower-bound size-depth tradeoff of $\Omega\left(\left(n - \frac{m}{r} + \frac{c}{r}\right) r^{(k-1)/k} c^{1/k}\right)$ for constant depth k and $r = o(c)$ are also presented.
{"title":"On synchronous strictly non-blocking concentrators and generalized-concentrators","authors":"H. K. Dai","doi":"10.1109/IPPS.1993.262917","DOIUrl":"https://doi.org/10.1109/IPPS.1993.262917","journal":{"name":"[1993] Proceedings Seventh International Parallel Processing Symposium"},"publicationDate":"1993-04-13"}