Pub Date: 1992-10-19; DOI: 10.1109/FMPC.1992.234903
P. Loustaunau, P.Y. Wang
The authors summarize the results of a preliminary study that examines the feasibility of implementing computer algebra systems on massively parallel single-instruction multiple-data architectures. On serial computers, these systems rely on B. Buchberger's (1970, 1985) algorithm for computing Gröbner bases. A parallelization of this algorithm is proposed that addresses the potential growth in the number of polynomials generated during the computation. The parallel algorithm was implemented on a Connection Machine CM-200 system. The experimental results obtained for seven test problems are evaluated. The results of this study provide insights into ongoing research to develop more efficient parallel algorithms for finding Gröbner bases.
Title: Towards efficient parallelizations of a computer algebra algorithm
Published in: [Proceedings 1992] The Fourth Symposium on the Frontiers of Massively Parallel Computation
Pub Date: 1992-10-19; DOI: 10.1109/FMPC.1992.234900
Z. Bozkus, S. Ranka, G. Fox
The authors study the performance of the CM-5 multiprocessor. They provide a number of benchmarks for its communication and computation performance. Many operations, such as scans and global reductions, can be performed using special hardware available on the CM-5, and these operations have been benchmarked. The authors also describe how to embed a mesh and a hypercube on a CM-5 architecture and provide timings for some mesh and hypercube communication primitives on the CM-5.
Title: Benchmarking the CM-5 multicomputer
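For readers unfamiliar with the operations named in this abstract, the following is a minimal serial sketch of what a scan and a global reduction compute. It only illustrates the semantics; the function names are hypothetical and nothing here reflects the CM-5 hardware or the authors' benchmark code.

```python
from functools import reduce
from itertools import accumulate
import operator


def inclusive_scan(values, op=operator.add):
    """Element i of the result combines values[0..i] with op."""
    return list(accumulate(values, op))


def exclusive_scan(values, identity=0, op=operator.add):
    """Element i of the result combines values[0..i-1]; element 0 is the identity."""
    if not values:
        return []
    return [identity] + inclusive_scan(values, op)[:-1]


def global_reduce(values, op=operator.add):
    """Combine all values into a single result (non-empty input assumed)."""
    return reduce(op, values)


if __name__ == "__main__":
    data = [3, 1, 4, 1, 5]
    print(inclusive_scan(data))
    print(exclusive_scan(data))
    print(global_reduce(data, max))
```

On a parallel machine each element would typically live on its own processor, and the point of hardware support is to produce all prefix results in far fewer steps than this serial loop.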
Pub Date: 1992-10-19; DOI: 10.1109/FMPC.1992.234935
A. Youssef
The problem of offline permutation scheduling on linear arrays, rings, hypercubes, and two-dimensional arrays, assuming the CSFR (circuit-switched fixed routing) model, is examined. Optimal permutation scheduling involves finding a minimum number of subsets of nonconflicting source-destination paths, where every subset of paths can be established to run in one pass. Optimal permutation scheduling is shown to take linear time on linear arrays and to be NP-complete on rings. On hypercubes, the problem is also NP-complete. However, the author discusses an O(N log N) algorithm that routes any permutation in two passes if the model is relaxed to allow two routing rules, the e-cube rule and the e^(-1)-cube rule. This complexity is reduced to O(N) hypercube-parallel time. An O(N log^2 N) bipartite-matching-based algorithm designed to schedule any permutation on p × q meshes/tori in q passes is also considered.
Title: Off-line permutation scheduling on circuit-switched fixed routing networks
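To make the notion of a "pass" concrete, here is a small sketch of fixed e-cube routing on a hypercube together with a greedy grouping of paths into link-disjoint subsets. It rests on several assumptions that are not taken from the paper: the e-cube rule is taken to correct differing address bits from dimension 0 upward, a conflict is taken to mean two paths sharing a directed link, and the greedy grouping is a simple heuristic rather than the optimal or two-pass schedule the abstract refers to. All function names are hypothetical.

```python
def ecube_path(src, dst, n):
    """Nodes visited when differing bits are corrected from dimension 0 to n-1."""
    path = [src]
    cur = src
    for d in range(n):
        if ((cur ^ dst) >> d) & 1:
            cur ^= 1 << d          # cross the link in dimension d
            path.append(cur)
    return path


def directed_links(path):
    """Set of directed links (u, v) used by a path."""
    return set(zip(path, path[1:]))


def greedy_passes(perm, n):
    """Group sources into passes whose e-cube paths share no directed link.

    perm[i] is the destination of source i. First-fit heuristic, not optimal.
    """
    passes = []  # list of (links used in this pass, sources in this pass)
    for src, dst in enumerate(perm):
        need = directed_links(ecube_path(src, dst, n))
        for used, members in passes:
            if not (used & need):
                used |= need       # in-place update of this pass's link set
                members.append(src)
                break
        else:
            passes.append((set(need), [src]))
    return [members for _, members in passes]


if __name__ == "__main__":
    n = 3
    complement = [i ^ 0b111 for i in range(8)]   # send i to its bitwise complement
    print(ecube_path(0, 5, n))
    print(greedy_passes(complement, n))
```

The number of passes returned is only an upper bound on the optimum for this routing rule.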
Pub Date: 1992-10-19; DOI: 10.1109/FMPC.1992.234949
S.K. Das, A. Banerjee
The authors propose and analyze a new hypercube-like topology, called the hyper-Petersen (HP) network, which is constructed from the Cartesian product of a binary hypercube and the Petersen graph. The properties of the HP topology include regularity, a high degree of symmetry and connectivity, and a small diameter. For example, it is shown that an n-dimensional HP network with N = 1.25 × 2^n nodes covers 2.5 times more nodes than the binary hypercube at the cost of increasing the degree by one. Furthermore, with the same degree and connectivity, the diameter of the HP network is one less than that of a hypercube, yet it has a 1.25 times higher packing density. The authors also discuss the embedding of various other topologies such as meshes, trees, and twisted hypercubes on the HP, thereby emphasizing its rich interconnection structure with a simple routing scheme for message communication. A ring of odd length can be embedded in an HP network, which is not possible in a binary hypercube.
Title: Hyper Petersen network: yet another hypercube-like topology
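The construction described in this abstract can be checked on a small instance. The sketch below builds the Cartesian product of the Petersen graph with a binary hypercube and measures node count, degree, and diameter by breadth-first search. It assumes that the "n-dimensional" HP network is the product of the Petersen graph with an (n-3)-dimensional hypercube, which is the reading that yields 10 × 2^(n-3) = 1.25 × 2^n nodes; the node labelling and function names are hypothetical.

```python
from collections import deque


def petersen():
    """Adjacency sets of the Petersen graph on nodes 0..9."""
    adj = {v: set() for v in range(10)}

    def link(a, b):
        adj[a].add(b)
        adj[b].add(a)

    for i in range(5):
        link(i, (i + 1) % 5)            # outer 5-cycle
        link(5 + i, 5 + (i + 2) % 5)    # inner pentagram
        link(i, 5 + i)                  # spokes
    return adj


def hyper_petersen(n):
    """Petersen graph x (n-3)-cube; nodes are pairs (petersen_node, cube_label)."""
    if n < 3:
        raise ValueError("n must be at least 3")
    pet = petersen()
    k = n - 3
    adj = {}
    for p in range(10):
        for h in range(1 << k):
            nbrs = {(q, h) for q in pet[p]}                 # move inside the Petersen factor
            nbrs |= {(p, h ^ (1 << d)) for d in range(k)}   # move along one cube dimension
            adj[(p, h)] = nbrs
    return adj


def diameter(adj):
    """Largest shortest-path distance, by BFS from every node."""
    best = 0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        best = max(best, max(dist.values()))
    return best


if __name__ == "__main__":
    n = 5
    hp = hyper_petersen(n)
    print(len(hp), {len(nb) for nb in hp.values()}, diameter(hp))
```

For n = 5 this gives 40 nodes of degree 5 with diameter 4, compared with 32 nodes and diameter 5 for the 5-dimensional binary hypercube, which matches the comparison made in the abstract.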
Pub Date: 1992-10-19; DOI: 10.1109/FMPC.1992.234883
K. Batcher
The static perfect shuffle (shuffle-exchange) network is based on H. S. Stone's (1971) perfect shuffle network; the two-by-two switches are removed and replaced by exchange links between processors. The network could be used in a low-cost flexible simulator for other networks such as multistage cube networks and hypercubes. The simulation of switched networks and of static networks is discussed.
Title: Low-cost flexible simulation with the static perfect shuffle network
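As an illustration of how a shuffle-exchange network can stand in for a hypercube, the sketch below uses the usual address-level definitions: a shuffle rotates the n-bit node address left by one position, and an exchange flips the lowest address bit. Reaching a hypercube neighbor across dimension d is then done by shuffling until the bit originally in position d sits in position 0, exchanging, and shuffling the rest of the way around. This is a generic textbook-style sketch with hypothetical function names, not the simulation scheme of the paper.

```python
def shuffle(i, n):
    """Perfect shuffle: rotate the n-bit address i left by one bit."""
    mask = (1 << n) - 1
    return ((i << 1) | (i >> (n - 1))) & mask


def exchange(i):
    """Exchange link: flip the lowest address bit."""
    return i ^ 1


def hypercube_neighbor(i, d, n):
    """Reach the dimension-d hypercube neighbor of i using only shuffles and one exchange."""
    k = (n - d) % n                 # after k shuffles, original bit d is at position 0
    for _ in range(k):
        i = shuffle(i, n)
    i = exchange(i)
    for _ in range((n - k) % n):    # complete the rotation so bits return to their places
        i = shuffle(i, n)
    return i


if __name__ == "__main__":
    n = 3
    for node in range(1 << n):
        print(node, [hypercube_neighbor(node, d, n) for d in range(n)])
```

Each simulated hypercube step costs up to n shuffle steps here, which illustrates the trade between link count and time that makes such a simulator low-cost.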
Pub Date: 1992-10-19; DOI: 10.1109/FMPC.1992.234884
Y.-C. Chen, W.-T. Cheng
The authors present an asymptotically efficient parallel algorithm for summing up n binary values on reconfigurable meshes. They show that, given n binary values on an n^(1/2) × n^(1/2) reconfigurable mesh with each processor initially containing one value, the summation of the n binary values can be performed in O(log log n) time. Several applications of the algorithm are presented. It is shown that summing up n b-bit numbers can be performed in O(b log log n) time on an n^(1/2) × n^(1/2) reconfigurable mesh. Next, the histogram of an n × n image can be computed in O(L log log n) time on an n × n reconfigurable mesh, where L is the number of gray-level values. A parallel algorithm for computing the area and perimeter of image components in an n × n image is developed on an n × n reconfigurable mesh. The resulting time complexity is O(C log log n), where C is the number of image components. The implementation of enumeration sort on an n × n^(1/2) × n^(1/2) reconfigurable mesh is shown. The sorting algorithm requires O(log log n) time.
Title: Reconfigurable mesh algorithms for summing up binary values and its applications
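The O(b log log n) and O(L log log n) bounds in this abstract have the shape one would expect if each application is reduced to repeated binary summations. The serial sketch below shows one plausible reduction: a b-bit sum as b weighted bit-plane counts, and a histogram as L indicator counts. Whether the paper uses exactly these reductions is an assumption, and `count_ones` is a hypothetical stand-in for the O(log log n) mesh primitive rather than an implementation of it.

```python
def count_ones(bits):
    """Stand-in for the mesh primitive that sums n binary values."""
    return sum(bits)


def sum_b_bit(values, b):
    """Sum b-bit numbers using one binary summation per bit plane."""
    total = 0
    for j in range(b):
        plane = [(v >> j) & 1 for v in values]   # bit j of every value
        total += count_ones(plane) << j          # weight the count by 2**j
    return total


def histogram(pixels, levels):
    """Histogram using one binary summation per gray level."""
    return [count_ones([int(p == g) for p in pixels]) for g in range(levels)]


if __name__ == "__main__":
    print(sum_b_bit([5, 3, 7, 0], b=3))
    print(histogram([0, 2, 2, 1, 0, 2], levels=3))
```

With b or L invocations of a primitive that costs O(log log n), the totals follow directly.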
Pub Date: 1992-10-19; DOI: 10.1109/FMPC.1992.234953
R. Poovendran, J. Dorband
A novel, highly parallel algorithm for a class of direct and inverse scattering problems is proposed. It is shown that this algorithm reduces the noise propagation exhibited by existing algorithms and produces error terms that are proportional to the square of the discrete step size. Unlike conventional algorithms, the new formulation decouples the reflection kernel in a given layer. Because of this decoupling, the new formulation completely eliminates error propagation between any two points in the same layer. Numerical examples are presented to illustrate the proposed algorithm.
Title: An algorithm for a class of direct and inverse scattering problems
Pub Date: 1992-10-19; DOI: 10.1109/FMPC.1992.234872
B. Alleyne, I. Scherson
The authors discuss an N-input/output recirculating network that can take advantage of compile-time knowledge of algorithm-dependent communications but still performs efficiently on data-dependent permutations. A deterministic routing algorithm and a randomized routing algorithm are given. Mapping to Clos networks is considered.
Title: Permutation routing in 2-stage recirculating delta networks
Pub Date: 1992-10-19; DOI: 10.1109/FMPC.1992.234947
S. Park, B. Bose
The authors propose simple and optimal fault-tolerant broadcasting algorithms in the presence of at most n-1 link or node faults in an n-dimensional hypercube. Further results for up to 2n-3 faulty links or nodes are also considered. These algorithms are optimal or close to optimal in terms of the number of communication steps. The algorithm takes n+1 time steps even in the presence of n-1 faulty links or nodes; this can be achieved even with a single port. For up to 2n-3 link or node faults, even with a single port, the algorithms take at most n+3 steps.
Title: Broadcasting in hypercubes with link/node failures
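For context on the step counts quoted in this abstract, the sketch below simulates the standard fault-free, single-port broadcast on an n-dimensional hypercube, in which every informed node forwards the message across dimension d in step d, so all 2^n nodes are reached in n steps. It also accepts a set of faulty links to show how this naive scheme loses nodes when a link fails. This is a baseline for comparison only and is not the fault-tolerant algorithm of the paper; the function name and the representation of faults are hypothetical.

```python
def broadcast(n, source=0, faulty_links=frozenset()):
    """Return the set of nodes informed after n steps of dimension-ordered broadcast.

    faulty_links holds frozenset({u, v}) pairs for links that drop messages.
    """
    informed = {source}
    for d in range(n):                       # step d uses dimension d only
        newly = set()
        for u in informed:
            v = u ^ (1 << d)                 # neighbor across dimension d
            if frozenset((u, v)) not in faulty_links:
                newly.add(v)
        informed |= newly
    return informed


if __name__ == "__main__":
    n = 3
    print(sorted(broadcast(n)))
    broken = {frozenset((0, 1))}
    print(sorted(broadcast(n, faulty_links=broken)))
```

With the single link between nodes 0 and 1 broken, half of the 3-cube is never reached, which is why a fault-tolerant scheme needs alternative paths and, per the abstract, a small number of extra steps.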
Pub Date: 1992-10-19; DOI: 10.1109/FMPC.1992.234907
E. Hao, P. MacKenzie, Q. Stout
A Θ(log n) time algorithm is obtained for selecting the kth smallest element in a set of n elements on a reconfigurable mesh (rmesh) with n processors. This improves on the running time of the previous fastest algorithm by a factor of log n. It is also shown that variants of this problem can be solved even faster. Finally, a proof of an Ω(log log n) lower bound on the time for the rmesh selection problem is given.
Title: Selection on the reconfigurable mesh
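To pin down the problem this abstract addresses, here is a minimal serial sketch of selecting the kth smallest element by randomized partitioning (quickselect). It is included only as a problem statement in code; it is a swapped-in serial technique with a hypothetical function name and has no connection to the reconfigurable-mesh algorithm or its bounds.

```python
import random


def select_kth(values, k):
    """Return the k-th smallest element of values (k = 1 gives the minimum)."""
    if not 1 <= k <= len(values):
        raise ValueError("k out of range")
    items = list(values)
    while True:
        pivot = random.choice(items)
        lower = [x for x in items if x < pivot]
        equal = [x for x in items if x == pivot]
        if k <= len(lower):
            items = lower                          # answer lies below the pivot
        elif k <= len(lower) + len(equal):
            return pivot                           # answer is the pivot itself
        else:
            k -= len(lower) + len(equal)           # skip everything <= pivot
            items = [x for x in items if x > pivot]


if __name__ == "__main__":
    data = [7, 2, 9, 4, 4, 1]
    print([select_kth(data, k) for k in range(1, len(data) + 1)])
```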