Implementation of JAC3D on The NCUBE/ten
Pub Date: 1990-04-08  DOI: 10.1109/DMCC.1990.555431
C. Vaughan
An implementation is presented for JAC3D on a massively parallel hypercube computer. JAC3D, a three-dimensional finite element code developed at Sandia, uses several hundred hours of Cray time each year in solving structural analysis problems. Two major areas of investigation are discussed: (1) the development of general methods, data structures, and routines to communicate information between processors, and (2) the implementation and evaluation of four algorithms to map problems onto the node processors of the hypercube in a load-balanced fashion. The performance of JAC3D on the NCUBE/ten is compared with that on a Cray X-MP: the NCUBE/ten version presently takes 20% more compute time than the Cray. On a larger simulation that used more of the NCUBE's memory, the NCUBE/ten would take less compute time than the Cray. Current activity on the newer NCUBE 2 hypercube, which should lead to an order-of-magnitude improvement in run-time performance for the massively parallel solution of structural analysis problems, is summarized.
{"title":"Implementation of JAC3D on The NCUBE/ten","authors":"C. Vaughan","doi":"10.1109/DMCC.1990.555431","DOIUrl":"https://doi.org/10.1109/DMCC.1990.555431","url":null,"abstract":"An implementation is presented for JAC3D on a massively parallel hypercube computer. JACSD, a three dimensional finite element code developed at Sandia, uses several hundred hours of Cray time each year in solving structural analysis problems. Two major areas of investigation are discussed: (1) the development of general methods, data structures, and routines to communicate information between processors, and (2) the implementation and evaluation of four algorithms to map problems onto the node processors of the hypercube in a loadbalanced fashion. The performance of JACJD on the NCUBE/ten is compared with that on a Cray X-MP: the NCUBE/ten version presently takes 20% more compute time than the Cray. On a larger simulation which used more of the NCUBE's memory, the NCUBE/ten would take less compute time than the Cray. Current activity on the newer NCUBE 2 hypercube is summarized which should lead to an order of magnitude improvement in run-time performance for the massively parallel solution of structural analysis problems.","PeriodicalId":204431,"journal":{"name":"Proceedings of the Fifth Distributed Memory Computing Conference, 1990.","volume":"88 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1990-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124561446","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Visualization: An Aid to Design and Understand Neural Networks in a Parallel Environment
Pub Date: 1990-04-08  DOI: 10.1109/DMCC.1990.556338
S.N. Gupta, M. Zubair, C. Grosch
In this paper, we describe two visualization tools developed on the DAP-510, a SIMD machine with 1024 processors. The two tools are (i) the interactive visualization tool and (ii) the display tool. The interactive visualization tool allows the user to steer the course of a computation by interactively modifying its parameters based on visual feedback. The display tool transforms numeric data into a visual form. It also gives the user the capability to manipulate the visual representation. In the implementation of these tools we exploit the parallel features of the DAP-510. These tools are used for designing and understanding neural networks. However, it is worth mentioning that they are general in nature and can easily interact with other parallel computation processes.
{"title":"Visualization: An Aid to Design and Understand Neural Networks in a Parallel Environment","authors":"S.N. Gupta, M. Zubair, C. Grosch","doi":"10.1109/DMCC.1990.556338","DOIUrl":"https://doi.org/10.1109/DMCC.1990.556338","url":null,"abstract":"In this paper, we describe two visualization tools developed on the DAP-510, a SIMD machine having 1024 processors. The two tools are (i) the interactive visualization tool, and (ii) the display tool. The interactive visualization tool allows the user to steer the course of computation by interactively modifying its parameters based on the visual feedback. The display tool transforms the numeric data into a visual form. It also gives the user capability to manipulate the visual representation. In the implementation of these tools we exploit the parallel features of DAP-510. These tools are utilized for designing and understanding the neural networks. However, it is worth mentioning that these tools are general in nature and can easily interact with other parallel computa, tion processes.","PeriodicalId":204431,"journal":{"name":"Proceedings of the Fifth Distributed Memory Computing Conference, 1990.","volume":"62 25","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1990-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120943586","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Parallel Sorting on Symult 2010
Pub Date: 1990-04-08  DOI: 10.1109/DMCC.1990.555387
P.P. Li, Y. Tung
In this paper, three sorting algorithms, Bitonic sort, Shell sort, and parallel Quicksort, are studied. We analyze the performance of these algorithms and compare them with the empirical results obtained from implementations on the Symult Series 2010, a distributed-memory, message-passing MIMD machine. Each sorting algorithm is a combination of a parallel sort component and a sequential sort component. These algorithms are designed for sorting M random integers on an N-processor machine, where M > N. We found that Bitonic sort is the best parallel sorting algorithm for small problem sizes, M/N < 64, and parallel Quicksort is the best for large problem sizes. The new parallel Quicksort algorithm with a simple key-selection method achieves a decent speed-up compared with other versions of parallel Quicksort on similar parallel machines. Although Shell sort has a worse theoretical time complexity, it does achieve linear speedup for large problem sizes by using a synchronization step to detect early termination of the sorting steps.
{"title":"Parallel Sorting on Symult 2010","authors":"P.P. Li, Y. Tung","doi":"10.1109/DMCC.1990.555387","DOIUrl":"https://doi.org/10.1109/DMCC.1990.555387","url":null,"abstract":"In this paper, three sorting algorithms, Bitonic sort, Shell sort and parallel Quicksort are studied. We analyze the performance of these algorithms and compare them with the empirical results obtained from the implementations on the Symult Series 2010, a distributed-memory, message-passing MIMD machine. Each sorting algorithm is a combination of a parallel sort component and a sequential sort component. These algorithms are designed for sorting M elements of random integers on a N-processor machine, where M > N . We found that Bitonic sort is the best parallel sorting algorithm for small problem size, ( M / N ) < 64, and the parallel Quicksort is the best for large problem size. The new Parallel Quicksort algorithm with a simple key selection method achieves a decent speed-up comparing with other versions of parallel Quicksort on similar parallel machines. Although Shell sort has a worse theoretical time complexity, it does achieve linear speedup for large problem size by using a synchronization step to detect early termination of the sorting steps.","PeriodicalId":204431,"journal":{"name":"Proceedings of the Fifth Distributed Memory Computing Conference, 1990.","volume":"131 10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1990-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124251863","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The Finite Element Solution of Two-Dimensional Transverse Magnetic Scattering Problems on the Connection Machine
Pub Date: 1990-04-08  DOI: 10.1109/DMCC.1990.555413
S. Hutchinson, S. Castillo, E. Hensel, K. Dalton
A study is conducted of the finite element solution of the partial differential equations governing two-dimensional electromagnetic field scattering problems on a SIMD computer. A nodal assembly technique is introduced which maps a single node to a single processor. The physical domain is first discretized in parallel to yield the node locations of an O-grid mesh. Next, the system of equations is assembled and then solved in parallel using a conjugate gradient algorithm for complex-valued, non-symmetric, non-positive-definite systems. Using this technique and Thinking Machines Corporation's Connection Machine-2 (CM-2), problems with more than 250k nodes are solved. Results of electromagnetic scattering, governed by the 2-D scalar Helmholtz wave equation, are presented for a variety of infinite cylinders and airfoil cross-sections. Solutions are demonstrated for a wide range of objects. A summary of performance data is given for the set of test problems.
{"title":"The Finite Element Solution of Two-Dimensional Transverse Magnetic Scattering Problems on the Connection Machine","authors":"S. Hutchinson, S. Castillo, E. Hensel, K. Dalton","doi":"10.1109/DMCC.1990.555413","DOIUrl":"https://doi.org/10.1109/DMCC.1990.555413","url":null,"abstract":"A study is conducted of the finite element solution of the partial differential equations governing twodimensional electromagnetic field scattering problems on a SIMD computer. A nodal assembly technique is introduced which maps a single node to a single processor. The physical domain is first discretized in parallel to yield the node locations of an 0-grid mesh. Next, the system of equations is assembled and then solved in parallel using a conjugate gradient algorithm for complexvalued, non-symmetric, non-positive definite systems. Using this technique and Thinking Machines Corporation’s Connection Machine-2 (CM-2) , problems with more than 250k nodes are solved. Results of electromagnetic scattering, governed by the 2-d scalar Helmholtz wave equations are presented for a variety of infinite cylinders and airfoil crosssections. Solutions are demonstrated for a wide range of objects. A summary of performance data is given for the set of test problems.","PeriodicalId":204431,"journal":{"name":"Proceedings of the Fifth Distributed Memory Computing Conference, 1990.","volume":"313 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1990-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115806027","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Parallel Methods for Solving Polynomial Problems on Distributed Memory Multicomputers
Pub Date: 1990-04-08  DOI: 10.1109/DMCC.1990.555390
Xiaodong Zhang, Hao Lu
We give a group of parallel methods for solving polynomial-related problems and their implementations on a distributed memory multicomputer. These problems are (1) the evaluation of polynomials, (2) the multiplication of polynomials, (3) the division of polynomials, and (4) the interpolation of polynomials. Mathematical analyses are given for exploiting the parallelism of these operations. The related parallel methods supporting the solutions of these polynomial problems, such as the FFT, Toeplitz linear systems, and others, are also discussed. We present some experimental results of these parallel methods on the Intel hypercube. The parallel evaluation of polynomials based on Horner's rule is discussed in Section 2, along with experimental results on the Intel hypercube. The parallelism of polynomial multiplication is exploited by transforming the problem into a set of special FFT series functions, on which the operations can be perfectly distributed among different processors. Section 3 gives the mathematical analysis and the parallel method for polynomial multiplication. The polynomial division problem is solved based on parallel solutions for Toeplitz triangular linear systems and the parallel polynomial multiplication, and is discussed in Section 4. Section 5 addresses a parallel method for Lagrange piecewise cubic polynomial interpolation. Finally, we give a summary and future work in the last section.
{"title":"Parallel Methods for Solving Polynomial Problems on Distributed Memory Multicomputers","authors":"Xiaodong Zhang, Hao Lu","doi":"10.1109/DMCC.1990.555390","DOIUrl":"https://doi.org/10.1109/DMCC.1990.555390","url":null,"abstract":"We give a group of parallel methods for solving polynomial related problems and their implementations on a distributed memory multicomputer. These problems are 1. the evaluation of polynomials, 2. the multiplication of polynomials, 3. the division of polynomials, and 4. the interpolation of polynomials. Mathematical analyses are given for exploiting the parallelisms of these operations. The related parallel methods supporting the solutions of these polynomial problems, such as FFT, Toeplitz linear systems and others are also discussed. We present some experimental results of these parallel methods on the Intel hypercube. polynomials based on the Horner’s rule is discussed in section 2. The experimental results on the Intel hypercube are also presented. The parallelism of the polynomial multiplication is exploited by transferring the problem to a set of special FFT series functions, on which the operations can be perfectly distributed among different processors. Section 3 gives the mathematical analyses and parallel method of the polynomial multiplication. The polynomial division problem is solved based on parallel solutions for Toeplitz triangular linear systems and the parallel polynomial multiplication, and is discussed in section 4. Section 5 addresses a parallel method for the Lagrange piecewise cubic polynomial interpolation. Finally, we give a summary and future work in the last section.","PeriodicalId":204431,"journal":{"name":"Proceedings of the Fifth Distributed Memory Computing Conference, 1990.","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1990-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130821873","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Design and Implementation of a Multi-Cache System on a Loosely Coupled Multiprocessor
Pub Date: 1990-04-08  DOI: 10.1109/DMCC.1990.556268
B. Rochat
The GOTHIC* distributed system implements a generalized virtual memory. The basic memory concept is the segment, characterized by three main features: persistence, direct access, and shared-memory consistency. Thus, the segment can be used for interprocess communication and permanent storage management. This paper explores the design and implementation of a generalized virtual memory and emphasizes data-sharing management in a loosely coupled multiprocessor memory hierarchy.
{"title":"Design and lmplementation of a Multi-Cache System on a Loosely Coupled Multiprocessor","authors":"B. Rochat","doi":"10.1109/DMCC.1990.556268","DOIUrl":"https://doi.org/10.1109/DMCC.1990.556268","url":null,"abstract":"The GOTHIC* distributed system implements a generalized virtual memory. The basic memory concept is the segment characterized by three main features : persistence, direct access and shared memory consistency. Thus, the segment can be used for interprocess communication and permanent storage management. This paper explores the design and implementation of a generalized virtual memory and emphasizes data sharing management in loosely coupled multiprocessor memory hierarchy.","PeriodicalId":204431,"journal":{"name":"Proceedings of the Fifth Distributed Memory Computing Conference, 1990.","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1990-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131074055","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Concurrent Implementation of a Fast Vortex Method
Pub Date: 1990-04-08  DOI: 10.1109/DMCC.1990.555420
F. Pépin, A. Leonard
Vortex methods are a powerful tool for the numerical simulation of incompressible flows at high Reynolds number. They are based on a discrete representation of the vorticity field; in the inviscid limit, the computational elements, or vortices, are simply advected at the local fluid velocity. The numerical approximations transform the vorticity equation, a non-linear PDE, into an N-body problem. The O(N^2) time complexity usually associated with these problems has limited the number of computational elements to a few thousand. This paper is concerned with the concurrent implementation of fast vortex methods that reduce the time complexity to O(N log N). The fast algorithm that is used combines a binary tree data structure with high-order expansions for the induced velocity field. The implementation of this particular algorithm on an MIMD architecture is discussed.
{"title":"Concurrent Implementation of a Fast Vortex Method","authors":"F. Pépin, A. Leonard","doi":"10.1109/DMCC.1990.555420","DOIUrl":"https://doi.org/10.1109/DMCC.1990.555420","url":null,"abstract":"v2u = -V x (we,) . (3) Vortex methods are a powerfil tool for the numerical simulation of incompressible flows at high Reynolds number. They are based on a discrete representation of the vorticity field and in the inviscid limit, the computational elements, or vortices, are simply advected at the local fluid velocity. The numerical approximations transform the vorticity equation, a non-linear PDE, into a N-body problem. The S(N2) time complexity usually associated with these problems has limited the number of computational elements to a few thousands. This paper is concerned with the concurrent implementation of fast vortex methods that reduce the time complexity to U(N1ogN). The fast algorithm that is used combines a binary tree data structure with high order expansions for the induced velocity field. The implementation of this particular algorithm on an MIMD archilecture is discussed. Vortex Methods","PeriodicalId":204431,"journal":{"name":"Proceedings of the Fifth Distributed Memory Computing Conference, 1990.","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1990-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128718071","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Parallel Thinning on a Distributed Memory Machine
Pub Date: 1990-04-08  DOI: 10.1109/DMCC.1990.555364
J. Baek, K. Teague
A parallel thinning algorithm based on boundary following is presented in this paper. The boundary of each object region is extracted and linked in parallel. The resulting object boundary data is divided based on the object size and the number of nodes for load balancing; then the divided objects are redistributed to the nodes. Each boundary in a node is projected on a "working plane". Next, the boundary data is repeatedly shrunken until only the skeleton of the region remains. The conventional iterative parallel algorithm as well as our new algorithm are implemented on a hypercube-topology multiprocessor computer, the Intel iPSC/2. The two algorithms are compared and analyzed. Some resulting figures and execution times are presented.
{"title":"Parallel Thinning on a Distributed Memory Machine","authors":"J. Baek, K. Teague","doi":"10.1109/DMCC.1990.555364","DOIUrl":"https://doi.org/10.1109/DMCC.1990.555364","url":null,"abstract":"A p a r a l l e l th inning algor i thm based on boundary fo l lowing i s presented i n t h i s paper. The boundary o f each object region i s extracted and l inked i n pa ra l l e l . The resu l t ing object boundary data i s div ided based on the object s ize and the nurrber o f nodes f o r load balancing, then the divided objects are red is t r ibu ted t o the nodes. Each boundary i n a node i s projected on a Ilworking planell. Next, the boundary data i s repeatedly shrunken unti l only the skeleton o f the region remains. The conventional i t e r a t i v e pa ra l l e l algori thm as wel l as our new algor i thm are implemented on a hypercubetopology multiprocessor computer, the I n t e l iPSC/2. The two algorithms are compared and analyzed. Some resu l t ing f igures and execution times are presented.","PeriodicalId":204431,"journal":{"name":"Proceedings of the Fifth Distributed Memory Computing Conference, 1990.","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1990-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126859794","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An Efficient Method For Distributing Data In Hypercube Computers
Pub Date: 1990-04-08  DOI: 10.1109/DMCC.1990.556292
D. Lee, M. Aboelaze
{"title":"An Efficient Method For Distributing Data In Hypercube Computers","authors":"D. Lee, M. Aboelaze","doi":"10.1109/DMCC.1990.556292","DOIUrl":"https://doi.org/10.1109/DMCC.1990.556292","url":null,"abstract":"","PeriodicalId":204431,"journal":{"name":"Proceedings of the Fifth Distributed Memory Computing Conference, 1990.","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1990-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123574199","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}