Concurrent DASSL Applied to Dynamic Distillation Column Simulation
Pub Date: 1990-04-08  DOI: 10.1109/DMCC.1990.555439
A. Skjellum, M. Morari
The accurate, high-speed solution of systems of ordinary differential-algebraic equations (DAE’s) of low index is of great importance in chemical, electrical and other engineering disciplines. Petzold’s Fortran-based DASSL is the most widely used sequential code for solving DAE’s. We have devised and implemented a completely new C code, Concurrent DASSL, specifically for multicomputers and patterned on DASSL. In this work, we address the issues of data distribution and the performance of the overall algorithm, rather than just that of individual steps. Concurrent DASSL is designed as an open, application-independent environment below which linear algebra algorithms may be added in addition to standard support for dense and sparse algorithms. The user may furthermore attach explicit data interconversions between the main computational steps, or choose compromise distributions. A “problem formulator” (simulation layer) must be constructed above Concurrent DASSL, for any specific problem domain. We indicate performance for a particular chemical engineering application, a sequence of coupled distillation columns. Future efforts are cited in conclusion.
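Like DASSL, a DASSL-style solver integrates fully implicit systems F(t, y, y') = 0 by driving a user-supplied residual function to zero with backward-differentiation formulas. Below is a minimal sketch of that residual interface in C, Concurrent DASSL's implementation language; the signature and names are illustrative, not Concurrent DASSL's actual API.

```c
/* Sketch of the residual interface a DASSL-style solver expects: the
 * user supplies F(t, y, y') and the integrator drives it to zero with
 * BDF formulas.  Names are illustrative, not the actual API.
 *
 * Example index-1 DAE:  y0' = -y0,   0 = y0 + y1               */
#include <stddef.h>

void residual(double t, const double *y, const double *yp,
              double *delta, size_t n)
{
    (void)t; (void)n;               /* unused in this toy system */
    delta[0] = yp[0] + y[0];        /* differential equation     */
    delta[1] = y[0] + y[1];         /* algebraic constraint      */
}
```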
{"title":"Concurrent DASSL Applied to Dynamic Distillation Column Simulation","authors":"A. Skjellum, M. Morari","doi":"10.1109/DMCC.1990.555439","DOIUrl":"https://doi.org/10.1109/DMCC.1990.555439","url":null,"abstract":"The accurate, high-speed solution of systems of ordinary differential-algebraic equations (DAE’s) of low index is of great importance in chemical, electrical and other engineering disciplines. Petzold’s Fortran-based DASSL is the most widely used sequential code for solving DAE’s. We have devised and implemented a completely new C code, Concurrent DASSL, specifically for multicomputers and patterned on DASSL. In this work, we address the issues of data distribution and the performance of the overall algorithm, rather than just that of individual steps. Concurrent DASSL is designed as an open, application-independent environment below which linear algebra algorithms may be added in addition to standard support for dense and sparse algorithms. The user may furthermore attach explicit data interconversions between the main computational steps, or choose compromise distributions. A “problem formulator” (simulation layer) must be constructed above Concurrent DASSL, for any specific problem domain. We indicate performance for a particular chemical engineering application, a sequence of coupled distillation columns. Future efforts are cited in conclusion.","PeriodicalId":204431,"journal":{"name":"Proceedings of the Fifth Distributed Memory Computing Conference, 1990.","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1990-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128019193","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An Orthogonal Multiprocessor With Snooping Caches
Pub Date: 1990-04-08  DOI: 10.1109/DMCC.1990.556275
P. Raja, S. Ganesan
This paper discusses the design and implementation of a multiprocessor for scientific calculations using snooping caches. The architecture of the orthogonal multiprocessor, the cache coherency problem, the cache coherency protocol, performance analysis, response time equations, and the cache memory architecture are discussed.
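The abstract names a cache coherency protocol without detailing it. As a point of reference only, here is a minimal state-transition sketch of a generic write-invalidate snooping protocol with MSI states; the paper's actual protocol may well differ.

```c
/* Generic write-invalidate snooping sketch (MSI states); illustrative
 * only -- not necessarily the protocol used in the paper. */
typedef enum { INVALID, SHARED, MODIFIED } line_state;

/* Transition when a snooping cache observes a remote write on the bus. */
line_state snoop_bus_write(line_state s)
{
    (void)s;
    return INVALID;   /* any remote write invalidates the local copy */
}

/* Transition on a local write (an invalidate is broadcast first when
 * the line was INVALID or SHARED). */
line_state local_write(line_state s)
{
    (void)s;
    return MODIFIED;
}
```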
{"title":"An Orthogonal Multiprocessor With Snooping Caches","authors":"P. Raja, S. Ganesan","doi":"10.1109/DMCC.1990.556275","DOIUrl":"https://doi.org/10.1109/DMCC.1990.556275","url":null,"abstract":"This paper discusses the design and implementation of a multiprocessor for scientific calculations using snooping caches. Architecture of the orthogonal multiprocessor, cache coherency problem, cache coherency protocol, performance analysis, response time equations, cache memory architecture, etc., are discussed.","PeriodicalId":204431,"journal":{"name":"Proceedings of the Fifth Distributed Memory Computing Conference, 1990.","volume":"82 7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1990-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128165177","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Scheme for Supporting Automatic Data Migration on Multicomputers
Pub Date: 1990-04-08  DOI: 10.1109/DMCC.1990.556314
S. Mirchandaney, J. Saltz, P. Mehrotra, H. Berryman
Abstract: A data migration mechanism is proposed that allows an explicit and controlled mapping of data to memory. While read or write copies of each data element can be assigned to any processor's memory, longer-term storage of each data element is assigned to a specific location in the memory of a particular processor. Data is presented suggesting that the scheme may be a practical method for efficiently supporting data migration. Keywords: Distributed machines, Data migration, Caching.
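The scheme's key distinction is between a fixed long-term home for each element and transient copies that may live anywhere. The following is a hedged sketch of one way to encode that mapping; the structure and the block home assignment are illustrative, not the authors' data structures.

```c
/* Hedged sketch of the home/copy distinction described above: each
 * element has one fixed home processor holding long-term storage,
 * while a transient read or write copy may live anywhere. */
typedef struct {
    int home;          /* processor holding long-term storage       */
    int copy_holder;   /* processor holding a current copy, or -1   */
    int dirty;         /* nonzero if the copy must be written home  */
} element_map;

/* One plausible home assignment: a simple block mapping of n
 * elements over p processors. */
int home_of(int i, int n, int p)
{
    int block = (n + p - 1) / p;   /* ceil(n / p) elements per node */
    return i / block;
}
```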
{"title":"A Scheme for Supporting Automatic Data Migration on Multlcomputers","authors":"S. Mirchandaney, J. Saltz, P. Mehrotra, H. Berryman","doi":"10.1109/DMCC.1990.556314","DOIUrl":"https://doi.org/10.1109/DMCC.1990.556314","url":null,"abstract":"Abstract : A data migration mechanism is proposed that allows an explicit and controlled mapping of data to memory. While read or write or write copies of each data element can be assigned to any processor's memory, longer term storage of each data element is assigned to a specific location in the memory of a particular processor. Data is presented that suggests that the scheme may be a practical method for efficiently supporting data migration. Keywords: Distributed machines, Data migration, Cacheing.","PeriodicalId":204431,"journal":{"name":"Proceedings of the Fifth Distributed Memory Computing Conference, 1990.","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1990-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130194636","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Hypercube Application in Large Scale Composite Materials Modeling
Pub Date: 1990-04-08  DOI: 10.1109/DMCC.1990.555425
C. Baldwin, S. Durham, J. D. Lynch, W. J. Padgett
This large-scale application combines several areas of research to develop computational models for simulating the failure mechanisms of composite materials consisting of brittle fibers (such as carbon) embedded in a matrix material (such as epoxy resin). The simulations combine structural stress analysis, numerical linear algebra, and visualization techniques to model the behavior of fibrous composites under uniaxial tensile load. This allows laboratory experiments to be extrapolated more accurately to real applications, providing an enhanced capability to optimize designs of large structures made of composite materials with less extensive and costly experimental programs. Further, system performance and reliability may be improved substantially. The paper first gives a brief discussion of the theory of composite materials as it relates to the simulations. Next, the procedures used to generate and analyze the structure are presented, followed by the computational techniques used to perform the simulation and results from selected test cases. The paper closes with a summary of results and future directions for this research.
{"title":"A Hypercube Application in Large Scale Composite Materials Modeling","authors":"C. Baldwin, S. Durham, J. D. Lynch, W. J. Padgett","doi":"10.1109/DMCC.1990.555425","DOIUrl":"https://doi.org/10.1109/DMCC.1990.555425","url":null,"abstract":"This large scale application combines several areas of research to develop computational models for simulating the failure mechanisms of composite materials consisting of brittle fibers (such as carbon) embedded in a matrix material (such as epoxy resin). The simulations combine the ideas of structural stress analysis, numerical linear algebra, and visualization techniques to model the behavior of fibrous composites under uniaxial tensile load. This will allow laboratory experiments to be extrapolated more accurately to real applications, providing an enhanced capability to optimize designs of large structures made of composite materials with less extensive and costly experimental programs. Further, system performance and reliability may be improved substantially. In this paper a brief discussion of the theory of composite materials as it relates to the simulations will first be given. Next the procedures used to generate and analyze the structure will be presented. The computational techniques used to perform the simulation will be given as well as results from selected test cases. A summary of results and future directions in this research will be given at the end of the paper.","PeriodicalId":204431,"journal":{"name":"Proceedings of the Fifth Distributed Memory Computing Conference, 1990.","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1990-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129395267","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Parallel Nonlinear Optimization
Pub Date: 1990-04-08  DOI: 10.1109/DMCC.1990.555394
Ron Daniel
This paper describes the implementation of a parallel Levenberg-Marquardt algorithm on an iPSC/2. The Levenberg-Marquardt algorithm is a standard technique for non-linear least-squares optimization. For a problem with D data points and P parameters to be estimated, each iteration requires that the objective function and its P partials be evaluated at all D data points, using the current parameter estimates. Each iteration also requires the solution of a PxP linear system to obtain the next set of parameter estimates. A simple data-parallel decomposition is used where the data is evenly distributed across the nodes to parallelize the evaluations of the objective function and its partial derivatives. The performance of the method is characterized versus the number of nodes, the number of data points, and the number of parameters in the objective function. Further enhancements are also discussed.
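For reference, the PxP system solved at each iteration is the damped normal-equations step of the standard Levenberg-Marquardt method, shown below in textbook form; the paper does not state its exact scaling of the damping term.

```latex
% Standard Levenberg-Marquardt step: r is the D-vector of residuals at
% the current estimate beta_k, J is its D x P Jacobian, and lambda is
% the damping parameter.  Marquardt's variant replaces I with
% diag(J^T J).
\[
  (J^{\top} J + \lambda I)\,\delta = -J^{\top} r,
  \qquad \beta_{k+1} = \beta_k + \delta .
\]
```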
{"title":"Parallel Nonlinear Optimization","authors":"Ron Daniel","doi":"10.1109/DMCC.1990.555394","DOIUrl":"https://doi.org/10.1109/DMCC.1990.555394","url":null,"abstract":"This paper describes the implementation of a parallel Levenberg-Marquardt algorithm on an iPSC/2. The Levenberg-Marquardt algorithm is a standard technique for non-linear least-squares optimization. For a problem with D data points and P parameters to be estimated, each iteration requires that the objective function and its P partials be evaluated at all D data points, using the current parameter estimates. Each iteration also requires the solution of a PxP linear system to obtain the next set of parameter estimates. A simple data-parallel decomposition is used where the data is evenly distributed across the nodes to parallelize the evaluations of the objective function and its partial derivatives. The performance of the method is characterized versus the number of nodes, the number of data points, and the number of parameters in the objective function. Further enhancements are also discussed.","PeriodicalId":204431,"journal":{"name":"Proceedings of the Fifth Distributed Memory Computing Conference, 1990.","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1990-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132911550","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Applications of Adaptive Data Distributions
Pub Date: 1990-04-08  DOI: 10.1109/DMCC.1990.555391
E. F. Van de Velde, J. Lorenz
Continuation methods compute paths of solutions of nonlinear equations that depend on a parameter. This paper examines some aspects of the multicomputer implementation of such methods. The computation is done on the Symult Series 2010 multicomputer. One of the main issues in the development of concurrent programs is load balancing, achieved here by using appropriate data distributions. In the continuation process, a large number of linear systems have to be solved. For nearby points along the solution path, the corresponding system matrices are closely related to each other. Therefore, pivots which are good for the LU-decomposition of one matrix are likely to be acceptable for a whole segment of the solution path. This suggests choosing certain data distributions that achieve good load balancing. In addition, if these distributions are used, the resulting code is easily vectorized. To test this technique, the invariant manifold of a system of two identical nonlinear oscillators is computed as a function of the coupling between them. This invariant manifold is determined by the solution of a system of nonlinear partial differential equations that depends on the coupling parameter. A symmetry in the problem reduces this system to one single equation, which is discretized by finite differences. The solution of this discrete nonlinear system is followed as the coupling parameter is changed.
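In outline, a continuation step for a parameter-dependent system F(u, lambda) = 0 has the standard predictor-corrector form below; this is a generic formulation, and the paper's method may differ in detail.

```latex
% Predictor: step along the path tangent, obtained by solving one
% linear system with the Jacobian F_u.
\[
  F_u(u_k,\lambda_k)\,\dot u_k = -F_\lambda(u_k,\lambda_k),
  \qquad u^{(0)} = u_k + \Delta\lambda\,\dot u_k .
\]
% Corrector: Newton iteration at the new parameter value.
\[
  F_u\big(u^{(j)},\lambda_{k+1}\big)\,\delta^{(j)}
     = -F\big(u^{(j)},\lambda_{k+1}\big),
  \qquad u^{(j+1)} = u^{(j)} + \delta^{(j)} .
\]
% Each corrector step solves a linear system with F_u; along the path
% these matrices change slowly, which is why LU pivots can be reused
% over whole path segments, as the abstract argues.
```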
{"title":"Applications of Adaptive Data Distributions","authors":"E. F. Van de Velde, J. Lorenz","doi":"10.1109/DMCC.1990.555391","DOIUrl":"https://doi.org/10.1109/DMCC.1990.555391","url":null,"abstract":"Continuation methods compute paths of solutions of nonlinear equations that depend on a parameter. This paper examines some aspects of the multicomputer implementation of such methods. The computation is done on the Symult Series 2010 multicomputer. One of the main issues in the development of concurrent programs is load balancing, achieved here by using appropriate data distributions. In the continuation process, a large number of linear systems have to be solved. For nearby points along the solution path, the corresponding system matrices are closely related to each other. Therefore, pivots which are good for the LU-decomposition of one matrix are likely to be acceptable for a whole segment of the solution path. This suggests to choose certain data distributions that achieve good load balancing. In addition, if these distributions are used, the resulting code is easily vectorized. To test this technique, the invariant manifold of a system of two identical nonlinear oscillators is computed as a function of the coupling between them. This invariant manifold is determined by the solution of a system of nonlinear partial differential equations that depends on the coupling parameter. A symmetry in the problem reduces this system to one single equation, which is discretized by finite differences. The solution of this discrete nonlinear system is followed as the coupling parameter is changed.","PeriodicalId":204431,"journal":{"name":"Proceedings of the Fifth Distributed Memory Computing Conference, 1990.","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1990-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132948075","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Conjugate Gradient Methods for Spline Collocation Equations
Pub Date: 1990-04-08  DOI: 10.1109/DMCC.1990.555433
C. Christara
We study the parallel computation of linear second order elliptic Partial Differential Equation (PDE) problems in rectangular domains. We discuss the application of Conjugate Gradient (CG) and Preconditioned Conjugate Gradient (PCG) methods to the linear system arising from the discretisation of such problems using quadratic splines and the collocation discretisation methodology. Our experiments show that the number of iterations required for convergence of CG-QSC (Conjugate Gradient applied to Quadratic Spline Collocation equations) grows linearly with the square root of the number of equations. We implemented the CG and PCG methods for the solution of the Quadratic Spline Collocation (QSC) equations on the iPSC/2 hypercube and present performance evaluation results for configurations of up to 32 processors. Our experiments show efficiencies on the order of 90% for both fixed and scaled speedups.
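For reference, the iteration being parallelized is the standard conjugate-gradient kernel, sketched serially below for a dense symmetric positive-definite system. This is a minimal sketch only; the distributed QSC solver additionally partitions the matrix-vector product and inner products across the hypercube nodes.

```c
/* Minimal serial CG kernel for a dense SPD n x n system Ax = b.
 * r, p, Ap are caller-provided length-n workspaces. */
#include <math.h>
#include <string.h>

static double dot(const double *x, const double *y, int n)
{
    double s = 0.0;
    for (int i = 0; i < n; i++) s += x[i] * y[i];
    return s;
}

void cg(const double *A, const double *b, double *x, int n,
        double tol, int maxit, double *r, double *p, double *Ap)
{
    memset(x, 0, n * sizeof *x);          /* x0 = 0, so r = b        */
    memcpy(r, b, n * sizeof *r);
    memcpy(p, r, n * sizeof *p);
    double rho = dot(r, r, n);
    for (int k = 0; k < maxit && sqrt(rho) > tol; k++) {
        for (int i = 0; i < n; i++) {     /* Ap = A * p              */
            double s = 0.0;
            for (int j = 0; j < n; j++) s += A[i * n + j] * p[j];
            Ap[i] = s;
        }
        double alpha = rho / dot(p, Ap, n);
        for (int i = 0; i < n; i++) {     /* update iterate, residual */
            x[i] += alpha * p[i];
            r[i] -= alpha * Ap[i];
        }
        double rho_new = dot(r, r, n);
        double beta = rho_new / rho;
        for (int i = 0; i < n; i++)       /* new search direction    */
            p[i] = r[i] + beta * p[i];
        rho = rho_new;
    }
}
```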
{"title":"Conjugate Gradient Methods for Spline Collocation Equations","authors":"C. Christara","doi":"10.1109/DMCC.1990.555433","DOIUrl":"https://doi.org/10.1109/DMCC.1990.555433","url":null,"abstract":"We study the parallel computation of linear second order elliptic Partial Differential Equation (PDE) problems in rectangular domains. We discuss the application of Conjugate Gradient (CG) and Preconditioned Conjugate Gradient (PCG) methods to the linear system arising from the discretisation of such problems using quadratic splines and the collocation discretisation methodology. Our experiments show that the number of iterations required for convergence of CG-QSC (Conjugate Gradient applied to Quadratic Spline Collocation equations) grows linearly with the square root of the number of equations. We implemented the CG and PCG methods for the solution of the Quadratic Spline Collocation (QSC) equations on the iPSC/2 hypercube and present performance evaluation results for up to 32 processors configurations. Our experiments show efficiencies of the order of 90%, for both the fixed and scaled speedups.","PeriodicalId":204431,"journal":{"name":"Proceedings of the Fifth Distributed Memory Computing Conference, 1990.","volume":"64 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1990-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133763993","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Synchronized Blocking in a Distributed Memory System
Pub Date: 1990-04-08  DOI: 10.1109/DMCC.1990.556390
C. G. Rommel
{"title":"Synchronized Blocking in a Distributed Memory System","authors":"C. G. Rommel","doi":"10.1109/DMCC.1990.556390","DOIUrl":"https://doi.org/10.1109/DMCC.1990.556390","url":null,"abstract":"","PeriodicalId":204431,"journal":{"name":"Proceedings of the Fifth Distributed Memory Computing Conference, 1990.","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1990-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115432524","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An SIMD Multiprocessor Using DSP Microprocessors
Pub Date: 1990-04-08  DOI: 10.1109/DMCC.1990.556273
S. Ganesan, P. Raja
This paper discusses a parallel processor design using DSP microprocessors and dual-port RAMs (DPRs) for image processing applications and scientific computations. The parallel processor uses eight TMS320C25 Digital Signal Processors (DSPs) and dual-port RAMs. The application of matrix multiplication and image processing algorithms to this architecture is discussed.
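A natural way to spread a matrix multiply C = A * B over eight processors is to partition the rows among them. The sketch below shows that decomposition; the interleaved mapping and names are illustrative, and the paper's actual assignment of work to the DSPs may differ.

```c
/* Row-partitioned matrix multiply sketch: node `id` of `nprocs`
 * computes the interleaved rows it owns (illustrative mapping only). */
void matmul_rows(const double *A, const double *B, double *C,
                 int n, int id, int nprocs)
{
    for (int i = id; i < n; i += nprocs)      /* this node's rows */
        for (int j = 0; j < n; j++) {
            double s = 0.0;
            for (int k = 0; k < n; k++)
                s += A[i * n + k] * B[k * n + j];
            C[i * n + j] = s;
        }
}
```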
{"title":"An SIMD Multiprocessor Using DSP Microprocessors","authors":"S. Ganesan, P. Raja","doi":"10.1109/DMCC.1990.556273","DOIUrl":"https://doi.org/10.1109/DMCC.1990.556273","url":null,"abstract":"In this paper, we have discussed a parallel processor design using DSP microprocessors and dual-port RAMs(DPRs) for image processing applications and scientific computations. This parallel processor uses eight TMS320C25 Digital Signal Processors (DSPs) and dual-port RAMs. Application of matrix multiplication algorithms and Image processing algorithms to this architecture are discussed.","PeriodicalId":204431,"journal":{"name":"Proceedings of the Fifth Distributed Memory Computing Conference, 1990.","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1990-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117075577","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Graceful Degradation on Hypercube Multiprocessors Using Data Redistribution
Pub Date: 1990-04-08  DOI: 10.1109/DMCC.1990.556408
C. Li, W. Fuchs
A data redistribution approach to graceful degradation is described in this paper for hypercube multiprocessors. CPU-bound hypercube programs using the described second-order parametrized data distribution technique can run on a group of cubes of any size to achieve graceful degradation without recompilation. A transmission mechanism has been designed to switch the performance of a second-order parametrized data distribution hypercube program to that of a corresponding first-order program when the latter is superior. A package of procedures has been implemented on the Intel iPSC/2 hypercube to support the approach.
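To illustrate the flavor of a parametrized data distribution, the sketch below gives a generic block-cyclic owner function: re-binding the processor-count parameter re-maps the data onto a smaller cube without recompiling. This is only an analogy; the paper's "second-order" distributions are defined differently.

```c
/* Generic parametrized (block-cyclic) owner function: element i maps
 * to a processor purely as a function of the distribution parameters
 * (block size b, processor count p).  Shrinking p redistributes the
 * data to a smaller cube with no recompilation -- illustrative of the
 * approach, not the paper's actual second-order distributions. */
int owner(int i, int b, int p)
{
    return (i / b) % p;
}
```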
{"title":"Graceful Degradation on Hypercube Multiprocessors Using Data Redistribution","authors":"C. Li, W. Fuchs","doi":"10.1109/DMCC.1990.556408","DOIUrl":"https://doi.org/10.1109/DMCC.1990.556408","url":null,"abstract":"A data redistribution approach to graceful degradation is described in this paper for hypercube multiprocessors. CPU-bound hypercube programs using the described second-order parametrized data distribution technique can run on a group of cubes of any size to achieve graceful degradation without recompila tion. A transmission mechanism has been designed to switch the performance of a second-order parametrized data distribution hypercube program to that of a corresponding first-order program when the latter is superior. A package of procedures has been implemented on the Intel iPSC/2 hypercube to support the approach.","PeriodicalId":204431,"journal":{"name":"Proceedings of the Fifth Distributed Memory Computing Conference, 1990.","volume":"68 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1990-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116036329","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}