A parallel algorithm for solving the algebraic Riccati equation is described and its performance on an Intel iPSC/d5 is reported. Three variations of the matrix sign function algorithm are compared. The best one showed efficiencies of about 60 percent on large problems.
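For readers unfamiliar with the method, a minimal serial sketch of the matrix sign function approach to the continuous-time algebraic Riccati equation (Roberts' iteration) follows. The function name and the unscaled Newton iteration are illustrative assumptions; the paper's three variants, their scaling strategies, and the hypercube distribution are not reproduced here.

```python
import numpy as np

def care_sign(A, G, Q, tol=1e-12, max_iter=100):
    """Solve the ARE  A'X + XA - XGX + Q = 0  via the matrix sign
    function (serial, unscaled Newton iteration)."""
    n = A.shape[0]
    H = np.block([[A, -G], [-Q, -A.T]])        # Hamiltonian matrix
    Z = H.astype(float)
    for _ in range(max_iter):
        Z_next = 0.5 * (Z + np.linalg.inv(Z))  # Newton iteration -> sign(H)
        if np.linalg.norm(Z_next - Z, 1) <= tol * np.linalg.norm(Z, 1):
            Z = Z_next
            break
        Z = Z_next
    S = Z
    # Columns of [I; X] span the stable invariant subspace of H, so
    # (S + I) [I; X] = 0; solve the stacked system in least squares.
    M = np.vstack([S[:n, n:], S[n:, n:] + np.eye(n)])
    rhs = -np.vstack([S[:n, :n] + np.eye(n), S[n:, :n]])
    X, *_ = np.linalg.lstsq(M, rhs, rcond=None)
    return X
```

The dominant cost per step is the matrix inversion, which is why the parallel distribution of that kernel governs the efficiencies reported above.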
{"title":"Solving the algebraic Riccati equation on a hypercube multiprocessor","authors":"J. Gardiner, A. Laub","doi":"10.1145/63047.63116","DOIUrl":"https://doi.org/10.1145/63047.63116","url":null,"abstract":"A parallel algorithm for solving the algebraic Riccati equation is described and its performance on an Intel iPSC/d5 is reported. Three variations of the matrix sign function algorithm are compared. The best one showed efficiencies of about 60 percent on large problems.","PeriodicalId":299435,"journal":{"name":"Conference on Hypercube Concurrent Computers and Applications","volume":"144 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1989-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115906475","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A technique for simulating the motion of granular materials using the Caltech Hypercube is described. We demonstrate that grain dynamics simulations run efficiently on the Hypercube and therefore offer an opportunity for greatly expanding the use of parallel simulations in studying granular materials. Several examples illustrating how the simulations can be used to extract information about the behavior of granular materials are discussed.
{"title":"Dynamical simulations of granular materials using the Caltech hypercube","authors":"B. Werner, P. Haff","doi":"10.1145/63047.63085","DOIUrl":"https://doi.org/10.1145/63047.63085","url":null,"abstract":"A technique for simulating the motion of granular materials using the Caltech Hypercube is described. We demonstrate that grain dynamics simulations run efficiently on the Hypercube and therefore that they offer an opportunity for greatly expanding the use of parallel simulations in studying granular materials. Several examples, which illustrate how the simulations can be used to extract information concerning the behavior of granular materials, are discussed.","PeriodicalId":299435,"journal":{"name":"Conference on Hypercube Concurrent Computers and Applications","volume":"71 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1989-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115919672","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An Intel Hypercube implementation of a new stationary iterative method developed by one of us (JdP) is presented. This algorithm finds the solution vector x of the invertible n × n linear system Ax = (I − B)x = f, where A has real spectrum. The method converges quickly because the Jacobi iteration matrix B is replaced by an equivalent iteration matrix with a smaller spectral radius. The parallel algorithm partitions A row-wise among the processors to keep the memory load to a minimum and to avoid duplicate computations. With the introduction of vector hardware to the Hypercube, the implementation has been further modified to exploit that hardware and reduce run time even more. Example problems and timings are presented.
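As context, here is a minimal sequential sketch of the underlying stationary iteration x ← Bx + f; the row-blocked comment mirrors the row-wise partitioning described above, but the spectral-radius-reducing transformation of B, which is the heart of the de Pillis method, is not reproduced here.

```python
import numpy as np

def stationary_solve(B, f, tol=1e-10, max_iter=1000):
    """Fixed-point iteration x <- B x + f for (I - B) x = f.
    Converges when the spectral radius of B is below 1; the paper's
    method replaces B by an equivalent iteration matrix with a smaller
    spectral radius (that transformation is not shown here)."""
    x = np.zeros_like(f)
    for _ in range(max_iter):
        # On the hypercube, each processor holds a block of rows of B
        # and computes only its own slice of the product B @ x.
        x_next = B @ x + f
        if np.linalg.norm(x_next - x, np.inf) <= tol * (1.0 + np.linalg.norm(x_next, np.inf)):
            return x_next
        x = x_next
    return x
```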
{"title":"An iterative solution to speical linear systems on a vector hypercube","authors":"L. G. Pillis, J. Petersen, J. Pillis","doi":"10.1145/63047.63130","DOIUrl":"https://doi.org/10.1145/63047.63130","url":null,"abstract":"An Intel Hypercube implementation of a new stationary iterative method developed by one of us (JdP) is presented. This algorithm finds the solution vector <italic>x</italic> for the invertible <italic>n</italic> × <italic>n</italic> linear system <italic>Ax</italic> = (<italic>I - B</italic>)<italic>x</italic> = <italic>f</italic> where <italic>A</italic> has real spectrum. The solution method converges quickly because the Jacobi iteration matrix <italic>B</italic> is replaced by an equivalent iteration matrix with a smaller spectral radius. The parallel algorithm partitions <italic>A</italic> row-wise among all the processors in order to keep memory load to a minimum and to avoid duplicate computations. With the introduction of vector hardware to the Hypercube, more modifications have been made to the implementation algorithm in order to exploit that hardware and reduce run-time even further. Example problems and timings will be presented.","PeriodicalId":299435,"journal":{"name":"Conference on Hypercube Concurrent Computers and Applications","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1989-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117043142","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
DIME (Distributed Irregular Mesh Environment) is a user environment written in C for manipulation of an unstructured triangular mesh in two dimensions. The mesh is distributed among the separate memories of the processors, and communication between processors is handled by DIME; thus the user writes C code referring to the elements and nodes of the mesh and need not be unduly concerned with the parallelism. A tool is provided for the user to make an initial coarse triangulation of a region, which may then be adaptively refined and load-balanced. DIME provides many graphics facilities for examining the mesh, including contouring and a PostScript hard-copy interface. DIME also runs on sequential machines.
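To make the programming model concrete, here is a schematic sketch of the pattern such an environment supports: the user loops over locally owned mesh nodes while the environment keeps copies of off-processor neighbours current. This is not DIME's actual API; `mesh.owned_nodes`, `mesh.neighbours`, and `exchange_ghosts` are hypothetical stand-ins for what the environment provides behind the scenes.

```python
def smooth(mesh, values, exchange_ghosts):
    """One Jacobi-style smoothing pass over a distributed mesh.
    All names here are illustrative, not DIME's interface."""
    exchange_ghosts(values)                # refresh off-processor node data
    new_values = dict(values)
    for node in mesh.owned_nodes:          # loop only over local nodes
        nbrs = mesh.neighbours[node]       # may include ghost nodes
        new_values[node] = sum(values[n] for n in nbrs) / len(nbrs)
    return new_values
```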
{"title":"DIME: a programming environment for unstructured triangular meshes on a distributed-memory parallel processor","authors":"R. D. Williams","doi":"10.1145/63047.63136","DOIUrl":"https://doi.org/10.1145/63047.63136","url":null,"abstract":"DIME (Distributed Irregular Mesh Environment) is a user environment written in C for manipulation of an unstructured triangular mesh in two dimensions. The mesh is distributed among the separate memories of the processors, and communication between processors is handled by DIME; thus the user writes C-code referring to the elements and nodes of the mesh and need not be unduly concerned with the parallelism. A tool is provided for the user to make an initial coarse triangulation of a region, which may then be adaptively refined and load-balanced. DIME provides many graphics facilities for examining the mesh, including contouring and a Postscript hard-copy interface. DIME also runs on sequential machines.","PeriodicalId":299435,"journal":{"name":"Conference on Hypercube Concurrent Computers and Applications","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1989-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114932286","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Some interesting physical properties of solid ³He are described, and the physical observable calculated using Monte Carlo path-integral techniques is defined. The relationship between the path integral and the observable is outlined. The parallel algorithm is explained and, finally, timing results are presented for runs of identical code on one parallel computer and two sequential computers: the NCUBE hypercube, the Cray X-MP, and the Elxsi 6400.
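For orientation, a minimal sketch of the generic baseline follows: one Metropolis sweep over the beads of a discretized path in the primitive approximation. This is only the textbook local-move algorithm under stated assumptions (single particle, ℏ = m = 1); the paper's non-local moves for solid ³He are far more elaborate.

```python
import numpy as np

rng = np.random.default_rng(0)

def pimc_sweep(path, beta, potential, step=0.1):
    """One Metropolis sweep over a closed imaginary-time path
    (primitive approximation; `path` is a 1-D NumPy array of beads)."""
    M = len(path)
    tau = beta / M                                 # imaginary-time slice
    for i in range(M):
        prev, nxt = path[i - 1], path[(i + 1) % M]  # periodic path
        trial = path[i] + step * rng.standard_normal()
        def local_action(x):                       # kinetic springs + potential
            return ((x - prev) ** 2 + (nxt - x) ** 2) / (2.0 * tau) \
                   + tau * potential(x)
        # Metropolis acceptance: min(1, exp(-(S_new - S_old)))
        if rng.random() < np.exp(local_action(path[i]) - local_action(trial)):
            path[i] = trial
    return path
```

For example, `pimc_sweep(np.zeros(64), beta=10.0, potential=lambda x: 0.5 * x * x)` performs one sweep for a harmonic oscillator.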
{"title":"Non-local path integral Monte Carlo on the hypercube","authors":"D. Callahan","doi":"10.1145/63047.63083","DOIUrl":"https://doi.org/10.1145/63047.63083","url":null,"abstract":"Some interesting physical properties of solid 3He are described and the physical observable calculated using Monte Carlo path integral techniques is defined. The relationship between the path integral and the observable is outlined. The parallel algorithm is explained and finally, timing results are presented for runs of the identical code on one parallel computer and two sequential computers: the NCUBE hypercube, the Cray XMP, the Elxsi 6400.","PeriodicalId":299435,"journal":{"name":"Conference on Hypercube Concurrent Computers and Applications","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1989-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127017387","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Expert systems are being used to govern the intelligent control of the Robotic Air Vehicle (RAV), currently a research project at the Air Force Avionics Laboratory. Due to the nature of the RAV system, the associated expert system needs to perform in a demanding real-time environment, and a parallel processing capability to support the associated computational requirement may be critical in this application. Thus, parallel search algorithms for real-time expert systems are designed, analyzed, and synthesized on the Texas Instruments (TI) Explorer and the Intel Hypercube. We examine the process of porting the RAV expert systems from the TI Explorer, where they are implemented in the Automated Reasoning Tool (ART), to the iPSC Hypercube, where the system is synthesized using Concurrent Common LISP. The performance characteristics of the parallel implementation of these expert systems on the iPSC Hypercube are compared to those of the TI Explorer implementation.
{"title":"Parallel expert system search techniques for a real-time application","authors":"G. Lamont, D. Shakley","doi":"10.1145/63047.63090","DOIUrl":"https://doi.org/10.1145/63047.63090","url":null,"abstract":"Expert systems are being used to govern the intelligent control of the Robotic Air Vehicle (RAV) which is currently a research project at the Air Force Avionics Laboratory. Due to the nature of the RAV system the associated expert system needs to perform in a demanding real-time environment. The use of a parallel processing capability to support the associated computational requirement may be critical in this application. Thus, parallel search algorithms for real-time expert systems are designed, analyzed and synthesized on the Texas Instruments (TI) Explorer and Intel Hypercube. Examined is the process involved with transporting the RAV expert systems from the TI Explorer, where they are implemented in the Automated Reasoning Tool (ART), to the iPSC Hypercube, where the system is synthesized using Concurrent Common LISP. The performance characteristics of the parallel implementation of these expert systems on the iPSC Hypercube are compared to the TI Explorer implementation.","PeriodicalId":299435,"journal":{"name":"Conference on Hypercube Concurrent Computers and Applications","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1989-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125838806","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Parallel algorithms for programming low-level vision mechanisms on the JPL-Caltech hypercube are reported. These concern principally edge and region finding; 256×256 8-bit images were used. We discuss the problem of programming a hypercube computer and the Caltech approach to load balancing. We then discuss the distribution of images over the hypercube and the I/O problem for images. In edge finding, we programmed convolution using a separable-kernel computational approach, tested with 5×5 and 32×32 masks. In region finding, we developed two different parallel histogram techniques. The first finds a global histogram for the image by a completely parallel technique. This method, developed from the Fox-Furmanski scalar-product method, allows each histogram bucket to be computed by a separate processor, with each processor regarding the hypercube as a different tree and all buckets computed in parallel by a complete interleaving of the required communications. The global histogram can then be distributed over the hypercube, so that all processors have the entire global histogram, again by a completely parallel technique. The second histogramming method finds a spatially local histogram within each processor and then connects locally found regions together. Work in progress includes the application of a Hopfield neural-net approach to region finding.
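To illustrate the separable-kernel convolution mentioned above, here is a minimal serial sketch: a 2-D mask that is the outer product of two 1-D kernels is applied as a row pass followed by a column pass, reducing the per-pixel cost from O(k²) to O(k) per pass, which is what makes the 32×32 mask case tractable. Names are illustrative, and the parallel distribution over sub-images is not shown.

```python
import numpy as np

def separable_convolve(image, row_kernel, col_kernel):
    """Convolve `image` with the 2-D mask outer(col_kernel, row_kernel)
    in two 1-D passes instead of one full 2-D convolution."""
    rows_done = np.apply_along_axis(
        lambda r: np.convolve(r, row_kernel, mode="same"), 1, image)
    return np.apply_along_axis(
        lambda c: np.convolve(c, col_kernel, mode="same"), 0, rows_done)
```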
{"title":"Parallel vision techniques on the hypercube computer","authors":"A. H. Bond, D. Fashena","doi":"10.1145/63047.63054","DOIUrl":"https://doi.org/10.1145/63047.63054","url":null,"abstract":"Parallel algorithms for programming low-level vision mechanisms on the JPL-Caltech hypercube are reported. These concern principally edge and region finding. 256x256 8bit images were used.\u0000We discuss the problem of programming a hypercube computer, and the Caltech approach to load balancing. We then discuss the distribution of images over the hypercube and the I/O problem for images.\u0000In edge finding, we programmed convolution using a separable kernel computational approach. This was tested with 5x5 and 32x32 masks.\u0000In region finding, we developed two different parallel histogram techniques. The first finds a global histogram for the image by a completely parallel technique. This method, which was developed from the Fox-Furmanski scalar product method, allows each histogram bucket to be computed by a separate processor, each processor regarding the hypercube as a different tree, and all buckets being computed in parallel by a complete interleaving of all communications required. Similarly the global histogram can then be distributed over the hypercube, so that all processors have the entire global histogram, by an completely parallel technique.\u0000The second histogramming method finds a spatially local histogram within each processor and then connects locally found regions together.\u0000Work in progress includes the application of a Hopfield neural net approach to region finding.","PeriodicalId":299435,"journal":{"name":"Conference on Hypercube Concurrent Computers and Applications","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1989-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124946261","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We briefly review some key scientific and parallel processing issues in a selection of some 84 existing applications of parallel machines. These include the MIMD hypercube, transputer arrays, and the BBN Butterfly, as well as the SIMD ICL DAP, Goodyear MPP, and Connection Machine from Thinking Machines. We use a space-time analogy to classify problems and show how a division into synchronous, loosely synchronous, and asynchronous problems is helpful. This classification identifies problems suitable for SIMD or MIMD machines and isolates the asynchronous class as the one for which major uncertainties about possible parallelism remain. Interestingly, about half of the scientific applications run excellently on SIMD machines, while the other half can take special advantage of the MIMD architecture.
{"title":"What have we learnt from using real parallel machines to solve real problems?","authors":"Geoffrey C. Fox","doi":"10.1145/63047.63048","DOIUrl":"https://doi.org/10.1145/63047.63048","url":null,"abstract":"We briefly review some key scientific and parallel processing issues in a selection of some 84 existing applications of parallel machines. We include the MIMD hypercube transputer array, BBN Butterfly, and the SIMD ICL DAP, Goodyear MPP and Connection Machine from Thinking Machines. We use a space-time analogy to classify problems and show how a division into synchronous, loosely synchronous and asynchronous problems is helpful. This classifies problems into those suitable for SIMD or MIMD machines and isolates the asynchronous class as that for which major uncertainties as to possible parallelism exist. Interestingly about half of the scientific applications run excellently on SIMD machines with the other half able to take especial advantage of the MIMD architecture.","PeriodicalId":299435,"journal":{"name":"Conference on Hypercube Concurrent Computers and Applications","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1989-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121895808","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Flat Concurrent Prolog is a simple concurrent programming language that has been used for a variety of non-trivial applications. A compiler-based parallel implementation that runs on an Intel Hypercube has been completed. This paper presents a brief summary of performance data from a recent study of the implementation. Three categories of program were studied: parallel applications, uniprocessor benchmarks, and communication stereotypes. The latter programs are abstractions of common parallel programming techniques and serve to quantify the cost of communication in the language.
{"title":"FCP: a summary of performance results","authors":"Stephen Taylor, R. Shapiro, E. Shapiro","doi":"10.1145/63047.63092","DOIUrl":"https://doi.org/10.1145/63047.63092","url":null,"abstract":"Flat Concurrent Prolog is a simple concurrent programming language which has been used for a variety of non-trivial applications. A compiler based parallel implementation has been completed which operates on an Intel Hypercube. This paper presents a brief summary of performance data from a recent study of the implementation. Three categories of program were studied: parallel applications, uniprocessor benchmarks and communication stereotypes. The latter programs are abstractions of common parallel programming techniques and serve to quantify the cost of communication in the language.","PeriodicalId":299435,"journal":{"name":"Conference on Hypercube Concurrent Computers and Applications","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1989-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128232742","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Least squares modifications associated with the addition or deletion of data often involve updating or downdating the Cholesky factor of the observation matrix. We describe and compare parallel hypercube implementations of three methods for downdating the Cholesky factor: an orthogonal scheme, a hyperbolic scheme, and a hybrid scheme combining the first two. The computational complexities of these algorithms differ significantly, but the parallel implementations of all three have communication complexity similar to that of solving triangular systems. In computational tests on an Intel iPSC hypercube, the algorithms performed similarly, suggesting a preference for the orthogonal method on stability grounds. The methods we describe can be adapted to the parallel computation of general orthogonal factorizations, but our discussion is motivated by applications in signal processing that use windowed recursive least squares filtering for near real-time solutions.
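As a reference point for the hyperbolic scheme, a minimal serial sketch follows: given an upper-triangular Cholesky factor R with RᵀR = AᵀA and an observation row z to delete, hyperbolic rotations annihilate z against the diagonal of R so that the result R̃ satisfies R̃ᵀR̃ = RᵀR − zzᵀ. This is the textbook serial algorithm under stated assumptions, not the paper's hypercube implementation.

```python
import numpy as np

def hyperbolic_downdate(R, z):
    """Downdate the upper-triangular Cholesky factor R after deleting
    observation row z, so the result Rd satisfies Rd'Rd = R'R - zz'.
    Raises if the downdated matrix would not be positive definite."""
    R = R.astype(float).copy()
    z = z.astype(float).copy()
    n = R.shape[0]
    for k in range(n):
        t = z[k] / R[k, k]
        if abs(t) >= 1.0:
            raise ValueError("downdated matrix is not positive definite")
        c = 1.0 / np.sqrt(1.0 - t * t)   # hyperbolic rotation: c^2 - s^2 = 1
        s = t * c
        Rk = R[k, k:].copy()
        R[k, k:] = c * Rk - s * z[k:]    # zero z[k] against R[k,k]
        z[k:] = -s * Rk + c * z[k:]
    return R
```

Each step applies a 2×2 hyperbolic rotation to the pair (row k of R, the remainder of z), which preserves RᵀR − zzᵀ exactly; the stability concerns noted above arise because hyperbolic rotations, unlike orthogonal ones, can amplify rounding errors.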
{"title":"Cholesky downdating on a hypercube","authors":"C. S. Henkel, M. Heath, R. Plemmons","doi":"10.1145/63047.63120","DOIUrl":"https://doi.org/10.1145/63047.63120","url":null,"abstract":"Least squares modifications associated with the addition or deletion of data often involve updating or downdating the Cholesky factor of the observation matrix. We describe and compare parallel implementations for the hypercube of three methods for down-dating the Cholesky factor: an orthogonal scheme, a hyperbolic scheme, and a hybrid scheme combining the first two. The computational complexities of these algorithms differ significantly, but the parallel implementations of all three have communication complexity similar to solving triangular systems. In computational tests on an Intel iPSC hypercube, the algorithms performed similarly, suggesting a preference for the orthogonal method based on stability considerations. The methods we describe can be adapted to the parallel computation of general orthogonal factorizations, but our discussion is motivated by applications in signal processing using windowed recursive least squares filtering for near real-time solutions.","PeriodicalId":299435,"journal":{"name":"Conference on Hypercube Concurrent Computers and Applications","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1989-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128395851","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}