An efficient recursive task allocation scheme, based on the Kernighan-Lin mincut bisection heuristic, is proposed for the effective mapping of tasks of a parallel program onto a hypercube parallel computer. It is evaluated by comparison with an adaptive, scaled simulated annealing method. The recursive allocation scheme is shown to be effective on a number of large test task graphs: its solution quality is nearly as good as that produced by simulated annealing, while its computation time is several orders of magnitude less.
{"title":"Task allocation onto a hypercube by recursive mincut bipartitioning","authors":"F. Erçal, J. Ramanujam, P. Sadayappan","doi":"10.1145/62297.62323","DOIUrl":"https://doi.org/10.1145/62297.62323","url":null,"abstract":"An efficient recursive task allocation scheme, based on the Kernighan-Lin mincut bisection heuristic, is proposed for the effective mapping of tasks of a parallel program onto a hypercube parallel computer. It is evaluated by comparison with an adaptive, scaled simulated annealing method. The recursive allocation scheme is shown to be effective on a number of large test task graphs - its solution quality is nearly as good as that produced by simulated annealing, and its computation time is several orders of magnitude less.","PeriodicalId":299435,"journal":{"name":"Conference on Hypercube Concurrent Computers and Applications","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1990-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114223803","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
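The recursive bipartitioning idea in the abstract above can be sketched in miniature. The sketch below is our own illustration, not the authors' algorithm: a simplified single-swap variant of Kernighan-Lin bisection (real KL uses gain buckets and node locking), applied recursively so that each bisection level contributes one bit of a hypercube node address.

```python
import itertools

def cut_size(edges, part):
    """Number of graph edges whose endpoints lie in different halves."""
    return sum(1 for u, v in edges if part[u] != part[v])

def kl_bisect(nodes, edges, passes=4):
    """Simplified Kernighan-Lin-style bisection: start from an even split,
    then repeatedly make the single cross swap that most reduces the cut."""
    half = len(nodes) // 2
    part = {n: (0 if i < half else 1) for i, n in enumerate(nodes)}
    for _ in range(passes):
        left = [n for n in nodes if part[n] == 0]
        right = [n for n in nodes if part[n] == 1]
        base = cut_size(edges, part)
        best_gain, best_pair = 0, None
        for a, b in itertools.product(left, right):
            part[a], part[b] = 1, 0          # tentative swap
            gain = base - cut_size(edges, part)
            part[a], part[b] = 0, 1          # undo
            if gain > best_gain:
                best_gain, best_pair = gain, (a, b)
        if best_pair is None:                # no improving swap left
            break
        a, b = best_pair
        part[a], part[b] = 1, 0
    return part

def map_to_hypercube(nodes, edges, dim):
    """Recursive bipartitioning: each bisection level fixes one address bit."""
    if dim == 0:
        return {n: 0 for n in nodes}
    part = kl_bisect(nodes, edges)
    intra = [(u, v) for u, v in edges if part[u] == part[v]]
    addr = {}
    for bit in (0, 1):
        side = [n for n in nodes if part[n] == bit]
        side_edges = [(u, v) for u, v in intra if u in side]
        for n, a in map_to_hypercube(side, side_edges, dim - 1).items():
            addr[n] = (bit << (dim - 1)) | a
    return addr
```

On a four-task graph the recursion yields one distinct 2-bit hypercube address per task, with tightly coupled tasks sharing a high-order address bit (i.e. placed in the same subcube).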
This paper describes a portable programming environment for MIMD concurrent processors based on an object-oriented, message-passing paradigm. The basis of this environment is the Virtual Machine Loosely Synchronous Communication System (VMLSCS), which is designed to be used for loosely synchronous problems. VMLSCS is structured to make efficient use of hierarchical memory, and permits communication and calculation to be overlapped on certain concurrent processors. As an example, the use of VMLSCS in performing both one-dimensional and multi-dimensional fast Fourier transforms (FFTs) on concurrent multiprocessors is described. It is shown that all necessary interprocessor communication can be performed by a single routine, vm_index. Thus the construction of a portable concurrent FFT rests on the implementation of vm_index on the target machines. In the multi-dimensional algorithm, a strip decomposition is applied to each of the directions in turn, so that each of the FFTs performed in a particular direction is done in one processor. This allows fast sequential one-dimensional FFTs to be exploited. The implementation of vm_index on both homogeneous and inhomogeneous hypercubes, and on shared memory multiprocessors, is discussed.
{"title":"Portable programming within a message-passing model: the FFT as an example","authors":"D. Walker","doi":"10.1145/63047.63100","DOIUrl":"https://doi.org/10.1145/63047.63100","url":null,"abstract":"This paper describes a portable programming environment for MIMD concurrent processors based on an object-oriented, message-passing paradigm. The basis of this environment is the Virtual Machine Loosely Synchronous Communication System (VMLSCS) which is designed to be used for loosely synchronous problems. VMLSCS is structured to make efficient use of hierarchical memory, and permits communication and calculation to be overlapped on certain concurrent processors. As an example, the use of VMLSCS in performing both one-dimensional and multi-dimensional fast Fourier transforms (FFTs) on concurrent multiprocessors is described. It is shown that all necessary interprocessor communication can be performed by a single routine, vm_index. Thus the construction of a portable concurrent FFT rests on the implementation of vm_index on the target machines. In the multi-dimensional algorithm a strip decomposition is applied to each of the directions in turn so that each of the FFTs performed in a particular direction are done in one processor. This allows fast sequential one-dimensional FFTs to be exploited. 
The implementation of vm_index on both homogeneous and inhomogeneous hypercubes, and shared memory multiprocessors is discussed.","PeriodicalId":299435,"journal":{"name":"Conference on Hypercube Concurrent Computers and Applications","volume":"135 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1989-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117340734","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
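The strip-decomposed multi-dimensional FFT described above can be illustrated sequentially: transform complete rows locally, redistribute the data, then transform along the other direction. The sketch below is our own illustration; a plain transpose stands in for the index communication that vm_index would perform (the actual vm_index interface is not given in the abstract).

```python
import cmath

def fft(a):
    """Recursive radix-2 Cooley-Tukey FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even, odd = fft(a[0::2]), fft(a[1::2])
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + w
        out[k + n // 2] = even[k] - w
    return out

def transpose(m):
    return [list(col) for col in zip(*m)]

def fft2d(m):
    """2-D FFT by strip decomposition: every 1-D FFT sees a complete row,
    so fast sequential FFTs can be used; the transpose plays the role of
    the interprocessor index communication between the two directions."""
    rows = [fft(r) for r in m]                 # transform along x
    cols = [fft(c) for c in transpose(rows)]   # redistribute, transform along y
    return transpose(cols)
```

Because each 1-D transform is entirely local to one strip, the only communication in the concurrent version is the redistribution step between directions.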
This paper describes a set of concurrent algorithms for matrix algebra, based on a library of collective communication routines for the hypercube. We show how a systematic application of scattering reduces load imbalance. A number of examples are considered (Gaussian elimination, Gauss-Jordan matrix inversion, the power method for eigenvectors, and tridiagonalisation by Householder's method), and the concurrent efficiencies are discussed.
{"title":"Optimal matrix algorithms on homogeneous hypercubes","authors":"G. Fox, W. Furmanski, D. Walker","doi":"10.1145/63047.63125","DOIUrl":"https://doi.org/10.1145/63047.63125","url":null,"abstract":"This paper describes a set of concurrent algorithms for matrix algebra, based on a library of collective communication routines for the hypercube. We show how a systematic application of scattering reduces load imbalance. A number of examples are considered (Gaussian elimination, Gauss-Jordan matrix inversion, the power method for eigenvectors, and tridiagonalisation by Householder's method), and the concurrent efficiencies are discussed.","PeriodicalId":299435,"journal":{"name":"Conference on Hypercube Concurrent Computers and Applications","volume":"96 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1989-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123320469","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
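The load-balancing effect of scattering mentioned above is easy to demonstrate. The sketch below is illustrative, not from the paper: in Gaussian elimination the active submatrix shrinks from the top, so a contiguous block row distribution idles the early processors, while a scattered (cyclic) distribution keeps every processor busy until the end.

```python
def active_load(owner, k, n, p):
    """Per-processor count of rows still active after k elimination steps."""
    load = [0] * p
    for row in range(k, n):
        load[owner(row)] += 1
    return load

n, p, k = 16, 4, 8   # 16 matrix rows, 4 processors, halfway through elimination

# Contiguous block decomposition: rows 0-3 on proc 0, rows 4-7 on proc 1, ...
block = active_load(lambda r: r * p // n, k, n, p)
# Scattered (cyclic) decomposition: row r lives on proc r mod p
scattered = active_load(lambda r: r % p, k, n, p)
```

Halfway through the elimination, the block distribution leaves two of the four processors with no active rows at all, while the scattered distribution still spreads the remaining work evenly.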
Seismic data processing is a time-consuming operation. Two main reasons for this are the large amounts of data, which have to be put into and taken out of the computer regularly, and the need for intervention by an expert at many intermediate stages. A subsidiary cause is the large amount of computation. The recent appearance of multiprocessor computers has created the opportunity to provide an interactive system to ease and speed up the seismic data processing cycle. This paper describes the development of an initial interactive system for the velocity analysis and NMO/stacking stage of seismic processing. Computational power and storage are supplied by two 32-node Intel iPSC/1 Hypercubes, each with 16 memory nodes and 16 vector nodes. Each hypercube has 96 Mbytes of memory and a top processing speed of over 100 Mflops (32 bit). A SUN-3 Workstation is used to display intermediate results and to enable the expert to direct the processing more efficiently.
{"title":"An interactive system for seismic velocity analysis","authors":"C. Addison, J. M. Cook, L. R. Hagen","doi":"10.1145/63047.63067","DOIUrl":"https://doi.org/10.1145/63047.63067","url":null,"abstract":"Seismic data processing is a time-consuming operation. Two main reasons for this are the large amounts of data, which have to be put in and taken out of the computer regularly, and the need for intervention by an expert at many intermediate stages. A subsidiary cause is the large amount of computation. The recent appearance of multiprocessor computers has created the opportunity to provide an interactive system to ease and speed up the seismic data processing cycle. This paper describes the development of an initial interactive system for the velocity analysis and NMO/stacking stage of seismic processing. Computational power and storage are supplied by two 32-node Intel iPSC/1 Hypercubes, both with 16 memory nodes and 16 vector nodes. Each hypercube has 96 Mbytes of memory and a top processing speed of over 100 Mflops (32 bit). A SUN-3 Workstation is used to display intermediate results and to enable the expert to direct the processing more efficiently.","PeriodicalId":299435,"journal":{"name":"Conference on Hypercube Concurrent Computers and Applications","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1989-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114274633","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
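The NMO correction at the heart of the velocity-analysis stage follows the standard hyperbolic moveout relation t(x) = sqrt(t0^2 + x^2/v^2). The sketch below is a generic illustration of that textbook formula, not code from the system described above.

```python
import math

def nmo_time(t0, offset, velocity):
    """Hyperbolic moveout: arrival time at a given source-receiver offset
    for a reflection with zero-offset time t0 and stacking velocity v."""
    return math.sqrt(t0 ** 2 + (offset / velocity) ** 2)

def nmo_shift(t0, offset, velocity):
    """Time shift removed by the NMO correction before stacking."""
    return nmo_time(t0, offset, velocity) - t0
```

Velocity analysis amounts to scanning trial velocities and picking the one whose NMO shifts best flatten the reflection across offsets; this inner loop is what the hypercubes accelerate.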
A new numerical approach to quantum chromodynamics, due to Furmanski and Kolawa, is based on diagonalizing the underlying Hamiltonian. This method involves the generation of states by repeated action of a potential operator. This symbolic calculation is dominated by the time it takes to search the database of existing states to verify whether a generated state is identical to one previously found. We implement this algorithm on the Caltech/JPL Mark II hypercube and analyze the performance of both a simple database search and one optimized for this application. We show that the hypercube performance can be modelled in a fashion similar to conventional numerical (loosely synchronous) applications.
{"title":"Use of the hypercube for symbolic quantum chromodynamics","authors":"A. Kolawa, G. Fox","doi":"10.1145/63047.63097","DOIUrl":"https://doi.org/10.1145/63047.63097","url":null,"abstract":"A new numerical approach by Furmanski and Kolawa to quantum chromodynamics is based on diagonalizing the underlying Hamiltonian. This method involves the generation of states by repeated action of a potential operator. This symbolic calculation is dominated by the time it takes to search the database of existing states to verify if a generated state is identical to one previously found. We implement this algorithm on the Caltech/JPL Mark II hypercube and analyze its performance of both a simple database search and one optimized for this application. We show that the hypercube performance can be modelled in a fashion similar to conventional numerical (loosely synchronous) applications.","PeriodicalId":299435,"journal":{"name":"Conference on Hypercube Concurrent Computers and Applications","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1989-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129995190","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
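The cost the abstract attributes to state lookup can be seen with a toy database. The sketch is ours, not the paper's optimized search: a linear scan does work proportional to the number of stored states, while a hashed membership test removes that scan (states are modelled here as hashable tuples).

```python
def dedup_linear(states):
    """Unique states via linear search; also count equality comparisons."""
    unique, comparisons = [], 0
    for s in states:
        found = False
        for u in unique:
            comparisons += 1
            if u == s:
                found = True
                break
        if not found:
            unique.append(s)
    return unique, comparisons

def dedup_hashed(states):
    """Unique states via a hash set: one near-constant-time probe each."""
    seen, unique = set(), []
    for s in states:
        if s not in seen:
            seen.add(s)
            unique.append(s)
    return unique
```

Both routines return the same unique-state list; only the comparison count (and hence the running time on a large database) differs.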
Various image processing algorithms have been implemented on the hypercube architecture and many success stories have been reported. However, the traditional approach to programming the hypercube has been to write programs which perform a single operation or a fixed set of operations upon data items. This approach has several drawbacks when considered for use in an interactive computing environment. First, it is difficult to process data with a sequence of simple programs in the Mark III Hypercube because the Mark III software does not support sharing of data between successive programs. This means that data must be reloaded into the cube for each individual program. It also implies that programs should be fairly large and complete, to minimize the repeated downloading of large data items for multiple programs. However, the entire program must be able to fit within the hypercube node memory, which limits what a program can do by putting a restriction on its size. Furthermore, large programs limit the amount of memory available for data, which must also be present in memory if the communications overhead is to be effectively reduced. The development of an interactive image processing workstation based on the Mark III Hypercube requires satisfactory solutions to these and other problems.
{"title":"Design and implementation of a concurrent image processing workstation based on the Mark III hypercube","authors":"S. Groom, M. Lee, A. Mazer, W. Williams","doi":"10.1145/63047.63086","DOIUrl":"https://doi.org/10.1145/63047.63086","url":null,"abstract":"Various image processing algorithms have been implemented on the hypercube architecture and many success stories have been reported. However, the traditional approach to programming the hypercube has been to write programs which perform a single operation or a fixed set of operations upon data items. This approach has several drawbacks when considered for use in an interactive computing environment. First, it is difficult to process data with a sequence of simple programs in the Mark III Hypercube because the Mark III software does not support sharing of data between successive programs. This means that data must be reloaded into the cube for each individual program. It also implies that programs should be fairly large and complete, to minimize the repeated downloading of large data items for multiple programs. However, the entire program must be able to fit within the hypercube node memory, which limits what a program can do by putting a restriction on its size. Furthermore, large programs limit the amount of memory available for data, which must also be present in memory if the communications overhead is to be effectively reduced. 
The development of an interactive image processing workstation based on the Mark III Hypercube requires satisfactory solutions to these and other problems.","PeriodicalId":299435,"journal":{"name":"Conference on Hypercube Concurrent Computers and Applications","volume":"44 9-10","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1989-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132286906","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Large, sparse, linear systems of equations arise frequently when constructing mathematical models of natural phenomena. Most often, these linear systems are fully constrained and can be solved via direct or iterative techniques. However, one important problem class requires solutions to underconstrained linear systems that maximize some objective function. These linear optimization problems are natural formulations of many business plans and often contain hundreds of equations with thousands of variables. Historically, linear optimization problems have been solved via the simplex method. Despite the excellent performance of the simplex method, the size of the optimization problems and the frequency of their solution make linear optimization a computationally taxing endeavor. This paper examines the performance of parallel variants of the simplex algorithm on the Intel iPSC, a message-based parallel system. Linear optimization test data are drawn from commercial sources and represent realistic problems. Analysis shows that the speedup obtained is sensitive to both the structure of the underlying data and the data partitioning.
{"title":"Hypercube implementation of the simplex algorithm","authors":"C. Stunkel, D. Reed","doi":"10.1145/63047.63104","DOIUrl":"https://doi.org/10.1145/63047.63104","url":null,"abstract":"Large, sparse, linear systems of equations arise frequently when constructing mathematical models of natural phenomena. Most often, these linear systems are fully constrained and can be solved via direct or iterative techniques. However, one important problem class requires solutions to underconstrained linear systems that maximize some objective function. These linear optimization problems are natural formulations of many business plans and often contain hundreds of equations with thousands of variables. Historically, linear optimization problems have been solved via the simplex method. Despite the excellent performance of the simplex method, the size of the optimization problems and the frequency of their solution make linear optimization a computationally taxing endeavor. This paper examines the performance of parallel variants of the simplex algorithm on the Intel iPSC, a message-based parallel system. Linear optimization test data are drawn from commercial sources and represent realistic problems. Analysis shows that the speedup obtained is sensitive to both the structure of the underlying data and the data partitioning.","PeriodicalId":299435,"journal":{"name":"Conference on Hypercube Concurrent Computers and Applications","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1989-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115405866","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
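One standard way to parallelize the tableau simplex method, consistent with the data partitioning discussed above, is to distribute columns among processors, have each processor select a local pivot candidate (most negative reduced cost), and combine the candidates with a global reduction. The sketch below simulates the processors with column strides; it is illustrative, not the paper's specific variant.

```python
def local_pivot(costs, cols):
    """Index of the most negative reduced cost among this processor's columns
    (None if no column can improve the objective)."""
    best = None
    for j in cols:
        if costs[j] < 0 and (best is None or costs[j] < costs[best]):
            best = j
    return best

def global_pivot(costs, p):
    """Partition columns cyclically over p 'processors', then reduce the
    local candidates to a single global pivot column."""
    n = len(costs)
    candidates = [local_pivot(costs, range(pid, n, p)) for pid in range(p)]
    candidates = [j for j in candidates if j is not None]
    return min(candidates, key=lambda j: costs[j]) if candidates else None
```

Returning None from the reduction corresponds to optimality (no negative reduced cost remains); the same local-compute/global-reduce pattern applies to the ratio test for the pivot row.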
Numerical solutions to thermal convection flow problems are vital to many scientific and engineering problems. One fundamental geophysical problem is the thermal convection responsible for continental drift and sea floor spreading. The earth's interior undergoes slow creeping flow (~cm/yr) in response to the buoyancy forces generated by temperature variations caused by the decay of radioactive elements and secular cooling. Convection in the earth's mantle, the 3000 km thick solid layer between the crust and core, is difficult to model for three reasons: (1) Complex rheology -- the effective viscosity depends exponentially on temperature, on pressure (or depth) and on the deviatoric stress; (2) the buoyancy forces driving the flow occur in boundary layers thin in comparison to the total depth; and (3) spherical geometry -- the flow in the interior is fully three dimensional. Because of these many difficulties, accurate and realistic simulations of this process easily overwhelm current computer speed and memory (including the Cray XMP and Cray 2) and only simplified problems have been attempted [e.g. Christensen and Yuen, 1984; Gurnis, 1988; Jarvis and Peltier, 1982]. As a start in overcoming these difficulties, a number of finite element formulations have been explored on hypercube concurrent computers. Although two coupled equations are required to solve this problem (the momentum or Stokes equation and the energy or advection-diffusion equation), we will concentrate our efforts on the solution to the latter equation in this paper. Solution of the former equation is discussed elsewhere [Lyzenga, et al, 1988]. We will demonstrate that linear speedups and efficiencies of 99 percent are achieved for sufficiently large problems.
热对流流动问题的数值解对于许多科学和工程问题都是至关重要的。一个基本的地球物理问题是引起大陆漂移和海底扩张的热对流。由于放射性元素衰变和长期冷却引起的温度变化所产生的浮力,地球内部经历缓慢的爬行流动(~cm/yr)。地幔(地壳和地核之间3000公里厚的固体层)中的对流很难建模,原因有三:(1)复杂的流变学——有效粘度以指数形式取决于温度、压力(或深度)和偏应力;(2)驱动流动的浮力发生在相对于总深度较薄的边界层中;(3)球面几何——内部的流动完全是三维的。由于这些困难,对这一过程的准确和真实的模拟很容易超过当前的计算机速度和内存(包括Cray XMP和Cray 2),并且只尝试了简化的问题[例如Christensen和Yuen, 1984;格尼斯,1988;Jarvis and Peltier, 1982]。作为克服这些困难的开端,一些有限元公式已经在超立方体并发计算机上进行了探索。虽然解决这个问题需要两个耦合方程(动量或斯托克斯方程和能量或平流-扩散方程),但本文将集中精力解决后一个方程。前一个方程的解在其他地方有讨论[Lyzenga, et al, 1988]。我们将证明,对于足够大的问题,可以实现99%的线性加速和效率。
{"title":"Finite element solution of thermal convection on a hypercube concurrent computer","authors":"M. Gurnis, A. Raefsky, G. Lyzenga, B. Hager","doi":"10.1145/63047.63070","DOIUrl":"https://doi.org/10.1145/63047.63070","url":null,"abstract":"Numerical solutions to thermal convection flow problems are vital to many scientific and engineering problems. One fundamental geophysical problem is the thermal convection responsible for continental drift and sea floor spreading. The earth's interior undergoes slow creeping flow (~cm/yr) in response to the buoyancy forces generated by temperature variations caused by the decay of radioactive elements and secular cooling. Convection in the earth's mantle, the 3000 km thick solid layer between the crust and core, is difficult to model for three reasons: (1) Complex rheology -- the effective viscosity depends exponentially on temperature, on pressure (or depth) and on the deviatoric stress; (2) the buoyancy forces driving the flow occur in boundary layers thin in comparison to the total depth; and (3) spherical geometry -- the flow in the interior is fully three dimensional. Because of these many difficulties, accurate and realistic simulations of this process easily overwhelm current computer speed and memory (including the Cray XMP and Cray 2) and only simplified problems have been attempted [e.g. Christensen and Yuen, 1984; Gurnis, 1988; Jarvis and Peltier, 1982]. As a start in overcoming these difficulties, a number of finite element formulations have been explored on hypercube concurrent computers. Although two coupled equations are required to solve this problem (the momentum or Stokes equation and the energy or advection-diffusion equation), we will concentrate our efforts on the solution to the latter equation in this paper. Solution of the former equation is discussed elsewhere [Lyzenga, et al, 1988]. We will demonstrate that linear speedups and efficiencies of 99 percent are achieved for sufficiently large problems.","PeriodicalId":299435,"journal":{"name":"Conference on Hypercube Concurrent Computers and Applications","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1989-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116003706","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
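The energy (advection-diffusion) equation the paper concentrates on has the form dT/dt + u dT/dx = kappa d2T/dx2. As a stand-in for the paper's finite element formulation, the sketch below takes one explicit 1-D finite-difference step; it illustrates why the equation parallelizes well, since each node needs only its nearest neighbours.

```python
def advect_diffuse_step(T, u, kappa, dx, dt):
    """One explicit step of dT/dt + u dT/dx = kappa d2T/dx2 on a periodic
    1-D grid; each update touches only nearest-neighbour values, which is
    what makes the concurrent decomposition communication-light."""
    n = len(T)
    out = [0.0] * n
    for i in range(n):
        left, right = T[(i - 1) % n], T[(i + 1) % n]
        adv = -u * (right - left) / (2 * dx)               # centred advection
        dif = kappa * (right - 2 * T[i] + left) / dx ** 2  # diffusion
        out[i] = T[i] + dt * (adv + dif)
    return out
```

On a concurrent machine, each processor owns a contiguous strip of grid points and exchanges only the strip-boundary values each step, so the communication-to-computation ratio falls as the problem grows, consistent with the near-perfect efficiencies reported above.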
Fault simulation is the process of simulating the response of a logic circuit to input patterns in the presence of all possible single faults, and is an essential part of test generation for VLSI circuits. Parallelization of the deductive and parallel simulation methods on a hypercube multiprocessor, and vectorization of the parallel simulation method, are described. Experimental results are presented.
{"title":"Logic fault simulation on a vector hypercube multiprocessor","authors":"F. Özgüner, C. Aykanat, O. Khalid","doi":"10.1145/63047.63064","DOIUrl":"https://doi.org/10.1145/63047.63064","url":null,"abstract":"Fault simulation is the process of simulating the response of a logic circuit to input patterns in the presence of all possible single faults and is an essential part of test generation for VLSI circuits. Parallelization of the deductive and parallel simulation methods, on a hypercube multiprocessor and vectorization of the parallel simulation method are described. Experimental results are presented.","PeriodicalId":299435,"journal":{"name":"Conference on Hypercube Concurrent Computers and Applications","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1989-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124027482","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
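The "parallel simulation method" in fault simulation packs one input pattern per bit of a machine word, so a single pass of bitwise gate evaluations simulates all packed patterns at once. The toy circuit and stuck-at fault below are our own illustration, not from the paper.

```python
MASK = 0b1111  # four input patterns packed per word (bit i = pattern i)

def circuit(a, b, c, fault=None):
    """Evaluate y = (a AND b) OR c bitwise over all packed patterns.
    fault='g-sa0' injects a stuck-at-0 fault on the AND gate output."""
    g = a & b
    if fault == 'g-sa0':
        g = 0  # fault injection: gate output forced low for every pattern
    return (g | c) & MASK

# Four test patterns, LSB = pattern 0:  a = 1101, b = 1011, c = 0001
a, b, c = 0b1101, 0b1011, 0b0001
good = circuit(a, b, c)
faulty = circuit(a, b, c, fault='g-sa0')
detected = good ^ faulty   # set bits mark the patterns that detect the fault
```

Here only pattern 3 (a=b=1, c=0) makes the fault visible at the output, and the XOR of the fault-free and faulty words reveals exactly that bit. Vectorization extends the same idea to wider words or vector registers.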
The process of effectively coordinating and controlling resources during a military engagement is known as battle management/command, control, and communications (BM/C3). One key task of BM/C3 is allocating weapons to destroy targets. The focus of this research is on developing parallel computation methods to achieve fast and cost-effective assignment of weapons to targets. Using the sequential Hungarian method for solving the assignment problem as a basis, this paper presents the development and the relative performance comparison of four parallel assignment methodologies that have been implemented on the Intel iPSC hypercube computer. The first three approaches are approximations to the optimal assignment solution. The advantage of these is that they are computationally fast and have proven to generate assignments that are very close to the optimal assignment in terms of cost. The fourth approach is a parallel implementation of the Hungarian algorithm, where certain subtasks are performed in parallel. This approach produces an optimal assignment, as compared to the sub-optimal assignments that result from the first three approaches. The relative performance of the four approaches is compared by varying the number of weapons and targets, the number of processors used, and the size of the problem partitions.
{"title":"Implementation and performance analysis of parallel assignment algorithms on a hypercube computer","authors":"Barry K. Carpenter, Nathaniel J. Davis IV","doi":"10.1145/63047.63077","DOIUrl":"https://doi.org/10.1145/63047.63077","url":null,"abstract":"The process of effectively coordinating and controlling resources during a military engagement is known as battle management/command, control, and communications (BM/C3). One key task of BM/C3 is allocating weapons to destroy targets. The focus of this research is on developing parallel computation methods to achieve fast and cost effective assignment of weapons to targets. Using the sequential Hungarian method for solving the assignment problem as a basis, this paper presents the development and the relative performance comparison of four parallel assignment methodologies that have been implemented on the Intel iPSC hypercube computer. The first three approaches are approximations to the optimal assignment solution. The advantage to these is that they are computationally fast and have proven to generate assignments that are very close to the optimal assignment in terms of cost. The fourth approach is a parallel implementation of the Hungarian algorithm, where certain subtasks are performed in parallel. This approach produces an optimal assignment as compared to the sub-optimal assignments that result from the first three approaches. 
The relative performance of the four approaches is compared by varying the number of weapons and targets, the number of processors used, and the size of the problem partitions.","PeriodicalId":299435,"journal":{"name":"Conference on Hypercube Concurrent Computers and Applications","volume":"59 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1989-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127225418","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
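The approximate-versus-optimal trade-off described above can be illustrated on a tiny instance. In the sketch below, brute-force enumeration stands in for the Hungarian algorithm (which is more involved); both functions are our own illustrations, not the paper's implementations.

```python
import itertools

def optimal_assignment(cost):
    """Exhaustive minimum-cost assignment; stands in for the Hungarian
    algorithm on small instances (O(n!) -- illustration only)."""
    n = len(cost)
    best_perm, best_cost = None, float('inf')
    for perm in itertools.permutations(range(n)):
        c = sum(cost[i][perm[i]] for i in range(n))
        if c < best_cost:
            best_perm, best_cost = perm, c
    return list(best_perm), best_cost

def greedy_assignment(cost):
    """Fast approximation: each weapon takes its cheapest remaining target."""
    n = len(cost)
    taken, assign, total = set(), [], 0
    for i in range(n):
        j = min((j for j in range(n) if j not in taken),
                key=lambda j: cost[i][j])
        taken.add(j)
        assign.append(j)
        total += cost[i][j]
    return assign, total
```

On the 2x2 cost matrix [[1, 2], [1, 10]] the greedy pass grabs the cheap target for the first weapon and is forced into the expensive one for the second, while the exhaustive search finds the cheaper crossed assignment; this is the quality gap the paper quantifies between its approximate and optimal methods.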