Statistical gravitational lensing on the Mark III hypercube
J. Apostolakis, C. Kochanek (doi:10.1145/63047.63050)

We describe a parallel algorithm for the nonlinear optics problem of gravitational lensing. It is a “ray-tracing” method that studies the statistical properties of the image population associated with a gravitational lens. A parallel computer is needed because the spatial resolution requirements of the problem make the program too large to run on conventional machines. The program is implemented on the Mark III hypercube to take maximum advantage of that machine's 128 Mbytes of memory. The concurrent implementation uses a scattered domain decomposition and the CrOS III communications routines. The communications in the problem are so irregular that no implementation was completely satisfactory in terms of execution time: the maximum speed-up relative to a sequential implementation is a factor of 4 on a 32-node machine. However, the goal of efficiently using all of the Mark III's memory was achieved, and execution time was not the limiting factor in the problem. The implementation would be much more efficient if the crystal router were used, but development of the program was terminated at this stage because we were able to extract the physics of interest without the more sophisticated communications routines.
Comparison of two-dimensional FFT methods on the hypercube
C. Chu (doi:10.1145/63047.63099)

Complex two-dimensional FFTs of up to 256 × 256 points are implemented on the Intel iPSC/System 286 hypercube, with emphasis on comparing the effects of data mapping, data transposition or communication needs, and the use of distributed FFTs. Two new implementations of the 2D-FFT are presented: the Local-Distributed method, which performs local FFTs in one direction followed by distributed FFTs in the other, and a Vector-Radix implementation, derived by decimating the DFT in two dimensions instead of one. In addition, the Transpose-Split method, involving local FFTs in both directions with an intervening matrix transposition, and the Block 2D-FFT, involving distributed FFT butterflies in both directions, are implemented and compared with the other two methods. Timing results show that on the Intel iPSC/System 286 there is hardly any difference between the methods; the only differences arise from the efficiency or inefficiency of communication. Since the Intel cannot overlap communication and computation, the user is forced to buffer data, which in some of the methods causes processor blocking during communication. Issues of vectorization, communication strategies, data storage, and buffering requirements are investigated, and a model is given that compares vectorization and communication complexity. While timing results show that the Transpose-Split method is in general slightly faster, our model shows that the Block method and the Vector-Radix method have the potential to be faster if the communication difficulties were resolved. If communication could be “hidden” within computation, these two methods would become useful, with the Block method vectorizing best and the Vector-Radix method having 25% fewer multiplications than row-column 2D-FFT methods. Finally, the Local-Distributed method is a good hybrid that requires no transposition and can be useful in certain circumstances. This paper provides some general guidelines for evaluating parallel distributed 2D-FFT implementations and concludes that, while different methods may be best suited to different systems, better implementation techniques as well as faster algorithms still perform better as communication becomes more efficient.
A distributed hypercube file system
R. Flynn, H. Hadimioglu (doi:10.1145/63047.63093)

An autonomous, physically interconnected file system is proposed for the hypercube. The resulting distributed file system consists of an I/O organization and a software interface. The system is loosely coupled architecturally, but from the operating system's point of view it forms a tightly coupled system in which interprocessor messages are handled differently from file accesses. A matrix multiplication algorithm is given to show how the distributed file system is utilized.
Implementing the Beam and Warming method on the hypercube
J. Bruno, P. Cappello (doi:10.1145/63047.63061)

Numerical simulation of a wide range of physical phenomena typically involves enormous amounts of computation and, for scores of practical problems, these simulations cannot be carried out even on today's fastest supercomputers. The economic and scientific importance of many of these problems is driving the explosive research in computer architecture, especially the work aimed at achieving ultra high-speed computation by exploiting concurrent processing. Correspondingly, there is great interest in the design and analysis of numerical algorithms which are suitable for implementation on concurrent processor systems.

In this paper we consider the implementation of the Beam and Warming implicit factored method on a hypercube concurrent processor system. We present a set of equations and give the numerical method in sufficient detail to illustrate and analyze the problems which arise in implementing this numerical method. We show that there are mappings of the computational domain onto the nodes of a hypercube concurrent processor system which maintain the efficiency of the numerical method. We also show that better methods do not exist.
LU decomposition of banded matrices and the solution of linear systems on hypercubes
D. Walker, T. Aldcroft, A. Cisneros, G. Fox, W. Furmanski (doi:10.1145/63047.63124)

We describe the solution of linear systems of equations, Ax = b, on distributed-memory concurrent computers whose interconnect topology contains a two-dimensional mesh. A is assumed to be an M×M banded matrix. The problem is generalized to the case of nb distinct right-hand sides b, and can thus be expressed as AX = B, where X and B are both M×nb matrices. The solution is obtained by the LU decomposition method, which proceeds in three stages: (1) LU decomposition of the matrix A, (2) forward reduction, (3) back substitution. Since the matrix A is banded, a simple rectangular subblock decomposition of the matrices A, X, and B over the nodes of the ensemble results in excessive load imbalance; a scattered decomposition is therefore used to decompose the data. The sequential and concurrent algorithms are described in detail, and models of the performance of the concurrent algorithm are presented for each of the three stages. To ensure numerical stability the algorithm is extended to include partial pivoting, and performance models for the pivoting case are also given. Results from a 128-node Caltech/JPL Mark II hypercube are presented, and the performance models are found to be in good agreement with these data. Indexing overhead was found to contribute significantly to the total concurrent overhead.
Implementation of a divide and conquer cyclic reduction algorithm on the FPS T-20 hypercube
C. Cox (doi:10.1145/63047.63111)

A simple variant of the odd-even cyclic reduction algorithm for solving tridiagonal linear systems is presented. The target architecture for this scheme is a parallel computer whose nodes are vector processors, such as the Floating Point Systems T-Series hypercube. Of particular interest is the case where the number of equations is much larger than the number of processors. The matrix system is partitioned into local subsystems, with the partitioning governed by a parameter which determines the amount of redundancy in computations. After the local systems are distributed, the algorithm proceeds with independent computations, an all-to-all broadcast of a small number of equations from each processor, solution of this small subsystem, further independent computations, and output of the solution. Some redundancy in calculations between neighboring processors minimizes communication costs. One feature of this approach is that the computations are well balanced, since each processor executes an identical algebraic routine.

A brief description of the standard cyclic reduction algorithm is given. Then the divide and conquer strategy is presented, along with estimates of speedup and efficiency. Finally, an Occam program for this algorithm, which runs on the FPS T-20 computer, is discussed along with experimental results.
Gauss-Jordan inversion with pivoting on the Caltech Mark II hypercube
P. Hipes, A. Kuppermann (doi:10.1145/63047.63123)

The performance of a parallel Gauss-Jordan matrix inversion algorithm on the Mark II hypercube at Caltech is discussed. We show that parallel Gauss-Jordan inversion is superior to parallel Gaussian elimination for inversion, and discuss the reasons for this. Empirical and theoretical efficiencies for parallel Gauss-Jordan inversion as a function of matrix dimension are presented for different numbers and configurations of processors. The theoretical efficiencies are in quantitative agreement with the empirical efficiencies.
Chess on a hypercube
E. Felten, S. Otto (doi:10.1145/63047.63088)

We report our progress on computer chess, last described at the Second Conference on Hypercubes. Our program follows the strategy of currently successful sequential chess programs: search of an alpha-beta pruned game tree, iterative deepening, transposition and history tables, specialized endgame evaluators, and so on. The search tree is decomposed onto the hypercube (an NCUBE) using a recursive version of the principal-variation-splitting algorithm. Roughly speaking, subtrees are searched by teams of processors in a self-scheduled manner.

A crucial feature of the program is the global hashtable. Hashtables are important in the sequential case, but are even more central for a parallel chess algorithm. The table not only stores knowledge but also decides, at each node of the chess tree, whether to stay sequential or to split up the work in parallel. In the language of Knuth and Moore, the transposition table decides whether each node of the chess tree is a type 2 or a type 3 node and acts accordingly. For this data structure the hypercube is used as a shared-memory machine. Multiple writes to the same location are resolved by a priority system which decides which entry is of more value to the program. The hashtable is thus implemented as “smart” shared memory.

Search times for related subtrees vary widely (by up to a factor of 100), so dynamic reconfiguration of processors is necessary to concentrate on such “hot spots” in the tree. A first version of the program with dynamic load balancing has recently been completed and outperforms the non-load-balancing program by a factor of three. The current speedup of the program is 101 out of a possible 256 processors.

The program has played in several tournaments, facing both computers and people. Most recently it scored 2-2 in the ACM North American Computer Chess Championship.
Rapid prototyping of a parallel operating system for a generalized hypercube
E. Gehringer, Brian D. Harry (doi:10.1145/62297.62339)

B-HIVE is an experimental multiprocessor system under construction at North Carolina State University. Its operating system is derived from XINU, a system designed for teaching purposes. XINU was chosen because it is unusually well documented and supplied most of the features that were necessary at the outset of the project. Among the few changes made to XINU are a supervisor state and an interprocessor communication system.
A dynamic load balancer on the Intel hypercube
J. Koller (doi:10.1145/62297.62328)

A class of commonly encountered problems requires dynamic load balancing for efficient use of concurrent processors. We are developing a test bed for dynamic load balancing studies and have chosen the MOOSE operating system and the Intel iPSC as our environment. We discuss these choices and how we are implementing a general-purpose dynamic load balancer.