
Conference on Hypercube Concurrent Computers and Applications: Latest Articles

Solution of the 3-D Euler equations for the flow about a fighter aircraft configuration using a hypercube parallel processor
Pub Date : 1989-01-03 DOI: 10.1145/63047.63066
D. Weissbein, J. F. Mangus, M. W. George
The computational fluid dynamics (CFD) code FLO57, which solves the 3-D Euler equations using an explicit, finite-volume, Runge-Kutta algorithm, was implemented on an Intel iPSC-MX parallel processor. The solution grid about a fighter aircraft configuration was spatially decomposed, and binary reflected Gray codes were used to map the computational domain onto the iPSC, ensuring nearest-neighbor communication. Results and timings of the implementation are presented, together with a comparison of the iPSC and a uniprocessor machine of similar classification, to assess the performance of the iPSC on FLO57. Suggested improvements to the current version of the parallelized code are listed to aid load balancing, vectorization, and more efficient memory use.
Citations: 1
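Illustrative sketch (not taken from the paper): the abstract above mentions mapping the computational domain onto the hypercube with binary reflected Gray codes so that neighboring grid blocks sit on directly connected nodes. The snippet below shows the general idea for a 2-D block grid. The function names and the choice of a 2-D decomposition are assumptions made for illustration only; the paper's actual decomposition is not described here.

```python
def gray(i: int) -> int:
    """Binary reflected Gray code of i."""
    return i ^ (i >> 1)


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def grid_to_node(ix: int, iy: int, bits_y: int) -> int:
    """Map block (ix, iy) to a hypercube node id (hypothetical helper).

    The Gray code of each grid index is placed in its own bit field,
    so a step of one block in x or y flips exactly one bit of the id.
    """
    return (gray(ix) << bits_y) | gray(iy)


def neighbors_are_adjacent(bits_x: int, bits_y: int) -> bool:
    """Check that every pair of neighboring blocks maps to linked nodes."""
    nx, ny = 1 << bits_x, 1 << bits_y
    for ix in range(nx):
        for iy in range(ny):
            here = grid_to_node(ix, iy, bits_y)
            if ix + 1 < nx and hamming(here, grid_to_node(ix + 1, iy, bits_y)) != 1:
                return False
            if iy + 1 < ny and hamming(here, grid_to_node(ix, iy + 1, bits_y)) != 1:
                return False
    return True
```

Two node ids that differ in exactly one bit are joined by a hypercube link, which is why a Hamming distance of 1 is the property being checked.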
Hypercube data analysis in astronomy: optical interferometry and millisecond pulsar searches
Pub Date : 1989-01-03 DOI: 10.1145/63047.63049
P. Gorham, T. Prince, S. Anderson
Astronomical data sets are beginning to live up to their name, in both their sizes and the complexity of the analysis required. Here we discuss two astronomical data analysis problems which we have begun to implement on a hypercube concurrent processor environment: the intensive image processing required in an optical interferometry project, and the large-scale power spectral analysis required by a search for millisecond-period radio pulsars. In both cases the analysis proceeds largely in the Fourier domain, and we find that the problems are readily adapted to a concurrent environment. In the following report, we outline briefly the astronomical background for each problem, then discuss the general computational requirements, and finally present possible hypercube algorithms and results achieved to date.
Citations: 3
Region growing on a hypercube multiprocessor
Pub Date : 1989-01-03 DOI: 10.1145/63047.63057
M. Willebeek-LeMair, A. Reeves
The region growing paradigm for image segmentation groups neighboring pixels into regions according to a predetermined homogeneity criterion. A parallel method for region growing on an MIMD multiprocessor system is presented. Since the region growing problem exhibits non-uniform and unpredictable load fluctuations, it requires a dynamic load balancing scheme to achieve a balanced load distribution. The results of implementing a parallel region growing algorithm on the Intel iPSC hypercube are discussed.
Citations: 20
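Illustrative sketch (not taken from the paper): the region growing idea in the abstract above can be shown with a small sequential version. The homogeneity criterion used here (absolute difference from the seed pixel within a tolerance `tol`) and the 4-neighbor connectivity are assumptions chosen for brevity; the paper's criterion and its parallel load balancing scheme are not reproduced.

```python
from collections import deque


def grow_regions(image, tol):
    """Label 4-connected regions of a 2-D list of numbers.

    A pixel joins a region when it differs from that region's seed
    pixel by at most tol (an assumed homogeneity criterion).
    Returns (labels, number_of_regions).
    """
    rows, cols = len(image), len(image[0])
    labels = [[-1] * cols for _ in range(rows)]
    next_label = 0
    for r0 in range(rows):
        for c0 in range(cols):
            if labels[r0][c0] != -1:
                continue  # pixel already belongs to a region
            seed = image[r0][c0]
            labels[r0][c0] = next_label
            queue = deque([(r0, c0)])
            while queue:
                r, c = queue.popleft()
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    rr, cc = r + dr, c + dc
                    if (0 <= rr < rows and 0 <= cc < cols
                            and labels[rr][cc] == -1
                            and abs(image[rr][cc] - seed) <= tol):
                        labels[rr][cc] = next_label
                        queue.append((rr, cc))
            next_label += 1
    return labels, next_label
```

The amount of work per seed depends on how far its region spreads, which hints at why the abstract describes the load as non-uniform and unpredictable.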
The preconditioned conjugate gradient method on the hypercube
Pub Date : 1989-01-03 DOI: 10.1145/63047.63126
G. Abe, K. Hane
This paper describes a parallel algorithm for solving elliptic partial differential equations (PDEs) by the finite difference method (FDM). The Concurrent Preconditioned Conjugate Gradient method is developed to optimize processor load balancing. The algorithm is evaluated on a hypercube-based concurrent machine, the Intel iPSC.
Citations: 1
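Illustrative sketch (not taken from the paper): a textbook sequential preconditioned conjugate gradient loop, applied to a 1-D finite-difference Poisson operator. The Jacobi (diagonal) preconditioner and the 1-D model problem are assumptions made for illustration; the abstract above does not state which preconditioner or discretization the authors used.

```python
def poisson_matvec(x):
    """Apply the 1-D finite-difference operator tridiag(-1, 2, -1) to x."""
    n = len(x)
    y = []
    for i in range(n):
        left = x[i - 1] if i > 0 else 0.0
        right = x[i + 1] if i < n - 1 else 0.0
        y.append(2.0 * x[i] - left - right)
    return y


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def pcg(matvec, diag, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A.

    matvec applies A; diag holds the diagonal of A and serves as an
    assumed Jacobi preconditioner.
    """
    x = [0.0] * len(b)
    r = list(b)                                  # residual b - A x, with x = 0
    z = [ri / di for ri, di in zip(r, diag)]     # preconditioned residual
    p = list(z)                                  # first search direction
    rz = dot(r, z)
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 <= tol:
            break
        ap = matvec(p)
        alpha = rz / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        z = [ri / di for ri, di in zip(r, diag)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x
```

In a distributed version, the inner products are the steps that need a global reduction across nodes, while the matrix-vector product only needs data from neighboring subdomains.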
An experimental study of methods for parallel preconditioned Krylov methods
Pub Date : 1989-01-03 DOI: 10.1145/63047.63128
D. Baxter, J. Saltz, M. Schultz, S. Eisenstat, K. Crowley
High-performance multiprocessor architectures differ both in the number of processors and in the delay costs for synchronization and communication. To obtain good performance on a given architecture for a given problem, adequate parallelization, good load balance, and an appropriate choice of granularity are essential. We discuss the implementation of a parallel version of PCGPAK for both shared memory architectures and hypercubes. Our parallel implementation is sufficiently efficient to allow us to complete the solution of our test problems on 16 processors of the Encore Multimax/320 in an amount of time that is a small multiple of that required by a single head of a Cray X/MP, despite the fact that the peak performance of the Multimax processors is not even close to the supercomputer range. We illustrate the effectiveness of our approach on a number of model problems from reservoir engineering and mathematics.
Citations: 43
Hypercube performance for 2-D seismic finite-difference modeling
Pub Date : 1989-01-03 DOI: 10.1145/63047.63068
L. J. Baker
Wave-equation seismic modeling in two space dimensions is computationally intensive, often requiring hours of supercomputer CPU time to run typical geological models with 500 × 500 grids and 100 sources. This paper analyzes the performance of ACOUS2D, an explicit 4th-order finite-difference program, on Intel's 16-processor vector hypercube computer. Converting the sequential version of ACOUS2D to run on the hypercube was straightforward but time-consuming. The key consideration for optimal efficiency is load balancing. On a fairly typical geologic model, the 16-processor Intel vector hypercube computer ran ACOUS2D at 1/3 the speed of a Cray-1S.
Citations: 3
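Illustrative sketch (not taken from the paper): one explicit time step of the 2-D acoustic wave equation using the standard 4th-order central difference for the spatial second derivatives. The second-order time update, the constant velocity `c`, and the untreated boundary band are assumptions for illustration; the internals of ACOUS2D are not described in the abstract above.

```python
# Standard 4th-order central-difference weights for a second derivative,
# to be divided by 12 * h**2.
WEIGHTS = (-1.0, 16.0, -30.0, 16.0, -1.0)


def laplacian4(u, h):
    """4th-order Laplacian of a 2-D list u on a grid with spacing h.

    Points within two cells of the edge are left at zero (assumption:
    boundary treatment is outside the scope of this sketch).
    """
    rows, cols = len(u), len(u[0])
    lap = [[0.0] * cols for _ in range(rows)]
    for i in range(2, rows - 2):
        for j in range(2, cols - 2):
            d2x = sum(WEIGHTS[k] * u[i + k - 2][j] for k in range(5))
            d2y = sum(WEIGHTS[k] * u[i][j + k - 2] for k in range(5))
            lap[i][j] = (d2x + d2y) / (12.0 * h * h)
    return lap


def step(u_prev, u_curr, c, dt, h):
    """Advance the wavefield one time step (second-order in time)."""
    lap = laplacian4(u_curr, h)
    rows, cols = len(u_curr), len(u_curr[0])
    return [[2.0 * u_curr[i][j] - u_prev[i][j] + (c * dt) ** 2 * lap[i][j]
             for j in range(cols)] for i in range(rows)]
```

Because each point reads values two cells away in every direction, a domain-decomposed version would need to exchange a two-cell-wide halo with each neighboring subdomain per time step.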
Blitz: a rule-based system for massively parallel architectures
Pub Date : 1989-01-03 DOI: 10.1145/63047.63091
K. Morgan
The rule-based system has emerged as an important tool for developers of artificial intelligence programs. Because of the computational resources required to realize the MATCH-SELECT-EXECUTE cycle of rule-based systems, researchers have been trying to introduce parallelism into these systems for some time. We describe a new approach to parallel rule-based systems which exploits fine-grained hypercube hardware. New algorithms for parallel rule matching and for simultaneous execution of several rules are presented. Experimental results using a Connection Machine implementation of BLITZ are presented.
Citations: 4
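Illustrative sketch (not taken from the paper): a toy MATCH-SELECT-EXECUTE loop over a set of facts, with an optional mode that fires every matched rule in the same cycle. The rule representation (name, conditions, additions) and the "first matched rule wins" selection policy are assumptions for illustration and do not represent the BLITZ algorithms.

```python
def run(rules, facts, fire_all=False, max_cycles=100):
    """Run a toy production system; returns (final_facts, cycles_used).

    rules: list of (name, conditions, additions), the last two being sets.
    fire_all=False fires one rule per cycle; fire_all=True fires all
    matched rules in the same cycle.
    """
    facts = set(facts)
    cycles = 0
    for _ in range(max_cycles):
        # MATCH: rules whose conditions hold and that would add a new fact.
        matched = [r for r in rules if r[1] <= facts and not r[2] <= facts]
        if not matched:
            break
        # SELECT: the first matched rule, or all of them.
        selected = matched if fire_all else matched[:1]
        # EXECUTE: add the conclusions of the selected rules.
        for _, _, additions in selected:
            facts |= additions
        cycles += 1
    return facts, cycles
```

With independent rules, firing several per cycle reaches the same final facts in fewer cycles, which is the kind of saving that simultaneous rule execution aims for.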
Binsorting on hypercubes with d-port communication
Pub Date : 1989-01-03 DOI: 10.1145/63047.63102
S. Seidel, W. George
Three sorting algorithms are given for hypercubes with d-port communication. All of these algorithms are based on binsort at the global level. The binsort allows the movement of keys among nodes to be performed by a d-port complete exchange rather than a sequence of 1-port exchanges as in other algorithms. This lowers communication costs by at least a factor of d compared to other sorting algorithms. The first algorithm assumes the keys are uniformly distributed and selects bin boundaries based on the global maximum and minimum keys. The other two algorithms make no assumption about the distribution of keys, so they sample the keys before the binsort in order to estimate their distribution. Splitting keys based on that estimate reduces the variance among the lengths of the subsequences left in the nodes after the complete exchange of bins, which in turn helps to balance the computational load in each node. The performance of two of these algorithms on an FPS T-40 is given for data of various distributions and is compared to the performance of bitonic sort and hyperquicksort.
Citations: 21
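Illustrative sketch (not taken from the paper): a sequential simulation of the first algorithm described in the abstract above, in which bin boundaries are spaced evenly between the global minimum and maximum keys. The list-of-lists model of the nodes and the function name are assumptions; the d-port complete exchange is represented only by moving keys between Python lists.

```python
def binsort(node_keys):
    """Simulate a global binsort over p nodes.

    node_keys: list of p lists, the keys initially held by each node.
    Returns p sorted lists; bin i holds smaller keys than bin i + 1,
    so concatenating the result gives the fully sorted sequence.
    """
    p = len(node_keys)
    all_keys = [k for keys in node_keys for k in keys]
    if not all_keys:
        return [[] for _ in range(p)]
    lo, hi = min(all_keys), max(all_keys)   # global minimum and maximum
    width = (hi - lo) / p
    if width == 0:
        width = 1.0                          # all keys equal: one bin suffices
    bins = [[] for _ in range(p)]
    for keys in node_keys:                   # stands in for the complete exchange
        for k in keys:
            dest = min(int((k - lo) / width), p - 1)
            bins[dest].append(k)
    return [sorted(b) for b in bins]         # local sort on each node
```

If the keys are far from uniform, most of them land in a few bins, which is the imbalance the sampling-based variants in the abstract are meant to reduce.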
Molecular dynamics simulation on an iPSC of defects in crystals
Pub Date : 1989-01-03 DOI: 10.1145/63047.63084
P. Flinn
Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission.
Citations: 2
Block-matrix operations using orthogonal trees
Pub Date : 1989-01-03 DOI: 10.1145/63047.63115
A. Elster, A. Reeves
Hypercube algorithms are presented for distributed block-matrix operations. These algorithms are based entirely on an interconnection scheme which involves two orthogonal sets of binary trees. This switching topology makes use of all hypercube interconnection links in a synchronized manner. An efficient novel matrix-vector multiplication algorithm based on this technique is described. Also, matrix transpose operations that move just pointers rather than actual data have been implemented for some applications by taking advantage of the above tree structures. For the cases where actual physical vector and matrix transposes are needed, possible techniques, including extensions of the above scheme, are discussed. The algorithms support submatrix partitionings of the data, instead of being limited to row and/or column partitionings. This allows efficient use of nodal vector processors as well as shorter interprocessor communication packets. It also produces a favorable data distribution for applications which involve near-neighbor operations, such as image processing. The algorithms are based on an interprocessor communication paradigm which involves variable-length, tagged block data transfers. They have been implemented on an Intel iPSC hypercube system with the support of the Hypercube Library developed at the Christian Michelsen Institute.
Citations: 12
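Illustrative sketch (not taken from the paper): a sequential simulation of matrix-vector multiplication with a submatrix (block) partitioning on a q by q grid of nodes. Each simulated node multiplies its own block by the matching slice of the vector, and the partial results are summed along each block row. The plain accumulation stands in for the tree-based reduction described in the abstract above, and the requirement that q divides the matrix size is an assumption of this sketch.

```python
def block_matvec(A, x, q):
    """Compute y = A x using a q-by-q submatrix partitioning.

    A: square matrix as a list of rows; x: vector; q: blocks per side.
    Assumes q divides len(A) evenly.
    """
    n = len(A)
    if n % q != 0:
        raise ValueError("q must divide the matrix dimension")
    b = n // q                       # block edge length
    y = [0.0] * n
    for bi in range(q):              # block row of the node grid
        for bj in range(q):          # block column of the node grid
            # Node (bi, bj) holds rows bi*b..(bi+1)*b-1 and columns
            # bj*b..(bj+1)*b-1 of A, plus the matching slice of x.
            for i in range(bi * b, (bi + 1) * b):
                partial = sum(A[i][j] * x[j] for j in range(bj * b, (bj + 1) * b))
                y[i] += partial      # stands in for the row-wise reduction
    return y
```

With square blocks, each node needs only a length-b slice of the vector rather than the whole vector, which is consistent with the shorter communication packets the abstract mentions.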