
The Sixth Distributed Memory Computing Conference, 1991. Proceedings: Latest Publications

A Symmetrical Communication Interface for Distributed-Memory Computers
Pub Date : 1991-04-28 DOI: 10.1109/DMCC.1991.633140
Peter Steenkiste
Applications have very diverse communication requirements. Although individual algorithms often use regular communication patterns, there is little regularity across applications or even across different phases of the same application. For this reason, a low-level communication interface should support the unrestricted, reliable exchange of variable-length messages. For example, both sends and receives can operate on both local and remote buffers. Although this communication model does not correspond directly to the low-level communication primitives supported by the hardware, it can be implemented efficiently, and it gives the users more control over how and when transfers over the network take place. The interface is the lowest-level communication interface for the Nectar multicomputer.
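The symmetry the abstract describes (both sends and receives may name local or remote buffers) resembles a put/get style of transfer. A minimal sketch of that idea follows; all class and method names are invented for illustration and are not the Nectar API.

```python
# Illustrative sketch of a symmetric interface: both operations may name a
# local and a remote buffer. Names are hypothetical, not the Nectar API.

class Node:
    def __init__(self, name):
        self.name = name
        self.buffers = {}                     # buffer name -> payload

    def send(self, src, dst_node, dst):
        # push: copy a local buffer into a buffer on another node
        dst_node.buffers[dst] = self.buffers[src]

    def receive(self, src_node, src, dst):
        # pull: copy a buffer on another node into a local buffer
        self.buffers[dst] = src_node.buffers[src]

a, b = Node("a"), Node("b")
a.buffers["out"] = b"hello"
a.send("out", b, "in")        # the sender names the remote destination buffer
b.receive(a, "out", "copy")   # the receiver names the remote source buffer
print(b.buffers["in"], b.buffers["copy"])   # b'hello' b'hello'
```

Either side can initiate a transfer and name either endpoint's buffer, which is the symmetry the abstract contrasts with conventional send/receive pairs.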
Cited by: 2
Mapping Techniques for Parallel 3D Coronary Arteriography
Pub Date : 1991-04-28 DOI: 10.1109/DMCC.1991.633344
A. Sarwal, F. Ozguner, J. Ramanathan
The paper investigates schemes for implementing the 3D reconstruction of the coronary arteries on a MIMD system. The performance of the system for calculating the 3D description of the arterial tree is related to the mapping strategy selected. The image processing algorithms can be parallelized to provide favorable performance for the complete computation cycle. Results are provided for two mapping approaches for an X-ray image, and an extension is proposed for the multiview case.
Cited by: 1
Automatic Support for Data Distribution
Pub Date : 1991-04-28 DOI: 10.1109/DMCC.1991.633085
B. Chapman, H. Herbeck, H. Zima
Abstract: In current automatic parallelization systems for distributed-memory machines, the user must explicitly specify how the data domain of the sequential program is to be partitioned and mapped to the processors. In this paper, we outline the salient features of a new knowledge-based software tool that provides automatic support for this task. The basic guidelines for the design of the tool are discussed, and its major components are described.
Cited by: 42
Approximate Analysis of the Binary d-Cube Network
Pub Date : 1991-04-28 DOI: 10.1109/DMCC.1991.633310
D.S. Holtsinger, E. Gehringer
Distributed memory computers require efficient, high-bandwidth networks to support fine-grain computation. In developing an analytic model for a network, the underlying details of the architecture are often abstracted to simplify the model and to facilitate a comparison with other networks. As a result it becomes difficult to compare the relative merits of the architectural features in a particular network. In this paper we present a detailed analysis of the binary d-cube network. Our model has been shown to provide results that are very similar to those derived from a detailed simulation model. Among other things, our analysis shows that small increases in routing latency can significantly degrade throughput, but do not degrade the probability of acceptance of a message. It also shows that just a few buffers on heavily congested destination links can improve performance greatly, almost as much as buffering on all destination links.
Cited by: 0
The Sounds of Parallel Programs
Pub Date : 1991-04-28 DOI: 10.1109/DMCC.1991.633316
J. Francioni, J. A. Jackson, L. Albright
Portraying the behavior of parallel programs is useful in program debugging and performance tuning. For the most part, researchers have focused on finding ways to visualize what happens during a program's execution. As an alternative to visualization, auralization can also be used to portray the behavior of parallel programs. This paper investigates whether or not sound can be used effectively to depict different events that take place during a parallel program's execution. In particular, we focus this discussion on distributed-memory parallel programs. Three mappings of execution behavior to sound were studied. The first mapping tracks the load balance of the processors of a system. In the second mapping, the flows-of-control of the parallel processes are mapped to related sounds. The third mapping is related to process communication in a distributed-memory parallel program.
Cited by: 31
Hypertasking Support for Dynamically Redistributable and Resizeable Arrays on the iPSC
Pub Date : 1991-04-28 DOI: 10.1109/DMCC.1991.633086
M. Baber
Static allocations of arrays on multicomputers have two major shortcomings. First, algorithms often employ more than one reference pattern for a given array, resulting in the need for more than one mapping between the array elements and the multicomputer nodes. Secondly, it is desirable to provide easily resizeable arrays, especially for multigrid algorithms. This paper describes extensions to the hypertasking paracompiler which provide both dynamically resizeable and redistributable arrays. Hypertasking is a parallel programming tool that transforms C programs containing comment-directives into SPMD C programs that can be run on any size hypercube without recompilation for each cube size.

Introduction. This paper describes extensions to hypertasking [1], a domain decomposition tool that operates on comment-directives inserted into ordinary sequential C source code. The extensions support run-time redistribution and resizing of arrays. Hypertasking is one of several projects [4,5,6,8] that have proposed or produced source-to-source compilers for parallel architectures. I refer to this class of software tools as paracompilers to distinguish them from the sequential source-to-object compilers they are built upon. A fundamental question for paracompiler designers is whether to make decisions about data and control decomposition at compile-time or at run-time. If decisions are made at compile-time, the logic does not have to be repeated every time the program is executed, and it is possible to optimize the code for known parameters. Unfortunately, compile-time decisions are also inflexible. Hypertasking makes all significant decisions about decomposition at run-time. A run-time initialization routine is called by each node to assign values to the members of an array definition structure. The C code generated by the paracompiler references the values in the structure instead of constants chosen at compile-time. The resulting code is surprisingly efficient. Furthermore, because it is relatively straightforward to change the decomposition variables in the array definition structure, run-time decomposition greatly facilitates the implementation of dynamic array resizing and redistribution features such as those described in this paper. This paper begins with an overview of the hypertasking programming model to provide a framework for the new features. Beginning with redistributable arrays, the purpose and performance of the new features are discussed with reference to example programs. Finally, conclusions and goals for future research are presented.

Hypertasking overview. Hypertasking is designed to make it easy for software developers to port their existing data parallel applications to a m

* Supported in part by: Defense Advanced Research Projects Agency, Information Science and Technology Office, Research in Concurrent Computing Systems, ARPA Order No. 6402.6402-1; Program Code No. 8E20 & 9E20. Issued by DARPA/CMO under Contract #MDA-972-89-C-0034.
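The run-time array-definition idea described above (node ownership and local bounds computed from values fixed at start-up, so the same code handles any cube size) can be sketched as follows. The field names and the simple block layout are assumptions for illustration, not hypertasking's actual structure.

```python
# Sketch of run-time block decomposition: nothing about the node count is
# baked in at compile time, so resizing or redistributing an array only
# means building a new descriptor. Field names are hypothetical.

from dataclasses import dataclass

@dataclass
class ArrayDef:
    n: int        # global array length
    nodes: int    # number of nodes, chosen at run-time (e.g. the cube size)

    def _block(self):
        return -(-self.n // self.nodes)        # ceil(n / nodes)

    def owner(self, i):
        # which node holds global element i under block distribution
        return i // self._block()

    def local_range(self, node):
        # half-open range of global indices stored on this node
        lo = node * self._block()
        return lo, min(lo + self._block(), self.n)

d = ArrayDef(n=100, nodes=4)       # run on a 4-node cube...
print(d.local_range(1))            # (25, 50)
d = ArrayDef(n=100, nodes=8)       # ...or redistribute across 8 nodes
print(d.local_range(1))            # (13, 26)
```

Because `owner` and `local_range` consult the descriptor at run-time, "recompiling for each cube size" is unnecessary, which is the property the abstract emphasizes.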
The REDISTRIBUTE directive is similar to the original ARRAY directive, except that it is executable rather than declarative. The parameters are the same, allowing the user to specify the thickness of the guard wrapper and whether each dimension of the array is distributed. The program can also be run on a single node as a baseline for speedup measurements.
Cited by: 15
Hypercube Vs Cube-Connected Cycles: A Topological Evaluation
Pub Date : 1991-04-28 DOI: 10.1109/DMCC.1991.633358
S. Kambhatla
Hypercubes and cube-connected cycles differ in the number of links per node, which has fundamental implications on several issues including performance and ease of implementation. In this paper, we evaluate these networks with respect to a number of parameters including several topological characterizations, fault-tolerance, and various broadcast and point-to-point communication primitives. In the process we also derive several lower bound figures and describe algorithms for communication in cube-connected cycles. We conclude that while having a lower number of links per node in a CCC might not degrade performance drastically (especially for lower dimensions) as compared to a hypercube of a similar size, this feature has several consequences which substantially aid its (VLSI and non-VLSI) implementation.
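The structural trade-off in the abstract can be made concrete with a small script that builds both topologies and measures node degree and diameter by breadth-first search. This is a generic sketch, not the paper's evaluation.

```python
# A d-cube on 2^d nodes has degree d; CCC(d) replaces each cube node with a
# d-cycle, fixing the degree at 3 on d*2^d nodes at the cost of a larger
# diameter. Measure both by brute-force BFS for small d.

from collections import deque

def bfs_ecc(adj, s):
    # eccentricity of s: longest shortest path out of s
    dist = {s: 0}
    q = deque([s])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return max(dist.values())

def diameter(adj):
    return max(bfs_ecc(adj, v) for v in adj)

def hypercube(d):
    return {x: [x ^ (1 << k) for k in range(d)] for x in range(1 << d)}

def ccc(d):
    # vertex (x, y): position y on the cycle replacing cube node x
    adj = {}
    for x in range(1 << d):
        for y in range(d):
            adj[(x, y)] = [(x, (y + 1) % d), (x, (y - 1) % d),  # cycle links
                           (x ^ (1 << y), y)]                    # cube link
    return adj

print(diameter(hypercube(4)))   # 4: the d-cube diameter equals d
print(diameter(ccc(4)))         # larger, but every node keeps degree 3
```

The constant degree is what the abstract credits with easing VLSI implementation: link count per node no longer grows with the machine size.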
Cited by: 4
The Finite Difference Solution of Two- and Three-Dimensional Semiconductor Problems on the Connection Machine
Pub Date : 1991-04-28 DOI: 10.1109/DMCC.1991.633216
K. Dalton, E. Hensel, S. Castillo, K. Ng
A study of the finite difference solution of the nonlinear partial differential equations governing two- and three-dimensional semiconductor devices is conducted on a SIMD computer. This nonlinear system is solved using Jacobi iteration and successive under-relaxation. Row scaling and a zero-order regularizer are used to aid in convergence. On a 16K CM-2, problems with up to 16.7 million unknowns have been solved. Problems of this size have not previously been reported. The ability to accurately model larger and more realistic three-dimensional devices is necessary to gain a greater physical understanding of their behavior.
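Jacobi iteration with under-relaxation, as named in the abstract, can be sketched in a few lines: each sweep computes the plain Jacobi update and then blends it with the previous iterate using a factor 0 < omega < 1. This serial, dense version is only illustrative of the scheme, not the CM-2 implementation.

```python
# Under-relaxed Jacobi on a small linear system A x = b.
# omega = 1 recovers plain Jacobi; omega < 1 damps the update.

def jacobi_under_relaxed(A, b, omega=0.8, iters=200):
    n = len(b)
    x = [0.0] * n
    for _ in range(iters):
        # standard Jacobi sweep: solve row i for x[i] using old values
        x_jac = [(b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i))
                 / A[i][i] for i in range(n)]
        # under-relaxation: move only part of the way toward the update
        x = [(1 - omega) * x[i] + omega * x_jac[i] for i in range(n)]
    return x

# Diagonally dominant system, so the iteration converges.
A = [[4.0, 1.0], [2.0, 5.0]]
b = [1.0, 2.0]
x = jacobi_under_relaxed(A, b)
residual = max(abs(sum(A[i][j] * x[j] for j in range(2)) - b[i])
               for i in range(2))
print(residual)   # effectively zero
```

On a SIMD machine every row update in the sweep is independent, which is why the method maps naturally onto the CM-2's data-parallel model.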
Cited by: 0
Efficient Parallel Execution of IDA on Shared and Distributed Memory Multiprocessors
Pub Date : 1991-04-28 DOI: 10.1109/DMCC.1991.633162
V. Saletore, L. Kalé
Cited by: 3
Optimal All-to-All Personalized Communication with Minimum Span on Boolean Cubes
Pub Date : 1991-04-28 DOI: 10.1109/DMCC.1991.633150
S. Johnsson, Ching-Tien Ho
All-to-all personalized communication is a class of permutations in which each processor sends a unique message to every other processor. We present optimal algorithms for concurrent communication on all channels in Boolean cube networks, both for the case with a single permutation, and the case where multiple permutations shall be performed on the same local data set, but on different sets of processors. For K elements per processor our algorithms give the optimal number of element transfers, K/2. For a succession of all-to-all personalized communications on disjoint subcubes of p dimensions each, our best algorithm yields K/2 + σ − p element exchanges in sequence, where σ is the total number of processor dimensions in the permutation. An implementation on the Connection Machine of one of the algorithms offers a maximum speed-up of 50% compared to the previously best known algorithm.
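The communication pattern itself is easy to demonstrate with the classic dimension-by-dimension exchange on a d-cube: in step k, every node trades with its neighbor across dimension k exactly those messages whose destination still disagrees with the node's address in bit k. Note this is the textbook single-port scheme, not necessarily the paper's optimal all-channel algorithm.

```python
# Simulate all-to-all personalized communication on a d-cube: each of the
# 2^d nodes starts with one distinct message per destination, and after d
# exchange steps every message has reached its destination.

def all_to_all_personalized(d):
    p = 1 << d
    # held[u] is the list of (dst, payload) pairs currently stored at node u;
    # payload (src, dst) lets us check delivery at the end
    held = {u: [(v, (u, v)) for v in range(p)] for u in range(p)}
    for k in range(d):
        bit = 1 << k
        new = {u: [] for u in range(p)}
        for u in range(p):
            for dst, msg in held[u]:
                # cross dimension k iff bit k of the address is still wrong
                new[u ^ bit if (dst ^ u) & bit else u].append((dst, msg))
        held = new
    return held

held = all_to_all_personalized(3)
ok = all(set(msg for _, msg in held[u]) == {(src, u) for src in range(8)}
         for u in range(8))
print(ok)   # True: node u holds exactly the 8 messages addressed to it
```

Each message's address bits are corrected one dimension at a time, so after d steps routing is complete; the paper's contribution is doing this optimally while keeping all channels busy.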
Cited by: 16