
The Sixth Distributed Memory Computing Conference, 1991. Proceedings: Latest Publications

The Sounds of Parallel Programs
Pub Date : 1991-04-28 DOI: 10.1109/DMCC.1991.633316
J. Francioni, J. A. Jackson, L. Albright
Portraying the behavior of parallel programs is useful in program debugging and performance tuning. For the most part, researchers have focused on finding ways to visualize what happens during a program's execution. As an alternative to visualization, auralization can also be used to portray the behavior of parallel programs. This paper investigates whether or not sound can be used effectively to depict different events that take place during a parallel program's execution. In particular, we focus this discussion on distributed-memory parallel programs. Three mappings of execution behavior to sound were studied. The first mapping tracks the load balance of the processors of a system. In the second mapping, the flows-of-control of the parallel processes are mapped to related sounds. The third mapping is related to process communication in a distributed-memory parallel program.
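The first mapping can be illustrated with a toy scheme (my own, for illustration only; the abstract does not specify the paper's mappings at this level of detail) in which each processor's instantaneous load is rendered as a MIDI pitch, so a balanced machine sounds as a steady chord and imbalance is heard as spread between voices:

```python
def load_to_midi_pitch(load, min_load=0.0, max_load=1.0,
                       low_note=48, high_note=84):
    """Map a processor load in [min_load, max_load] to a MIDI note.

    Hypothetical mapping for illustration: an idle processor sounds a
    low C (note 48), a fully loaded one a high C (note 84).
    """
    load = max(min_load, min(max_load, load))          # clamp out-of-range loads
    frac = (load - min_load) / (max_load - min_load)   # normalize to [0, 1]
    return low_note + round(frac * (high_note - low_note))

# One "frame" of sound for a 4-processor system:
loads = [0.9, 0.1, 0.5, 0.9]
chord = [load_to_midi_pitch(x) for x in loads]
```

An even chord would signal good load balance; a wide pitch spread, such as the one above, would be audible as imbalance.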
Citations: 31
Efficient All-to-All Communication Patterns in Hypercube and Mesh Topologies
Pub Date : 1991-04-28 DOI: 10.1109/DMCC.1991.633174
D. Scott
Some application programs on distributed memory parallel computers occasionally require an "all-to-all" communication pattern, where each compute node must send a distinct message to each other compute node. Assuming that each node can send and receive only one message at a time, the all-to-all pattern must be implemented as a sequence of phases in which certain nodes send and receive messages. If there are p compute nodes, then at least p-1 phases are needed to complete the operation. A proof of a schedule achieving this lower bound on a circuit-switched hypercube with fixed routing is given. This lower bound cannot be achieved on a 2-dimensional mesh. On an a×a mesh, a³/4 is shown to be a lower bound and a schedule with this number of phases is given. Whether hypercubes or meshes are better for this algorithm depends on the relative bandwidths of the communication channels.
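For p a power of two, the classic XOR pairing realizes a p-1 phase schedule of the kind discussed here; this is an illustrative sketch, not necessarily the schedule proved optimal in the paper:

```python
def xor_schedule(p):
    """All-to-all in p-1 phases for p a power of two.

    In phase i (1 <= i <= p-1), node j exchanges its message with node
    j ^ i.  Each (sender, receiver) pair is listed explicitly, so every
    node sends one message and receives one message per phase, and
    meets every other node exactly once over the p-1 phases.
    """
    assert p > 0 and p & (p - 1) == 0, "p must be a power of two"
    return [[(j, j ^ i) for j in range(p)] for i in range(1, p)]

phases = xor_schedule(8)   # 7 phases for 8 nodes
```

Because i ranges over all nonzero values, node j's partners j^1, ..., j^(p-1) enumerate every other node, which is exactly the p-1 lower bound stated in the abstract.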
Citations: 146
A Symmetrical Communication Interface for Distributed-Memory Computers
Pub Date : 1991-04-28 DOI: 10.1109/DMCC.1991.633140
Peter Steenkiste
Applications have very diverse communication requirements. Although individual algorithms often use regular communication patterns, there is little regularity across applications or even across different phases of the same application. For this reason, a low-level communication interface should support the unrestricted, reliable exchange of variable-length messages. For example, both sends and receives can operate on both local and remote buffers. Although this communication model does not correspond directly to the low-level communication primitives supported by the hardware, it can be implemented efficiently, and it gives the users more control over how and when transfers over the network take place. The interface is the lowest-level communication interface for the Nectar multicomputer.
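A toy model of such a symmetric interface, with hypothetical names (Nectar's actual primitives certainly differ): both `send` and `receive` may name a buffer on the remote side, giving push and pull variants of the same transfer:

```python
class Node:
    """Toy model of a symmetric interface: both send and receive can
    name a buffer on either side of the transfer (hypothetical API)."""

    def __init__(self, name):
        self.name = name
        self.buffers = {}

    def send(self, buf, dest, dest_buf):
        # push: read a local buffer, write a buffer on the remote node
        dest.buffers[dest_buf] = self.buffers[buf]

    def receive(self, src, src_buf, buf):
        # pull: symmetric to send -- read a remote buffer into a local one
        self.buffers[buf] = src.buffers[src_buf]

a, b = Node("a"), Node("b")
a.buffers["out"] = b"hello"
a.send("out", b, "in")        # a pushes into b's buffer
b.receive(a, "out", "copy")   # b pulls from a's buffer
```

The symmetry is the point: the same data movement can be initiated by either side, which is what distinguishes this model from hardware-level send/receive primitives.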
Citations: 2
Hypertasking Support for Dynamically Redistributable and Resizeable Arrays on the iPSC
Pub Date : 1991-04-28 DOI: 10.1109/DMCC.1991.633086
M. Baber
Static allocations of arrays on multicomputers have two major shortcomings. First, algorithms often employ more than one reference pattern for a given array, resulting in the need for more than one mapping between the array elements and the multicomputer nodes. Secondly, it is desirable to provide easily resizeable arrays, especially for multigrid algorithms. This paper describes extensions to the hypertasking paracompiler which provide both dynamically resizeable and redistributable arrays. Hypertasking is a parallel programming tool that transforms C programs containing comment-directives into SPMD C programs that can be run on any size hypercube without recompilation for each cube size.

Introduction. This paper describes extensions to hypertasking [1], a domain decomposition tool that operates on comment-directives inserted into ordinary sequential C source code. The extensions support run-time redistribution and resizing of arrays. Hypertasking is one of several projects [4,5,6,8] that have proposed or produced source-to-source compilers for parallel architectures. I refer to this class of software tools as paracompilers to distinguish them from the sequential source-to-object compilers they are built upon. A fundamental question for paracompiler designers is whether to make decisions about data and control decomposition at compile-time or at run-time. If decisions are made at compile-time, the logic does not have to be repeated every time the program is executed and it is possible to optimize the code for known parameters. Unfortunately, compile-time decisions are also inflexible. Hypertasking makes all significant decisions about decomposition at run-time.

A run-time initialization routine is called by each node to assign values to the members of an array definition structure. The C code generated by the paracompiler references the values in the structure instead of constants chosen at compile-time. The resulting code is surprisingly efficient. Furthermore, because it is relatively straightforward to change the decomposition variables in the array definition structure, run-time decomposition greatly facilitates the implementation of dynamic array resizing and redistribution features such as those described in this paper. This paper will begin with an overview of the Hypertasking programming model to provide a framework for the new features. Beginning with redistributable arrays, the purpose and performance of the new features are discussed with reference to example programs. Finally, conclusions and goals for future research are presented.

Hypertasking overview. Hypertasking is designed to make it easy for software developers to port their existing data parallel applications to a multicomputer.

* Supported in part by: Defense Advanced Research Projects Agency, Information Science and Technology Office, Research in Concurrent Computing Systems, ARPA Order No. 6402.6402-1; Program Code No. 8E20 & 9E20, issued by DARPA/CMO under Contract #MDA-972-89-C-0034.
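The run-time array-descriptor idea can be sketched as follows (a hypothetical helper, not Hypertasking's actual code): each node computes its owned index range from values known only at run time, so resizing or redistributing an array amounts to recomputing the range rather than recompiling:

```python
def local_block(n, nodes, me):
    """Owned index range [lo, hi) of node `me` when an n-element array
    is block-distributed over `nodes` nodes.

    All three arguments are run-time values, so a resized array just
    means calling this again with the new n -- no recompilation needed.
    """
    base, extra = divmod(n, nodes)
    lo = me * base + min(me, extra)              # first `extra` nodes get one more
    hi = lo + base + (1 if me < extra else 0)
    return lo, hi

# A 10-element array over 4 nodes; resizing recomputes these ranges:
ranges = [local_block(10, 4, me) for me in range(4)]
```

The ranges tile the array contiguously with a load imbalance of at most one element, which is the usual choice for block decompositions.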
The REDISTRIBUTE directive is similar to the original ARRAY directive, except that it is executable rather than declarative. The parameters are the same, allowing the user to specify the thickness of the guard region and whether each dimension of the array is distributed. The program can be run on a single node as a baseline for speedup. C compiler and linker
Citations: 15
Mapping Techniques for Parallel 3D Coronary Arteriography
Pub Date : 1991-04-28 DOI: 10.1109/DMCC.1991.633344
A. Sarwal, F. Ozguner, J. Ramanathan
The paper investigates schemes for implementing the 3-D reconstruction of the coronary arteries on a MIMD system. The performance of the system for calculating the 3-D description of the arterial tree is related to the mapping strategy selected. The image processing algorithms can be parallelized to provide favorable performance for the complete computation cycle. Results are provided for two mapping approaches for an X-ray image, and an extension is proposed for the multiview case.
Citations: 1
Implementing the Perfect ARC2D Benchmark on the BBN TC2000 Parallel Supercomputer
Pub Date : 1991-04-28 DOI: 10.1109/DMCC.1991.633200
S. Breit
The TC2000 is a MIMD parallel processor with memory that is physically distributed but logically shared. Interprocessor communication, and therefore access to shared memory, is sufficiently fast that most applications can be ported to the TC2000 without rewriting the code from scratch. This paper shows how this was done for the Perfect ARC2D benchmark. The code was first restructured by changing the order of subroutine calls so that interprocessor communication would be reduced to the equivalent of three full transposes of the data per iteration. The parallel implementation was then completed by inserting shared data declarations and parallel extensions provided by the TC2000 Fortran language. This approach was easier to implement than a domain decomposition technique, but requires more interprocessor communication. It is feasible only because of the TC2000's high-speed interprocessor communications network. References to shared memory take about 25% of the total execution time for the parallel version of ARC2D, an acceptable amount considering the code did not have to be completely rewritten. High parallel efficiency was obtained using up
Citations: 1
Optimal All-to-All Personalized Communication with Minimum Span on Boolean Cubes
Pub Date : 1991-04-28 DOI: 10.1109/DMCC.1991.633150
S. Johnsson, Ching-Tien Ho
All-to-all personalized communication is a class of permutations in which each processor sends a unique message to every other processor. We present optimal algorithms for concurrent communication on all channels in Boolean cube networks, both for the case with a single permutation, and the case where multiple permutations shall be performed on the same local data set, but on different sets of processors. For K elements per processor our algorithms give the optimal number of element transfers, K/2. For a succession of all-to-all personalized communications on disjoint subcubes of p dimensions each, our best algorithm yields K/2 + σ - p element exchanges in sequence, where σ is the total number of processor dimensions in the permutation. An implementation on the Connection Machine of one of the algorithms offers a maximum speed-up of 50% compared to the previously best known algorithm.
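The K/2 figure can be seen in a standard dimension-exchange simulation (an illustrative sketch, not necessarily the authors' algorithm): in each of the d rounds, a node forwards exactly half of its K = 2^d messages across one cube dimension:

```python
def all_to_all_hypercube(d):
    """Simulate all-to-all personalized communication on a d-cube by
    dimension exchange.

    In round k, each node forwards to its neighbor across dimension k
    every message whose destination disagrees with the node in bit k.
    Each node holds K = 2^d messages throughout, and exactly K/2 of
    them move in every round.
    """
    p = 1 << d
    # node j starts holding one message (src=j, dst=t) for every t
    hold = [[(j, t) for t in range(p)] for j in range(p)]
    for k in range(d):
        bit = 1 << k
        new = [[] for _ in range(p)]
        for j in range(p):
            for (s, t) in hold[j]:
                # forward across dimension k iff dst disagrees in bit k
                new[j ^ bit if (t ^ j) & bit else j].append((s, t))
        hold = new
    return hold

hold = all_to_all_hypercube(3)   # after d rounds, node j holds only its own mail
```

After round k every message agrees with its holder in bits 0..k, so after d rounds each node holds exactly the p messages addressed to it, one from each source.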
Citations: 16
Performance Visualization of SLALOM
Pub Date : 1991-04-28 DOI: 10.1109/DMCC.1991.633313
D. Rover, M. B. Carter, J. Gustafson
Performance visualization provides insights about the complex operation of concurrent computer systems. SLALOM is a scalable, fixed-time computer benchmark. Each corresponds to a method of computer performance evaluation: monitoring and benchmarking, respectively. Whereas benchmark programs typically report single-number performance metrics for ease of comparison among different machines, a performance monitor (via instrumentation and visualization) gives a detailed account of the dynamics of program execution. Using software tools developed for the nCUBE 2 and the MasPar MP-1 distributed memory machines and applied to the SLALOM program, we demonstrate the utility of performance visualization for fine-tuning algorithms and understanding phenomena. The tools include PICL and ParaGraph and custom VISTA components.
Citations: 10
Efficient Parallel Execution of IDA on Shared and Distributed Memory Multiprocessors
Pub Date : 1991-04-28 DOI: 10.1109/DMCC.1991.633162
V. Saletore, L. Kalé
Citations: 3
Hypercube Vs Cube-Connected Cycles: A Topological Evaluation
Pub Date : 1991-04-28 DOI: 10.1109/DMCC.1991.633358
S. Kambhatla
Hypercubes and cube-connected cycles differ in the number of links per node, which has fundamental implications on several issues including performance and ease of implementation. In this paper, we evaluate these networks with respect to a number of parameters including several topological characterizations, fault-tolerance, and various broadcast and point-to-point communication primitives. In the process we also derive several lower bound figures and describe algorithms for communication in cube-connected cycles. We conclude that while having a lower number of links per node in a CCC might not degrade performance drastically (especially for lower dimensions) as compared to a hypercube of a similar size, this feature has several consequences which substantially aid its (VLSI and non-VLSI) implementation.
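The basic trade-off can be checked numerically with a small sketch (mine, not from the paper) that builds both graphs and measures degree and diameter by breadth-first search:

```python
from collections import deque

def bfs_ecc(adj, s):
    """Eccentricity of vertex s (longest shortest path from s) via BFS."""
    dist = {s: 0}
    q = deque([s])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return max(dist.values())

def diameter(adj):
    return max(bfs_ecc(adj, s) for s in adj)

def hypercube(d):
    """Q_d: 2^d nodes, degree d (one link per dimension)."""
    return {j: [j ^ (1 << k) for k in range(d)] for j in range(1 << d)}

def ccc(d):
    """CCC(d): each cube node becomes a d-cycle; every vertex (node, pos)
    has two cycle links plus one cube link, so degree is always 3."""
    adj = {}
    for j in range(1 << d):
        for k in range(d):
            adj[(j, k)] = [(j, (k + 1) % d), (j, (k - 1) % d),
                           (j ^ (1 << k), k)]
    return adj

d = 3
q, c = hypercube(d), ccc(d)
stats = {
    "hypercube": (len(q), len(q[0]), diameter(q)),        # (8, 3, 3)
    "ccc":       (len(c), len(c[(0, 0)]), diameter(c)),   # (24, 3, 6)
}
```

Already at d = 3 the constant degree of the CCC costs a doubled diameter; for larger d the hypercube's degree grows as d while the CCC's stays at 3, which is the implementation advantage the abstract refers to.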
Citations: 4