
The Sixth Distributed Memory Computing Conference, 1991. Proceedings: Latest Articles

Adaptive Optics Calculations Using the Connection Machine
Pub Date : 1991-04-28 DOI: 10.1109/DMCC.1991.633209
R. Firestone, Eric N. Opp
The performance of reflecting optical telescopes located on the surface of the earth is subject to distortions due to the force of gravity on the mirror and the turbulence of the atmosphere along the light path. Reflective optics are also planned for use in high-powered laser systems, where the intensity of the light itself is capable of producing distortions in the air within the instrument, thereby affecting the shape of the focused wavefront. A solution proposed by optical designers is the use of adaptive optics: an optical system in which the figure of the mirror is deformable to the extent necessary to correct for the distortions mentioned. An adaptive optical system uses a feedback loop, in which the distortions of the optical wavefront are measured, the necessary corrections are computed, and a set of actuators is moved to provide those corrections. The calculation of the corrections is computationally intensive. Specifically, the measurement of the distortions provides a collection of phase differences between measuring points corresponding to the actuator positions. This set of phase differences is larger than the number of actuators, leading to an overdetermined problem. As physical systems have some amount of noise present, the technique of least-squares solution serves both to provide the best choice of actuator positions for this overdetermined problem and to suppress the noise in the measurements. The necessary algorithms for solving the computation portion of the adaptive optics problem consist of a matrix generator to derive the computational representation of the physical system, a matrix inversion routine, and a high-speed least-squares solver. In the optical astronomy paradigm, the computational requirement is for a small number of adjustments per second, due to the rate of atmospheric turbulence.
For the laser system, with more stringent requirements, we demonstrate an improvement of 1 1/2 orders of magnitude, made possible only through the use of supercomputer methods. Extrapolation of these results indicates that even greater acceleration is possible if the interprocessor communication is minimized; in other words, supercomputer designers have not yet solved the problem of making interprocessor communication as efficient as that within processors (or, in the present case, between processors on a single chip).
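The least-squares step described in this abstract can be illustrated with a small sketch. It is not the authors' Connection Machine code: the function names, the choice to pin point 0 to zero (to remove the unobservable overall phase offset), and the use of normal equations with Gaussian elimination are all assumptions made for illustration.

```python
def solve_linear(matrix, rhs):
    """Solve a square system by Gaussian elimination with partial pivoting."""
    n = len(matrix)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def least_squares_phases(n_points, measurements):
    """Estimate phases from pairwise differences.

    measurements: list of (i, j, d), meaning phase[j] - phase[i] is about d.
    Point 0 is pinned to 0.0 (an assumption of this sketch), so there are
    n_points - 1 unknowns. The normal equations (A^T A) x = A^T d are
    accumulated one measurement row at a time.
    """
    n = n_points - 1
    ata = [[0.0] * n for _ in range(n)]
    atd = [0.0] * n
    for i, j, d in measurements:
        row = {}
        if j > 0:
            row[j - 1] = 1.0
        if i > 0:
            row[i - 1] = -1.0
        for a, va in row.items():
            atd[a] += va * d
            for b, vb in row.items():
                ata[a][b] += va * vb
    return [0.0] + solve_linear(ata, atd)


# Four measuring points give six pairwise differences for three unknowns,
# so the problem is overdetermined, as in the abstract.
true_phase = [0.0, 1.0, 3.0, 2.0]
noise = [0.01, -0.02, 0.015, 0.0, -0.01, 0.02]  # made-up measurement noise
pairs = [(i, j) for i in range(4) for j in range(i + 1, 4)]
noisy = [(i, j, true_phase[j] - true_phase[i] + e)
         for (i, j), e in zip(pairs, noise)]
print(least_squares_phases(4, noisy))
```

With noise-free differences the estimate reproduces the true phases; with noisy ones the redundancy averages the errors, which is the noise suppression the abstract refers to.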
Citations: 0
A Flexible Interleaved Memory Design for Generalized Low Conflict Memory Access
Pub Date : 1991-04-28 DOI: 10.1109/DMCC.1991.633349
L. S. Kaplan
High-bandwidth delivery of data to the processor(s) is critical for good performance in highly parallel computer systems. To increase memory throughput, many systems make use of interleaved parallel memory banks. An implementation must provide uniform throughput with little or no contention at the memory banks for a wide variety of algorithms and access patterns. This paper proposes an implementation for an interleaved memory system that exhibits extremely low contention for the memory banks during virtually all patterned accesses. It also has the advantage that, due to its programmability, it imposes few requirements on the configuration of the machines in which it is used. The hardware to implement the design is discussed along with address space considerations. A variant of this design is currently in use on the BBN TC2000 (tm) parallel computer.
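The contention problem that motivates this design can be seen in a short sketch. The abstract does not describe the paper's actual mapping, so the code below only contrasts plain modulo interleaving with a generic XOR-folding map; both functions and their names are assumptions chosen for illustration, and the XOR map assumes a power-of-two bank count.

```python
from collections import Counter


def modulo_bank(addr, num_banks):
    """Plain low-order interleaving: bank = address mod number of banks."""
    return addr % num_banks


def xor_fold_bank(addr, num_banks):
    """Generic XOR folding (illustrative only, not the paper's scheme).

    Assumes num_banks is a power of two. The bank index is the low-order
    address bits XORed with the next group of address bits.
    """
    bits = num_banks.bit_length() - 1
    return (addr ^ (addr >> bits)) & (num_banks - 1)


def max_bank_load(bank_of, num_banks, stride, accesses):
    """Largest number of accesses landing on any single bank."""
    loads = Counter(bank_of(i * stride, num_banks) for i in range(accesses))
    return max(loads.values())


# A stride equal to the bank count is the classic worst case for modulo
# interleaving: every access hits the same bank.
for stride in (1, 8):
    print(stride,
          max_bank_load(modulo_bank, 8, stride, 64),
          max_bank_load(xor_fold_bank, 8, stride, 64))
```

A perfectly even spread of 64 accesses over 8 banks gives a maximum load of 8, so any larger value indicates contention for that access pattern.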
Citations: 4
Performance Visualization of SLALOM
Pub Date : 1991-04-28 DOI: 10.1109/DMCC.1991.633313
D. Rover, M. B. Carter, J. Gustafson
Performance visualization provides insights about the complex operation of concurrent computer systems. SLALOM is a scalable, fixed-time computer benchmark. Each corresponds to a method of computer performance evaluation: monitoring and benchmarking, respectively. Whereas benchmark programs typically report single-number performance metrics for ease of comparison among different machines, a performance monitor (via instrumentation and visualization) gives a detailed account of the dynamics of program execution. Using software tools developed for the nCUBE 2 and the MasPar MP-1 distributed memory machines and applied to the SLALOM program, we demonstrate the utility of performance visualization for fine-tuning algorithms and understanding phenomena. The tools include PICL, ParaGraph, and custom VISTA components.
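The idea behind a fixed-time benchmark is that the time budget stays constant and the reported figure is the largest problem a machine completes within it. A minimal sketch of that idea follows; the cubic cost model, the rates, and the function names are hypothetical and are not taken from SLALOM itself.

```python
def largest_problem_within(budget_seconds, predicted_seconds, upper=1 << 20):
    """Largest size n in [0, upper] whose predicted run time fits the budget.

    Assumes predicted_seconds(n) does not decrease as n grows, so a binary
    search over n is valid.
    """
    lo, hi = 0, upper
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if predicted_seconds(mid) <= budget_seconds:
            lo = mid
        else:
            hi = mid - 1
    return lo


def cubic_model(ops_per_second):
    """Hypothetical cost model: n**3 operations at a fixed rate."""
    return lambda n: n ** 3 / ops_per_second


# Doubling the machine rate does not double the problem size under a cubic
# cost model, which is why the size itself is a useful single-number metric.
slow = largest_problem_within(60, cubic_model(1_000_000))
fast = largest_problem_within(60, cubic_model(2_000_000))
print(slow, fast)
```

A real fixed-time benchmark measures run time rather than predicting it; the model here only keeps the sketch deterministic.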
Citations: 10
A Comparison of Particle Simulation Implementations on Two Different Parallel Architectures
Pub Date : 1991-04-28 DOI: 10.1109/DMCC.1991.633198
J. Mcdonald, L. Dagum
Direct particle simulation is a powerful method for analyzing low density, hypersonic re-entry flows. The method involves following a large sample of representative gas molecules through motion and collision with other molecules or with surfaces in the simulated flow. In this paper, two very different parallel architectures are examined for their suitability in particle simulation computations, namely the Connection Machine CM-2 and the Intel iPSC/860. The difference in architectures has resulted in very different parallel decompositions. The two implementations are described and performance results are given. Both implementations achieve performance comparable to a single Cray-2 CPU; however, this performance is obtained at the cost of greatly increased programming complexity.
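The motion-and-surface-collision part of such a method can be sketched in a few lines. This is a deliberately reduced illustration, assuming a one-dimensional box with perfectly reflecting walls; it omits the collisions between molecules, and the function name and data layout are hypothetical rather than taken from either implementation in the paper.

```python
def advance(particles, dt, length):
    """Free-flight move in a 1-D box [0, length] with specular reflection.

    particles: list of (position, velocity) tuples; length must be positive.
    Returns a new list; the input is left unchanged.
    """
    moved = []
    for x, v in particles:
        x += v * dt
        # Fold the position back into the box, reversing the velocity at
        # each wall hit (a particle may bounce more than once per step).
        while x < 0.0 or x > length:
            if x < 0.0:
                x, v = -x, -v
            else:
                x, v = 2.0 * length - x, -v
        moved.append((x, v))
    return moved


sample = [(0.75, 0.5), (0.25, -0.5), (0.5, 0.25)]
print(advance(sample, 1.0, 1.0))
```

Each particle is updated independently of the others in this step, which is what makes the move phase a natural candidate for parallel decomposition.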
Citations: 5
Communication Abstraction and Process Refinement
Pub Date : 1991-04-28 DOI: 10.1109/DMCC.1991.633097
J. Yantchev
Concurrent systems are collections of data, processes, and communication channels. Top-down, hierarchical design of concurrent systems needs powerful abstraction facilities provided by the implementation language. While most languages provide some structuring mechanisms for data and process abstraction, none seems to provide any equivalent mechanisms for communication structuring. Communication channels exist to communicate data and, therefore, all data structuring mechanisms provided by a programming language must be available to structure channels as well. In order to preserve behaviour through successive levels of design refinement, these means of communication structuring must preserve the abstraction of atomic transfers of values of arbitrary types.
Introduction
Most concurrent programming languages [5, 6, 7, 1] support the abstraction of concurrent systems as collections of data, processes, and communication channels. However, while they provide some structuring mechanisms for data and process abstraction, none seems to provide any equivalent mechanisms for communication structuring. Interprocess communication is almost universally viewed as a synchronised atomic exchange of values between two concurrently active processes. This affects the whole design process and interferes with the freedom and ease of refining the process structure. The design transformation steps may be non-trivial in some cases and, therefore, difficult to arrive at and verify. In addition, the implementation may be less efficient, both in storage and speed, because of unnecessary data copying and context creation for process spawning. The data structuring mechanisms supported by contemporary programming languages provide a uniform view of data and data types. Structured data types may consist of components of arbitrary types, including themselves, and values of such types are treated as wholes and may be passed as parameters, returned as results of functions, and assigned to variables. The same applies to processes [5, 7]. No distinction of kind need be made between systems with and without substructure and, indeed, a system which at one level of abstraction may be considered to consist of a process and the environment in which it evolves, may be considered as a single system at a higher level of abstraction. A process which for one purpose is taken to be atomic
Citations: 0
The Parallel BFGS Method for Unconstrained Minimization
Pub Date : 1991-04-28 DOI: 10.1109/DMCC.1991.633160
C. Still
Citations: 4
Spare Allocation and Reconfiguration in a Fault Tolerant Hypercube with Direct Connect Capability
Pub Date : 1991-04-28 DOI: 10.1109/DMCC.1991.633360
B. Izadi, F. Ozguner
This paper investigates hardware reconfiguration schemes to make the hypercube multicomputer fault tolerant. Two schemes are proposed: the Cluster Approach and the Enhanced Cluster Approach. The approaches are shown to be able to tolerate a large number of failures without any performance degradation. It is further demonstrated that no modification to either the existing communication or computational algorithm is needed. Finally, a gracefully degradable approach is presented to reconfigure when the number of faulty nodes is greater than the number of available spares.
Citations: 9
Fault Tolerance of the Cyclic Buddy Subcube Location Scheme in Hypercubes
Pub Date : 1991-04-28 DOI: 10.1109/DMCC.1991.633075
M. Livingston, Q. Stout
This paper examines the problem of locating large fault-free subcubes in multiuser hypercube systems. We analyze a new location strategy, the cyclic buddy system, and compare its performance to the buddy system, the gray-coded buddy system, and several variants of them. We show that the cyclic buddy system gives a striking improvement in expected fault tolerance over the above schemes and, since it can easily be implemented in parallel with little overhead, it provides an attractive alternative to these schemes. We also investigate the behavior of these location systems in the folded, or projective, hypercube, and find that the cyclic buddy system, which adapts naturally to this enhancement, significantly outperforms the other schemes. A combination of analytic techniques and simulation is used to examine both worst case and expected case performance.
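The difference between the location schemes can be sketched on a small cube. The abstract does not define them, so the code below rests on an assumed reading: the buddy system only considers subcubes whose free coordinates are the k low-order bits, while the cyclic buddy system allows any k cyclically consecutive bit positions to be free. The function names are hypothetical.

```python
def subcube_nodes(n, free_bits, fixed_value):
    """Nodes of the subcube of an n-cube with the given free bit positions.

    The bits of fixed_value are spread, in order, over the remaining
    (fixed) bit positions.
    """
    free = sorted(free_bits)
    fixed = [b for b in range(n) if b not in free_bits]
    base = 0
    for i, b in enumerate(fixed):
        if (fixed_value >> i) & 1:
            base |= 1 << b
    nodes = []
    for combo in range(1 << len(free)):
        node = base
        for i, b in enumerate(free):
            if (combo >> i) & 1:
                node |= 1 << b
        nodes.append(node)
    return nodes


def find_fault_free(n, k, faulty, cyclic):
    """Return the nodes of a fault-free k-subcube, or None if none is found.

    cyclic=False: free bits are positions 0..k-1 only (assumed buddy system).
    cyclic=True: free bits are any k cyclically consecutive positions
    (assumed cyclic buddy system).
    """
    starts = range(n) if cyclic else [0]
    for start in starts:
        free_bits = {(start + i) % n for i in range(k)}
        for fixed_value in range(1 << (n - k)):
            nodes = subcube_nodes(n, free_bits, fixed_value)
            if not any(v in faulty for v in nodes):
                return sorted(nodes)
    return None


# Two faults placed so that both buddy 2-subcubes of a 3-cube are hit.
faults = {0b001, 0b101}
print(find_fault_free(3, 2, faults, cyclic=False))
print(find_fault_free(3, 2, faults, cyclic=True))
```

Under these assumptions the cyclic search examines n times as many candidate subcubes as the buddy search, which is one way to see why it can tolerate fault patterns that defeat the plain buddy system.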
Citations: 7
Performance and Assembly Language Programming of the iPSC/860 System
Pub Date : 1991-04-28 DOI: 10.1109/DMCC.1991.633312
D. Scott, G. Withers
In the world of supercomputers, the goal is higher and higher performance. To obtain the highest performance on a particular computational kernel, it is usually necessary to write assembly language. Compiler technology has not matured to the point of being able to take advantage of many of the features of the i860 automatically. To approach the peak performance of the chip, it is currently necessary to use custom assembly language code. It is important to know which combinations of assembly instructions offer the highest performance, and which combinations cannot run at full speed. This paper assumes that you are already acquainted with the basics of the i860 microprocessor assembly language, and concentrates on describing how to enhance the performance of your code using i860 assembly language.
Citations: 9
Efficient Communication Primitives on Circuit-Switched Hypercubes
Pub Date : 1991-04-28 DOI: 10.1109/DMCC.1991.633172
Ching-Tien Ho, M. Raghunath
We give practical algorithms, complexity analysis and implementation for all-to-all personalized communication and matrix transpose (with two-dimensional partitioning of the matrix) on hypercubes. We assume the following communication characteristics: circuit-switched e-cube routing and a one-port communication model. For all-to-all personalized communication, we propose a hybrid algorithm that combines the well-known recursive doubling algorithm [22,12] and a direct-route algorithm [26,23]. Our hybrid algorithm balances the data transfer time and start-up time of these two algorithms, and its communication complexity is estimated to be better than that of the two previous algorithms for a range of machine parameters. For matrix transpose with two-dimensional partitioning of the matrix, our algorithm is measured to be better than the recursive transpose algorithm [8] by n nearest-neighbor communications [12]. Our algorithm takes advantage of circuit-switched routing and is congestion-free for a hypercube with e-cube routing. We also suggest a way of storing the matrix such that the transpose operation can take advantage of the routing of the machine.
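The recursive doubling building block named in this abstract can be simulated in a single process. The sketch below uses one common formulation (exchange across one cube dimension per step); it is not the authors' hybrid algorithm, it models no timing or routing, and the function name and message representation are assumptions.

```python
def all_to_all_recursive_doubling(dim):
    """Simulate all-to-all personalized communication on a dim-cube.

    Each node starts with one message for every destination. In step d,
    a node sends its neighbor across dimension d every message whose
    destination differs from the node in bit d.
    Returns (held, blocks_per_step): held[v] lists the (source, dest)
    messages stored at node v at the end, and blocks_per_step[d] is the
    largest number of messages any node sent in step d.
    """
    p = 1 << dim
    held = [[(src, dst) for dst in range(p)] for src in range(p)]
    blocks_per_step = []
    for d in range(dim):
        bit = 1 << d
        incoming = [[] for _ in range(p)]
        keep = [[] for _ in range(p)]
        for node in range(p):
            for msg in held[node]:
                if (msg[1] ^ node) & bit:
                    incoming[node ^ bit].append(msg)  # forward across dim d
                else:
                    keep[node].append(msg)
        blocks_per_step.append(max(len(batch) for batch in incoming))
        held = [keep[v] + incoming[v] for v in range(p)]
    return held, blocks_per_step


held, blocks = all_to_all_recursive_doubling(3)
print(blocks)
print(sorted(held[5]))
```

In this simulation each node forwards p/2 blocks in each of the log2 p steps, whereas a direct-route scheme would use p - 1 start-ups of one block each; that difference between few large transfers and many small ones is the trade-off the abstract's hybrid algorithm balances.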
我们给出了实用的算法、复杂性分析和实现,用于所有对所有的个性化通信和超立方体上的矩阵转置(与矩阵的二维划分)。我们假设以下通信特性:电路交换的e-cube路由和单端口通信模型。对于所有对所有的个性化通信,我们提出了一种混合算法,该算法结合了众所周知的递归加倍算法[22,12]和直接路由算法[26,23]。我们的混合算法平衡了这两种算法的数据传输时间和启动时间,并且在一定的机器参数范围内,估计其通信复杂度优于前两种算法。对于矩阵进行二维划分的矩阵转置,通过n次最近邻通信[12],我们的算法优于递归转置算法[8]。我们的算法利用了电路交换路由的优势,对于具有e-cube路由的超立方体来说是无拥塞的。我们还提出了一种存储矩阵的方法,使转置操作可以利用机器的路由。
Citations: 9