
Proceedings of the IEEE/ACM SC95 Conference: Latest Publications

The Emperor Has No Clothes: What HPC Users Need to Say and HPC Vendors Need to Hear
Pub Date : 1995-12-08 DOI: 10.1145/224170.224172
C. Pancake
A decade ago, high-performance computing (HPC) was about to "come of age" and we were convinced it would have significant impact throughout the computing industry. Instead, the HPC community has remained small and elitist. The rate at which technical applications have been ported to parallel and distributed platforms is distressingly slow, given that the availability of key applications is precisely the mechanism needed to drive the growth of the community. When major software vendors state publicly that their products will never be parallelized - as some have in recent months - it's time for us to take a hard look at reality. Marketing and PR claims to the contrary, HPC is not a success story. Although our capabilities continue to expand, we have not found a way to make HPC improve our productivity.
Citations: 6
StormWatch: A Tool for Visualizing Memory System Protocols
Pub Date : 1995-12-08 DOI: 10.1145/224170.224287
Trishul M. Chilimbi, T. Ball, S. Eick, J. Larus
Recent research has offered programmers increased options for programming parallel computers by exposing system policies (e.g., memory coherence protocols) or by providing several programming paradigms (e.g. message passing and shared memory) on the same platform. Increased flexibility can lead to higher performance, but it is also a double-edged sword that demands a programmer understand his or her application and system at a more fundamental level. Our system, Tempest, allows a programmer to select or implement communication and memory coherence policies that fit an application's communication patterns. With it, we have achieved substantial performance gains without making major changes in programs. However, the process of selecting, designing, and implementing coherence protocols is difficult and time consuming, without tools to supply detailed information about an application's behavior and interaction with the memory system. StormWatch is a new visualization tool that aids a programmer through four mechanisms: tightly-coupled bidirectionally linked views, interactive filters, animation, and performance slicing. Multiple views present several aspects of program behavior simultaneously and show the same phenomenon from different perspectives. Real-time linking between views enables a programmer to explore levels of abstraction by changing a view and observing the effect on other views. Interactive filters, along with bidirectional linking, can isolate the effects of statements, loops, procedures, or files. StormWatch can also animate a program's dynamic behavior to show the evolution of program execution and communication. Finally, performance slicing captures causality among events. The examples in the paper illustrate how StormWatch helped us substantially improve the performance of two applications.
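Of the four mechanisms listed, "performance slicing" is the least self-explanatory; the sketch below illustrates the general idea of slicing an event trace backwards along causal (program-order and message) edges. The data model and function names are hypothetical, for illustration only, and are not StormWatch's actual interface.

```python
# Hypothetical sketch of trace slicing along causal edges; not StormWatch's
# actual data model or API.
from collections import defaultdict

def performance_slice(edges, target):
    """edges: (predecessor, successor) pairs covering both program order and
    message send->receive; target: the event whose causes we want."""
    preds = defaultdict(list)
    for a, b in edges:
        preds[b].append(a)
    sliced, stack = set(), [target]
    while stack:                      # walk backwards over causal predecessors
        e = stack.pop()
        if e not in sliced:
            sliced.add(e)
            stack.extend(preds[e])
    return sliced

# e1 -> e2 in program order; e1's send is received at e3; e3 -> e4.
edges = [("e1", "e2"), ("e1", "e3"), ("e3", "e4")]
print(performance_slice(edges, "e4"))   # {'e4', 'e3', 'e1'}
```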
Citations: 20
Wide-Area Gigabit Networking: Los Alamos HIPPI-SONET Gateway
Pub Date : 1995-12-08 DOI: 10.1145/224170.224313
W. S. John, D. DuBois
This paper describes a HIPPI-SONET Gateway which has been designed by members of the Computer Network Engineering Group at Los Alamos National Laboratory. The Gateway has been used in the CASA Gigabit Testbed at Caltech, Los Alamos National Laboratory, and the San Diego Supercomputer Center to provide communications between the sites. This paper will also make some qualitative statements as to lessons learned during the deployment and maintenance of this wide area network. We report record throughput for transmission of data across a wide area network. We have sustained data rates using the TCP/IP protocol of 550 Mbits/second and the rate of 792 Mbits/second for raw HIPPI data transfer over the 2,000 kilometers from the San Diego Supercomputer Center to the Los Alamos National Laboratory.
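To put the reported rates in perspective, a back-of-the-envelope bandwidth-delay calculation shows how much data must be kept in flight on a 2,000 km path to sustain 550 Mbits/second over TCP/IP. The propagation-delay figure below is an assumed value for illustration, not a number taken from the paper.

```python
# Rough bandwidth-delay product for the San Diego - Los Alamos path.
# Assumption (not from the paper): ~5 microseconds per km of fiber.
distance_km = 2000
one_way_s = distance_km * 5e-6            # ~10 ms one way
rtt_s = 2 * one_way_s                     # ~20 ms round trip
tcp_rate_bps = 550e6                      # sustained TCP/IP rate reported
bdp_bytes = tcp_rate_bps * rtt_s / 8
print(f"RTT ~ {rtt_s * 1e3:.0f} ms, ~{bdp_bytes / 1e6:.2f} MB in flight")
# ~1.4 MB must be unacknowledged at any moment, far beyond the 64 KB limit
# of a TCP window without window scaling.
```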
Citations: 1
The Synergetic Effect of Compiler, Architecture, and Manual Optimizations on the Performance of CFD on Multiprocessors
Pub Date : 1995-12-08 DOI: 10.1145/224170.224426
M. Kuba, C. Polychronopoulos, K. Gallivan
This paper discusses the comprehensive performance profiling, improvement and benchmarking of a Computational Fluid Dynamics code, one of the Grand Challenge applications, on three popular multiprocessors. In the process of analyzing performance we considered language, compiler, architecture, and algorithmic changes and quantified each of them and their incremental contribution to bottom-line performance. We demonstrate that parallelization alone cannot result in significant gains if the granularity of parallel threads and the effect of parallelization on data locality are not taken into account. Unlike benchmarking studies that often focus on the performance or effectiveness of parallelizing compilers on specific loop kernels, we used the entire CFD code to measure the global effectiveness of compilers and parallel architectures. We probed the performance bottlenecks in each case and derived solutions which eliminate or neutralize the performance inhibiting factors. The major conclusion of our work is that overall performance is extremely sensitive to the synergetic effects of compiler optimizations, algorithmic and code tuning, and architectural idiosyncrasies.
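The claim that parallelization without attention to data locality yields little gain can be felt even in a serial setting: the same reduction, traversed with contiguous versus strided access, differs by a large factor. The snippet below is a generic illustration of that effect, not code from the paper.

```python
# Generic locality illustration (not the paper's code): identical arithmetic,
# contiguous vs. strided traversal of a C-ordered array.
import time
import numpy as np

a = np.random.rand(4000, 4000)

t0 = time.perf_counter()
s_rows = sum(a[i, :].sum() for i in range(a.shape[0]))   # contiguous rows
t_rows = time.perf_counter() - t0

t0 = time.perf_counter()
s_cols = sum(a[:, j].sum() for j in range(a.shape[1]))   # strided columns
t_cols = time.perf_counter() - t0

print(f"rows {t_rows:.3f}s  columns {t_cols:.3f}s  same sum: {np.isclose(s_rows, s_cols)}")
```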
Citations: 0
A Case Study in Parallel Scientific Computing: The Boundary Element Method on a Distributed-Memory Multicomputer
Pub Date : 1995-12-08 DOI: 10.1145/224170.224277
R. Natarajan, D. Krishnaswamy
The Boundary Element Method is a widely-used discretization technique for solving boundary-value problems in engineering analysis. The solution of large problems by this method is limited by the storage and computational requirements for the generation and solution of large matrix systems resulting from the discretization. We discuss the implementation of these computations on the IBM SP-2 distributed-memory parallel computer, for applications involving the 3-D Laplace and Helmholtz equations.
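Boundary-element discretizations typically produce dense influence matrices, which is why the abstract emphasizes storage and generation cost. The sketch below, with made-up problem and machine sizes, shows the block-row bookkeeping one might use on a distributed-memory machine; it is not the paper's implementation.

```python
# Back-of-the-envelope block-row distribution of a dense N x N BEM matrix
# across P processes. Problem and machine sizes are hypothetical.
def block_rows(n, p, rank):
    """Half-open row range owned by `rank` under a balanced block distribution."""
    base, extra = divmod(n, p)
    start = rank * base + min(rank, extra)
    return start, start + base + (1 if rank < extra else 0)

n, p = 40_000, 64
lo, hi = block_rows(n, p, rank=0)
local_bytes = (hi - lo) * n * 8                     # float64 entries
print(f"process 0 owns rows [{lo}, {hi}): "
      f"~{local_bytes / 2**20:.0f} MiB of a {n * n * 8 / 2**30:.0f} GiB matrix")
```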
Citations: 16
Analysis of Multilevel Graph Partitioning
Pub Date : 1995-12-08 DOI: 10.1145/224170.224229
G. Karypis, Vipin Kumar
Recently, a number of researchers have investigated a class of algorithms based on multilevel graph partitioning that have moderate computational complexity and provide excellent graph partitions. However, there exists little theoretical analysis that could explain the ability of multilevel algorithms to produce good partitions. In this paper we present such an analysis. We show under certain reasonable assumptions that even if no refinement is used in the uncoarsening phase, a good bisection of the coarser graph is worse than a good bisection of the finer graph by at most a small factor. We also show that for planar graphs, the size of a good vertex-separator of the coarse graph projected to the finer graph (without performing refinement in the uncoarsening phase) is higher than the size of a good vertex-separator of the finer graph by at most a small factor.
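For readers unfamiliar with the scheme being analyzed, here is a toy multilevel bisection: coarsen by edge matching, bisect the coarsest graph, and project the bisection back without refinement, which is exactly the case the paper's bounds address. It is an illustrative sketch, not METIS or the authors' code.

```python
# Toy multilevel bisection: greedy matching to coarsen, a traversal-order split
# of the coarsest graph, and projection back with no refinement.
def coarsen(adj):
    """One coarsening level; returns coarse adjacency and fine->coarse map."""
    matched, cmap, next_id = set(), {}, 0
    for v in adj:
        if v in matched:
            continue
        partner = next((u for u in adj[v] if u not in matched and u != v), None)
        for u in ([v] if partner is None else [v, partner]):
            matched.add(u)
            cmap[u] = next_id
        next_id += 1
    cadj = {c: set() for c in range(next_id)}
    for v, nbrs in adj.items():
        for u in nbrs:
            if cmap[u] != cmap[v]:
                cadj[cmap[v]].add(cmap[u])
    return cadj, cmap

def bisect(adj):
    """Split the vertices in half along a depth-first traversal order."""
    order, seen = [], set()
    for s in adj:
        stack = [s]
        while stack:
            v = stack.pop()
            if v not in seen:
                seen.add(v)
                order.append(v)
                stack.extend(adj[v])
    half = len(order) // 2
    return {v: int(i >= half) for i, v in enumerate(order)}

def multilevel_bisection(adj, coarse_to=8):
    maps = []
    while len(adj) > coarse_to:
        cadj, cmap = coarsen(adj)
        if len(cadj) == len(adj):       # nothing left to contract
            break
        maps.append(cmap)
        adj = cadj
    part = bisect(adj)
    for cmap in reversed(maps):         # project back, no refinement step
        part = {v: part[c] for v, c in cmap.items()}
    return part

# 4x4 grid graph: each cell adjacent to its orthogonal neighbours.
grid = {(i, j): {(i + di, j + dj) for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1))
                 if 0 <= i + di < 4 and 0 <= j + dj < 4}
        for i in range(4) for j in range(4)}
print(multilevel_bisection(grid, coarse_to=4))
```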
Citations: 360
Efficient Algorithms for Atmospheric Correction of Remotely Sensed Data
Pub Date : 1995-12-08 DOI: 10.1145/224170.224194
Hassan Fallah-Adl, J. JáJá, S. Liang, Y. Kaufman, J. Townshend
Remotely sensed imagery has been used for developing and validating various studies regarding land cover dynamics. However, the large amounts of imagery collected by the satellites are largely contaminated by the effects of atmospheric particles. The objective of atmospheric correction is to retrieve the surface reflectance from remotely sensed imagery by removing the atmospheric effects. We introduce a number of computational techniques that lead to a substantial speedup of an atmospheric correction algorithm based on using look-up tables. Excluding I/O time, the previously known implementation processes one pixel at a time and requires about 2.63 seconds per pixel on a SPARC-10 machine, while our implementation is based on processing the whole image and takes about 4-20 microseconds per pixel on the same machine. We also develop a parallel version of our algorithm that is scalable in terms of both computation and I/O. Experimental results obtained show that a Thematic Mapper (TM) image (36 MB per band, 5 bands need to be corrected) can be handled in less than 4.3 minutes on a 32-node CM-5 machine, including I/O time.
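The performance gap the abstract reports (seconds per pixel down to microseconds per pixel) comes largely from replacing per-pixel work with whole-image, table-driven processing. The sketch below shows that pattern with a made-up look-up table; the actual correction depends on radiance, geometry, and aerosol parameters not modelled here.

```python
# Illustration of whole-image, LUT-driven processing vs. a per-pixel loop.
# The table and the radiance->reflectance relationship are placeholders.
import numpy as np

lut_radiance = np.linspace(0.0, 1.0, 256)          # hypothetical LUT grid
lut_reflectance = lut_radiance ** 0.9 - 0.02       # hypothetical LUT values

band = np.random.rand(512, 512)                    # one band of radiances

# Per-pixel pattern (the slow path the paper replaces).
slow = np.empty_like(band)
for i in range(band.shape[0]):
    for j in range(band.shape[1]):
        slow[i, j] = np.interp(band[i, j], lut_radiance, lut_reflectance)

# Whole-image pattern: a single vectorized table interpolation.
fast = np.interp(band, lut_radiance, lut_reflectance)

print(np.allclose(slow, fast))                     # same answer, far faster
```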
Citations: 9
Mobile Robots Teach Machine-Level Programming
Pub Date : 1995-12-08 DOI: 10.1145/224170.224205
P. Teller, T. Dunning
We feel strongly that a contemporary introductory course in machine organization and assembly language should focus on the essentials of how computers execute programs, and not be distracted by the complications of the extraordinarily sophisticated microprocessors that are available today. These essentials should form a strong base of knowledge from which students can draw as they continue their education in computer science. Ideally these goals should be attained in an environment that fosters experimentation and cooperation, and with the aid of projects that generate interest and enthusiasm among the students. We have developed and are currently teaching a course at New Mexico State University that meets many of these goals. The course concentrates on a simple but relatively complete microprocessor architecture, that of the Motorola 68HC11 processor. Three different teaching techniques are used to encourage experimentation and team work: learning sessions, simulator labs, and microprocessor labs. New concepts are introduced in learning sessions, which combine traditional lecturing with student exploration. The understanding of these new concepts is strengthened through labs and assignments. Simulator labs and assignments, which require interaction with a simulator of the Motorola 68HC11 microprocessor, focus on the 68HC11's instruction set architecture. Microprocessor labs and assignments, which essentially are designing and building sessions, focus on the use of a 68HC11 microprocessor to control a motorized vehicle. During microprocessor labs students populate printed circuit cards, build motorized vehicles (or other roboticized exotica), and design and implement assembly language programs that provide communication between a personal computer and a 68HC11 processor, and a 68HC11 processor and a motorized vehicle. We have found that the costs of running this course are minimal and the results are very favorable in terms of student enthusiasm and achievement.
Citations: 3
Controlling Application Grain Size on a Network of Workstations
Pub Date : 1995-12-08 DOI: 10.1145/224170.224497
B. Siegell, P. Steenkiste
An important challenge in the area of distributed computing is to automate the selection of the parameters that control the distributed computation. A performance-critical parameter is the grain size of the computation, i.e., the interval between successive synchronization points in the application. This parameter is hard to select since it depends on both compile-time factors (loop structure and data dependences, computational complexity) and run-time factors (the speed of the compute nodes and the network). On networks of workstations that are shared with other users, the run-time parameters can change over time. As a result, it is also necessary to consider the interactions with dynamic load balancing, which is needed to achieve good performance in this environment. In this paper we present a method for automatically selecting the grain size of computations consisting of nested DO loops. The method is based on close cooperation between the compiler and the runtime system. We evaluate the method using both simulation and measurements for an implementation on the Nectar multicomputer.
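A simple way to see why grain size must reflect both compile-time and run-time quantities is a threshold rule: make each synchronization interval long enough that the synchronization cost stays below a small fraction of the useful work. The rule and the numbers below are illustrative assumptions, not the paper's algorithm.

```python
# Illustrative grain-size rule (an assumption, not the paper's method):
# choose iterations per synchronization g so that t_sync <= eps * g * t_iter.
import math

def choose_grain(t_iter_s, t_sync_s, eps=0.05):
    """Smallest iteration count per sync keeping sync overhead under eps."""
    return math.ceil(t_sync_s / (eps * t_iter_s))

# Hypothetical run-time measurements: a 2 microsecond loop body and a 1 ms
# synchronization on a shared network of workstations.
print(choose_grain(t_iter_s=2e-6, t_sync_s=1e-3))   # 10000 iterations per sync
```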
Citations: 7
Architecture-Adaptable Finite Element Modelling: A Case Study Using an Ocean Circulation Simulation
Pub Date : 1995-12-08 DOI: 10.1145/224170.224501
S. Kumaran, Robert N. Miller, M. J. Quinn
We describe an architecture-adaptable methodology for the parallel implementation of finite element numerical models of physical systems. We use a model of time-dependent ocean currents as our working example. The heart of the computation is the solution of a banded linear system, and we describe an algorithm based on the domain decomposition method to solve the banded system. The algorithm is expressed in a divide-and-conquer framework that facilitates easy implementation of various algorithmic options. The process is straightforward and amenable to automation. We demonstrate the validity of this approach using two radically different target machines, a workstation network and a supercomputer. Our results show very good speedup on both platforms.
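The abstract's combination of a banded system with a domain-decomposition, divide-and-conquer solver can be illustrated on a tiny tridiagonal example: two independent subdomain solves plus a small Schur-complement solve for the interface unknown. This is a generic sketch of the technique, not the authors' implementation.

```python
# Tiny substructuring sketch for a symmetric tridiagonal system: the two
# subdomain solves are independent (the parallel work), and one interface
# unknown is resolved through a Schur complement. Not the paper's code.
import numpy as np

def tridiag(n, off=-1.0, diag=4.0):
    return (np.diag(np.full(n, diag))
            + np.diag(np.full(n - 1, off), -1)
            + np.diag(np.full(n - 1, off), 1))

n = 9                              # two subdomains of 4 rows plus 1 interface row
m = n // 2
A, b = tridiag(n), np.arange(1.0, n + 1)

A1, A2 = A[:m, :m], A[m + 1:, m + 1:]      # independent subdomain blocks
c1, c2 = A[:m, m], A[m + 1:, m]            # coupling to the interface row
b1, b2, bs = b[:m], b[m + 1:], b[m]

# Subdomain solves (these would run concurrently on separate processors).
y1, z1 = np.linalg.solve(A1, b1), np.linalg.solve(A1, c1)
y2, z2 = np.linalg.solve(A2, b2), np.linalg.solve(A2, c2)

# Schur complement for the single interface unknown (uses symmetry of A).
s = A[m, m] - c1 @ z1 - c2 @ z2
xs = (bs - c1 @ y1 - c2 @ y2) / s

# Back-substitute into the subdomains and reassemble the full solution.
x = np.concatenate([y1 - z1 * xs, [xs], y2 - z2 * xs])
print(np.allclose(A @ x, b))               # True
```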
Citations: 3