ACM/IEEE SC 2000 Conference (SC'00): Latest Papers


High Performance Visualization of Time-Varying Volume Data over a Wide-Area Network

ACM/IEEE SC 2000 Conference (SC'00)

Pub Date : 2000-11-01 DOI: 10.1109/SC.2000.10000

K. Ma, David Camp

This paper presents an end-to-end, low-cost solution for visualizing time-varying volume data rendered on a parallel computer located at a remote site. Pipelining and careful grouping of processors are used to hide I/O time and to maximize processor utilization. Compression is used to significantly cut down the cost of transferring output images from the parallel computer to a display device through a wide-area network. This complete rendering pipeline makes possible highly efficient rendering and remote viewing of high-resolution time-varying data sets in the absence of a high-speed network and parallel I/O support. To study the performance of this rendering pipeline and to demonstrate high-performance remote visualization, tests were conducted on a PC cluster in Japan as well as an SGI Origin 2000 operated at the NASA Ames Research Center with the display located at UC Davis.


Citations: 119

Architectural and Performance Evaluation of GigaNet and Myrinet Interconnects on Clusters of Small-Scale SMP Servers

ACM/IEEE SC 2000 Conference (SC'00)

Pub Date : 2000-11-01 DOI: 10.1109/SC.2000.10027

J. Hsieh, T. Leng, V. Mashayekhi, R. Rooholamini

GigaNet and Myrinet are two of the leading interconnects for clusters of commodity computer systems. Both provide memory-protected user-level network interface access, and deliver low-latency and high-bandwidth communication to applications. GigaNet is a connection-oriented interconnect based on a hardware implementation of Virtual Interface (VI) Architecture and Asynchronous Transfer Mode (ATM) technologies. Myrinet is a connection-less interconnect which leverages packet switching technologies from experimental Massively Parallel Processors (MPP) networks. This paper investigates their architectural differences and evaluates their performance on two commodity clusters based on two generations of Symmetric Multiple Processors (SMP) servers. The performance measurements reported here suggest that the implementation of Message Passing Interface (MPI) significantly affects the cluster performance. Although MPICH-GM over Myrinet demonstrates lower latency with small messages, the polling-driven implementation of MPICH-GM often leads to tight synchronization between communication processes and higher CPU overhead.



Citations: 30

Computer Simulations of Cardiac Electrophysiology

ACM/IEEE SC 2000 Conference (SC'00)

Pub Date : 2000-11-01 DOI: 10.1109/SC.2000.10032

J. Pormann, C. Henriquez, J. Board, D. Rose, D. Harrild, Alexandra P. Henriquez

CardioWave is a modular system for simulating wavefront conduction in the heart. These simulations may be used to investigate the factors that generate and sustain life-threatening arrhythmias such as ventricular fibrillation. The user selects a set of modules which most closely reflects the simulation they are interested in, and the simulator is built automatically. Thus, we do not present one monolithic simulator, but rather a simulator-generator which allows the researcher to make the trade-offs of complexity versus performance. The results presented here are from simulations run on an IBM SP parallel computer and a cluster of workstations. The performance numbers show excellent scalability up through 128 processors. With the larger memory of the parallel machines, we have been able to perform highly realistic simulations of the human atria. These simulations include realistic 3-D geometries with inhomogeneity and anisotropy as well as highly complex membrane dynamics.


Citations: 9

Automatically Tuned Collective Communications

ACM/IEEE SC 2000 Conference (SC'00)

Pub Date : 2000-11-01 DOI: 10.1109/SC.2000.10024

Sathish S. Vadhiyar, G. Fagg, J. Dongarra

The performance of MPI's collective communications is critical in most MPI-based applications. A general algorithm for a given collective communication operation may not give good performance on all systems due to the differences in architectures, network parameters and the storage capacity of the underlying MPI implementation. In this paper, we discuss an approach in which the collective communications are tuned for a given system by conducting a series of experiments on the system. We also discuss a dynamic topology method that uses the tuned static topology shape, but re-orders the logical addresses to compensate for changing run-time variations. A series of experiments were conducted comparing our tuned collective communication operations to various native vendor MPI implementations. The use of the tuned collective communications resulted in about 30%-650% improvement in performance over the native MPI implementations.


Citations: 181
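The tuning idea in the Vadhiyar, Fagg and Dongarra abstract above (try several algorithms for a collective on the target system and keep the fastest per message size) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the three broadcast variants, the simple latency/bandwidth cost formulas that stand in for real timing experiments, and the parameter values `P`, `ALPHA`, `BETA` and the 8192-byte segment size.

```python
import math
from typing import Callable, Dict, Iterable

# Hypothetical machine parameters (assumptions, not values from the paper).
P = 8          # number of processes taking part in the broadcast
ALPHA = 50e-6  # per-message latency in seconds
BETA = 1e-8    # per-byte transfer time in seconds


def flat(n: int) -> float:
    """Root sends the whole n-byte message to each other process in turn."""
    return (P - 1) * (ALPHA + n * BETA)


def binomial(n: int) -> float:
    """Binomial tree: ceil(log2 P) rounds, each moving the whole message."""
    return math.ceil(math.log2(P)) * (ALPHA + n * BETA)


def pipelined_chain(n: int, seg: int = 8192) -> float:
    """Chain of P processes forwarding the message in segments of `seg` bytes."""
    seg = min(seg, n)
    k = math.ceil(n / seg)  # number of segments
    return (P - 1 + k - 1) * (ALPHA + seg * BETA)


# Stand-ins for measured timings; a real tuner would time actual MPI runs.
CANDIDATES: Dict[str, Callable[[int], float]] = {
    "flat": flat,
    "binomial": binomial,
    "pipelined_chain": pipelined_chain,
}


def build_table(candidates: Dict[str, Callable[[int], float]],
                sizes: Iterable[int]) -> Dict[int, str]:
    """Map each message size (bytes, >= 1) to the name of the cheapest candidate."""
    return {n: min(candidates, key=lambda name: candidates[name](n))
            for n in sizes}


if __name__ == "__main__":
    for size, best in build_table(CANDIDATES, [8, 1024, 1 << 20]).items():
        print(f"{size:>8} bytes -> {best}")
```

Under these assumed parameters the table prefers the binomial tree for small messages and the pipelined chain for large ones, which is the kind of system-dependent crossover that experimental tuning is meant to find.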

Parallel Unsteady Turbopump Simulations for Liquid Rocket Engines

ACM/IEEE SC 2000 Conference (SC'00)

Pub Date : 2000-11-01 DOI: 10.1109/SC.2000.10053

C. Kiris, D. Kwak, W. Chan

This paper reports the progress being made towards complete turbo-pump simulation capability for liquid rocket engines. The Space Shuttle Main Engine (SSME) turbo-pump impeller is used as a test case for the performance evaluation of the MPI, hybrid MPI/OpenMP, and MLP versions of the INS3D code. Then, a computational model of a turbo-pump has been developed for the shuttle upgrade program. Relative motion of the grid system for rotor-stator interaction was obtained by employing overset grid techniques. Unsteady computations for the SSME turbo-pump, which contains 101 zones with 31 million grid points, are carried out on Origin 2000 systems at NASA Ames Research Center. The approach taken for these simulations and the performance of the parallel versions of the code are presented.


Citations: 11

Scalable Fault-Tolerant Distributed Shared Memory

ACM/IEEE SC 2000 Conference (SC'00)

Pub Date : 2000-11-01 DOI: 10.1109/SC.2000.10014

F. Sultan, Thu D. Nguyen, L. Iftode

This paper shows how a state-of-the-art software distributed shared-memory (DSM) protocol can be efficiently extended to tolerate single-node failures. In particular, we extend a home-based lazy release consistency (HLRC) DSM system with independent checkpointing and logging to volatile memory, targeting shared-memory computing on very large LAN-based clusters. In these environments, where global coordination may be expensive, independent checkpointing becomes critical to scalability. However, independent checkpointing is only practical if we can control the size of the log and checkpoints in the absence of global coordination. In this paper we describe the design of our fault-tolerant DSM system and present our solutions to the problems of checkpoint and log management. We also present experimental results showing that our fault tolerance support is light-weight, adding only low messaging, logging and checkpointing overheads, and that our management algorithms can be expected to effectively bound the size of the checkpoints and logs for real applications.


Citations: 48

Randomization, Speculation, and Adaptation in Batch Schedulers

ACM/IEEE SC 2000 Conference (SC'00)

Pub Date : 2000-11-01 DOI: 10.5555/370049.370063

Dejan Perkovic, P. Keleher

This paper proposes extensions to the backfilling job-scheduling algorithm that significantly improve its performance. We introduce variations that sort the "backfilling order" in priority-based and randomized fashions. We examine the effectiveness of guarantees present in conservative backfilling and find that initial guarantees have limited practical value, while the performance of a "no-guarantee" algorithm can be significantly better when combined with extensions that we introduce. Our study differs from many similar studies in using traces that contain user estimates. We find that actual overestimates are large and significantly different from simple models. We propose the use of speculative backfilling and speculative test runs to counteract these large overestimations. Finally, we explore the impact of dynamic, system-directed adaptation of application parallelism. The cumulative improvements of these techniques decrease the bounded slowdown, our primary metric, to less than 15% of conservative backfilling.


Citations: 64
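The Perkovic and Keleher abstract above names bounded slowdown as its primary metric without defining it. A minimal sketch of the definition commonly used in the job-scheduling literature follows: response time divided by run time, with very short run times clamped to a threshold so they cannot dominate the average, and the result floored at 1. The 10-second default threshold and the function names are assumptions for illustration; the abstract does not state which threshold the paper uses.

```python
from typing import Iterable, Tuple


def bounded_slowdown(wait: float, run: float, threshold: float = 10.0) -> float:
    """Bounded slowdown of one job.

    wait      -- seconds the job spent queued
    run       -- seconds the job actually ran
    threshold -- lower bound applied to the run time (assumed value)
    """
    return max((wait + run) / max(run, threshold), 1.0)


def mean_bounded_slowdown(jobs: Iterable[Tuple[float, float]],
                          threshold: float = 10.0) -> float:
    """Average bounded slowdown over (wait, run) pairs; jobs must be non-empty."""
    values = [bounded_slowdown(w, r, threshold) for w, r in jobs]
    return sum(values) / len(values)


if __name__ == "__main__":
    # A job that waits 90 s and runs 10 s has slowdown 10; a 1 s job that
    # starts immediately is clamped to 1 rather than reported as 0.1.
    print(bounded_slowdown(90, 10))
    print(bounded_slowdown(0, 1))
    print(mean_bounded_slowdown([(90, 10), (0, 1)]))
```

With a metric like this, a statement such as "less than 15% of conservative backfilling" compares the mean bounded slowdown of two schedules of the same trace.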

A Wrapper Generator for Wrapping High Performance Legacy Codes as Java/CORBA Components

ACM/IEEE SC 2000 Conference (SC'00)

Pub Date : 2000-11-01 DOI: 10.5555/370049.370072

Maozhen Li, O. Rana, Matthew S. Shields, D. Walker

This paper describes a Wrapper Generator for wrapping high performance legacy codes as Java/CORBA components for use in a distributed component-based problem-solving environment. Using the Wrapper Generator we have automatically wrapped an MPI-based legacy code as a single CORBA object, and implemented a problem-solving environment for molecular dynamics simulations. Performance comparisons between runs of the CORBA object and the original legacy code on a cluster of workstations and on a parallel computer are also presented.


Citations: 30

Distributed Rendering for Scalable Displays

ACM/IEEE SC 2000 Conference (SC'00)

Pub Date : 2000-11-01 DOI: 10.1109/SC.2000.10003

G. Humphreys, I. Buck, Matthew Eldridge, P. Hanrahan

We describe a novel distributed graphics system that allows an application to render to a large tiled display. Our system, called WireGL, uses a cluster of off-the-shelf PCs connected with a high-speed network. WireGL allows an unmodified existing application to achieve scalable output resolution on such a display. This paper presents an efficient sorting algorithm which minimizes the network traffic for a scalable display. We will demonstrate that for most applications, our system provides scalable output resolution with minimal performance impact.


Citations: 143
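The WireGL abstract above mentions sorting geometry so that network traffic to a tiled display stays low. The general idea behind such sorting, sending a primitive only to the tiles its screen-space bounding box overlaps, can be sketched as below. This is an illustration of that general idea under assumed conventions (pixel coordinates, a uniform tile grid, a hypothetical function name) and is not WireGL's actual algorithm or API.

```python
from typing import List, Tuple


def tiles_for_bbox(xmin: float, ymin: float, xmax: float, ymax: float,
                   tile_w: int, tile_h: int,
                   cols: int, rows: int) -> List[Tuple[int, int]]:
    """Return (col, row) indices of every tile the bounding box overlaps.

    The display is assumed to be a cols x rows grid of tiles, each
    tile_w x tile_h pixels, with the origin at one corner of the display.
    Boxes entirely outside the display yield an empty list.
    """
    c0 = max(int(xmin // tile_w), 0)
    c1 = min(int(xmax // tile_w), cols - 1)
    r0 = max(int(ymin // tile_h), 0)
    r1 = min(int(ymax // tile_h), rows - 1)
    return [(c, r) for r in range(r0, r1 + 1) for c in range(c0, c1 + 1)]


if __name__ == "__main__":
    # 2 x 2 wall of 1024 x 768 tiles (assumed sizes).
    print(tiles_for_bbox(100, 100, 200, 200, 1024, 768, 2, 2))    # one tile
    print(tiles_for_bbox(1000, 700, 1100, 800, 1024, 768, 2, 2))  # all four
    print(tiles_for_bbox(3000, 0, 3100, 10, 1024, 768, 2, 2))     # off-screen
```

A primitive that falls inside a single tile is sent to one rendering node only, so total traffic grows with the amount of overlap rather than with the number of tiles.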

Efficient Wire Formats for High Performance Computing

ACM/IEEE SC 2000 Conference (SC'00)

Pub Date : 2000-11-01 DOI: 10.1109/SC.2000.10046

F. Bustamante, G. Eisenhauer, K. Schwan, Patrick M. Widener

High performance computing is being increasingly utilized in non-traditional circumstances where it must interoperate with other applications. For example, online visualization is being used to monitor the progress of applications, and real-world sensors are used as inputs to simulations. Whenever these situations arise, there is a question of what communications infrastructure should be used to link the different components. Traditional HPC-style communications systems such as MPI offer relatively high performance, but are poorly suited for developing these less tightly-coupled cooperating applications. Object-based systems and meta-data formats like XML offer substantial plug-and-play flexibility, but with substantially lower performance. We observe that the flexibility and baseline performance of all these systems are strongly determined by their "wire format", or how they represent data for transmission in a heterogeneous environment. We examine the performance implications of different wire formats and present an alternative with significant advantages in terms of both performance and flexibility.


Citations: 105
