ACM/IEEE SC 2000 Conference (SC'00): Latest Publications

Improving Fine-Grained Irregular Shared-Memory Benchmarks by Data Reordering

ACM/IEEE SC 2000 Conference (SC'00)

Pub Date : 2000-11-01 DOI: 10.1109/SC.2000.10009

Y. C. Hu, A. Cox, W. Zwaenepoel

We demonstrate that data reordering can substantially improve the performance of fine-grained irregular shared-memory benchmarks, on both hardware and software shared-memory systems. In particular, we evaluate two distinct data reordering techniques that seek to co-locate in memory objects that are in close proximity in the physical system modeled by the computation. The effects of these techniques are increased spatial locality and reduced false sharing. We evaluate the effectiveness of the data reordering techniques on a set of five irregular applications from SPLASH-2 and Chaos. We implement both techniques in a small library, allowing us to enable them in an application by adding less than 10 lines of code. Our results on one hardware and two software shared-memory systems show that, with data reordering during initialization, the performance of these applications is improved by 12%-99% on the Origin 2000, 30%-366% on TreadMarks, and 14%-269% on HLRC.

Citations: 48
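
The abstract above does not spell out its two reordering techniques. As a hedged illustration of the general idea (co-locating in memory the objects that are close in the modeled physical space), the sketch below sorts 3-D points along a Z-order (Morton) space-filling curve. The choice of Morton order and the helper names `morton3`, `morton_order`, `reorder`, and `inverse_permutation` are assumptions for illustration, not the authors' library.

```python
def morton3(ix: int, iy: int, iz: int, bits: int = 10) -> int:
    """Interleave the low `bits` bits of three integer coordinates (Z-order key)."""
    code = 0
    for b in range(bits):
        code |= ((ix >> b) & 1) << (3 * b)
        code |= ((iy >> b) & 1) << (3 * b + 1)
        code |= ((iz >> b) & 1) << (3 * b + 2)
    return code


def morton_order(points, bits: int = 10):
    """Return a permutation listing point indices in Z-order."""
    lo = [min(p[d] for p in points) for d in range(3)]
    hi = [max(p[d] for p in points) for d in range(3)]
    scale = (1 << bits) - 1

    def key(i):
        # Quantize each coordinate onto an integer grid of 2**bits cells per axis.
        q = []
        for d in range(3):
            span = (hi[d] - lo[d]) or 1.0  # avoid dividing by zero on flat axes
            q.append(int((points[i][d] - lo[d]) / span * scale))
        return morton3(q[0], q[1], q[2], bits)

    return sorted(range(len(points)), key=key)


def reorder(items, perm):
    """New position k holds the item that was at old position perm[k]."""
    return [items[i] for i in perm]


def inverse_permutation(perm):
    """Map each old index to its new index, for remapping stored index lists."""
    inv = [0] * len(perm)
    for new, old in enumerate(perm):
        inv[old] = new
    return inv
```

After reordering, any array that stores object indices (for example a neighbor or interaction list) must be rewritten through `inverse_permutation`, otherwise it still points at the old positions.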

Parallel Algorithms for Radiation Transport on Unstructured Grids

ACM/IEEE SC 2000 Conference (SC'00)

Pub Date : 2000-11-01 DOI: 10.1109/SC.2000.10030

S. Plimpton, B. Hendrickson, S. Burns, William C. McLendon

The method of discrete ordinates is commonly used to solve the Boltzmann radiation transport equation for applications ranging from simulations of fires to weapons effects. The equations are most efficiently solved by sweeping the radiation flux across the computational grid. For unstructured grids this poses several interesting challenges, particularly when the solver is implemented on distributed-memory parallel machines where the grid geometry is spread across processors. We describe an asynchronous, parallel, message-passing algorithm that performs sweeps simultaneously from many directions across unstructured grids. We identify key factors that limit the algorithm’s parallel scalability and discuss two enhancements we have made to the basic algorithm: one to prioritize the work within a processor’s subdomain and the other to better decompose the unstructured grid across processors. Performance results are given for the basic and enhanced algorithms implemented within a radiation solver running on hundreds of processors of Sandia’s Intel Tflops machine and DEC-Alpha CPlant cluster.

Citations: 55
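
In the radiation-transport paper above, a sweep for one ordinate direction can only process a cell after all of its upwind neighbors have been processed, so each direction induces a dependency graph over the cells. Assuming that graph is acyclic, the sketch below computes one valid serial sweep order with Kahn's topological sort. It is a minimal single-processor illustration; the paper's asynchronous algorithm interleaves many such sweeps across processors, and the function name and the `upwind` input format are hypothetical.

```python
from collections import deque


def sweep_order(num_cells, upwind):
    """Order cells so that every cell comes after all of its upwind neighbors.

    upwind[c] lists the cells whose outgoing flux cell c needs for one direction.
    """
    indegree = [len(upwind[c]) for c in range(num_cells)]
    downwind = [[] for _ in range(num_cells)]
    for c in range(num_cells):
        for u in upwind[c]:
            downwind[u].append(c)

    ready = deque(c for c in range(num_cells) if indegree[c] == 0)
    order = []
    while ready:
        c = ready.popleft()
        order.append(c)  # a solver would compute the angular flux in cell c here
        for d in downwind[c]:
            indegree[d] -= 1
            if indegree[d] == 0:
                ready.append(d)

    if len(order) != num_cells:
        raise ValueError("cyclic dependencies: this direction's sweep graph is not a DAG")
    return order
```

For example, with four cells where cell 0 feeds cells 1 and 3 and both of those feed cell 2, `sweep_order(4, [[], [0], [1, 3], [0]])` processes cell 2 last.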

A scalable SNMP-based distributed monitoring system for heterogeneous network computing

ACM/IEEE SC 2000 Conference (SC'00)

Pub Date : 2000-11-01 DOI: 10.1109/SC.2000.10058

R. Subramanyan, J. Miguel-Alonso, J. Fortes

Traditional centralized monitoring systems do not scale to present-day large, complex network-computing systems. Based on recent SNMP standards for distributed management, this paper addresses the scalability problem by distributing monitoring tasks, an approach applicable to tools such as SIMONE (an SNMP-based monitoring prototype implemented by the authors). Distribution is achieved by introducing one or more levels of a dual entity, called the Intermediate Level Manager (ILM), between a manager and the agents. The ILM accepts monitoring tasks that are described as scripts and delegated by the next-higher entity. The solution is flexible and can be integrated into an SNMP tool without altering other system components. A testbed of up to 1024 monitoring elements is used to assess scalability. Noticeable improvements in round-trip delay (from seconds to less than a tenth of a second) were observed when more than 200 monitoring elements were present and as few as two ILMs were used.

Citations: 46
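
The ILM delegation idea in the monitoring paper above can be sketched as a two-level hierarchy: the manager hands a task to each ILM, each ILM evaluates it over its own subset of agents, and the manager only combines the per-ILM summaries. In this hypothetical sketch, plain Python callables stand in for the script-based tasks and in-memory objects stand in for SNMP agents; no SNMP protocol calls are made, and the class and function names are illustrative only.

```python
from concurrent.futures import ThreadPoolExecutor


class Agent:
    """Stand-in for an SNMP agent holding one (hypothetical) metric value."""

    def __init__(self, name, value):
        self.name = name
        self.value = value

    def get(self):
        return self.value


class IntermediateLevelManager:
    """Hypothetical ILM: evaluates a delegated task over its own agents only."""

    def __init__(self, agents):
        self.agents = agents

    def run(self, task):
        return task([agent.get() for agent in self.agents])


def manager_poll(agents, num_ilms, task, combine):
    """Split agents among ILMs, run the ILMs concurrently, and combine their summaries."""
    groups = [agents[i::num_ilms] for i in range(num_ilms)]  # round-robin split
    ilms = [IntermediateLevelManager(g) for g in groups if g]
    if not ilms:
        return combine([])
    with ThreadPoolExecutor(max_workers=len(ilms)) as pool:
        summaries = list(pool.map(lambda ilm: ilm.run(task), ilms))
    return combine(summaries)
```

With `sum` as both the task and the combiner and two ILMs, the manager receives two partial sums instead of one reply per monitored element, which is the kind of traffic reduction the hierarchy is meant to provide.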

Data Access Performance in a Large and Dynamic Pharmaceutical Drug Candidate Database

ACM/IEEE SC 2000 Conference (SC'00)

Pub Date : 2000-11-01 DOI: 10.1109/SC.2000.10049

Zina Ben-Miled, Yang Liu, D. Powers, O. Bukhres, Michael Bem, Robert Jones, Robert J. Oppelt, Sam A. Milosevich

An explosion in the amount of data generated through chemical and biological experimentation has been observed in recent years. This rapid proliferation of data has led to a set of cheminformatics and bioinformatics applications that manipulate dynamic, heterogeneous, and massive data. One example in the pharmaceutical industry is the computational process involved in the early discovery of lead drug candidates for a given target disease. This process includes repeated sequential and random accesses to a drug candidate database. Using this pharmaceutical application, the paper presents an experimental study showing that, for optimal performance, the degree of parallelism exploited in the application should be adjusted according to the size of the drug candidate database instance and the machine size. Additionally, different degrees of parallelism should be used depending on whether accesses to the drug candidate database are random or sequential.

Citations: 2

The Failure of TCP in High-Performance Computational Grids

ACM/IEEE SC 2000 Conference (SC'00)

Pub Date : 2000-08-01 DOI: 10.1109/SC.2000.10039

Wu-chun Feng, P. Tinnakornsrisuphap

Distributed computational grids depend on TCP to ensure reliable end-to-end communication between nodes across the wide-area network (WAN). Unfortunately, TCP performance can be abysmal even when buffers on the end hosts are manually optimized. Recent studies blame the self-similar nature of aggregate network traffic for TCP’s poor performance, because such traffic is not readily amenable to statistical multiplexing in the Internet, and hence in computational grids. In this paper, we identify a previously ignored source of self-similarity, one that is readily controllable: TCP itself. Via an experimental study, we examine the effects of the TCP stack on network traffic using different implementations of TCP. We show that even when aggregate application traffic ought to smooth out as traffic from more applications is multiplexed, TCP induces burstiness into the aggregate traffic load, thus adversely impacting network performance. Furthermore, our results indicate that TCP performance will worsen as WAN speeds continue to increase.

Citations: 115
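
One standard way to check whether aggregated traffic "smooths out", as discussed in the TCP paper above, is a variance-time analysis: average a traffic trace over non-overlapping blocks of length m and see how fast the variance of the block means falls. For short-range-dependent traffic it falls roughly like 1/m (Hurst parameter H near 0.5); for self-similar traffic it falls like m^(2H-2) with H closer to 1. The sketch below estimates H this way. It illustrates the concept only and is not necessarily the analysis used in the paper; the function names and the synthetic traces are hypothetical.

```python
import math
import random


def aggregated_variance(series, m):
    """Sample variance of the means of non-overlapping blocks of length m."""
    n_blocks = len(series) // m
    means = [sum(series[k * m:(k + 1) * m]) / m for k in range(n_blocks)]
    mu = sum(means) / n_blocks
    return sum((x - mu) ** 2 for x in means) / (n_blocks - 1)


def hurst_variance_time(series, levels):
    """Least-squares fit of log Var(X^(m)) = c + (2H - 2) log m; returns H."""
    xs = [math.log(m) for m in levels]
    ys = [math.log(aggregated_variance(series, m)) for m in levels]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return 1.0 + slope / 2.0


if __name__ == "__main__":
    rng = random.Random(1)
    levels = [2 ** k for k in range(8)]  # m = 1, 2, ..., 128
    smooth = [rng.random() for _ in range(2 ** 14)]  # independent samples
    bursty = []
    for _ in range(16):
        bursty.extend([rng.random()] * 1024)  # long constant runs never average out
    print(round(hurst_variance_time(smooth, levels), 2))
    print(round(hurst_variance_time(bursty, levels), 2))
```

An independent trace should give an estimate near 0.5, while the trace built from long constant runs gives an estimate near 1, because averaging over blocks shorter than a run does not reduce its variability.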

Extending OpenMP For NUMA Machines

ACM/IEEE SC 2000 Conference (SC'00)

Pub Date : 2000-08-01 DOI: 10.1109/SC.2000.10019

John Bircsak, Peter Craig, RaeLyn Crowell, Z. Cvetanovic, Jonathan Harris, C. A. Nelson, Carl D. Offner

This paper describes extensions to OpenMP that implement data placement features needed for NUMA architectures. OpenMP is a collection of compiler directives and library routines used to write portable parallel programs for shared-memory architectures. Writing efficient parallel programs for NUMA architectures, which have characteristics of both shared-memory and distributed-memory architectures, requires that a programmer control the placement of data in memory and the placement of the computations that operate on that data. Optimal performance is obtained when computations occur on processors that have fast access to the data those computations need. OpenMP, designed for shared-memory architectures, does not by itself address these issues. The extensions to OpenMP Fortran presented here have been mainly taken from High Performance Fortran. The paper describes some of the techniques that the Compaq Fortran compiler uses to generate efficient code based on these extensions. It also describes some additional compiler optimizations, and concludes with some preliminary results.

Citations: 129
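
The data placement features that the OpenMP paper above borrows from High Performance Fortran revolve around mapping array elements to processors. The sketch below computes owners under HPF-style BLOCK and CYCLIC distributions (BLOCK uses contiguous blocks of ceil(n/p) elements) and applies an owner-computes rule to pick the loop iterations a processor should run. It shows only the mapping arithmetic, with 0-based indices; the Compaq directive syntax is not reproduced, and the helper names are hypothetical.

```python
def block_size(n, p):
    """HPF-style BLOCK distribution: contiguous blocks of ceil(n / p) elements."""
    return -(-n // p)


def block_owner(i, n, p):
    """Processor owning 0-based element i of an n-element array under BLOCK."""
    return i // block_size(n, p)


def cyclic_owner(i, p):
    """Processor owning 0-based element i under CYCLIC (round-robin)."""
    return i % p


def owned_iterations(proc, n, p, owner=None):
    """Owner-computes rule: a processor runs the iterations whose data it owns."""
    if owner is None:
        def owner(i):
            return block_owner(i, n, p)
    return [i for i in range(n) if owner(i) == proc]
```

For a 10-element array on 4 processors, BLOCK gives blocks of 3, so processor 1 runs iterations 3 to 5, while CYCLIC gives it iterations 1, 5, and 9.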

Scalable Algorithms for Adaptive Statistical Designs

ACM/IEEE SC 2000 Conference (SC'00)

Pub Date : 2000-08-01 DOI: 10.1155/2000/508081

R. Oehmke, J. Hardwick, Q. Stout

We present a scalable, high-performance solution to multidimensional recurrences that arise in adaptive statistical designs. Adaptive designs are an important class of learning algorithms for a stochastic environment, and we focus on the problem of optimally assigning patients to treatments in clinical trials. While adaptive designs have significant ethical and cost advantages, they are rarely used because they are complex to optimize and analyze. Computational challenges include massive memory requirements, few calculations per memory access, and multiply-nested loops with dynamic indices. We analyze the effects of various parallelization options; although standard approaches do not work well, with effort an efficient, highly scalable program can be developed. This allows us to solve problems thousands of times more complex than those solved previously, which helps make adaptive designs practical. Further, our work applies to many other problems involving neighbor recurrences, such as generalized string matching.

Citations: 7
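
To make the "multidimensional recurrence" in the adaptive-design paper above concrete, here is a simplified, assumed instance: a two-armed Bernoulli bandit with uniform Beta(1, 1) priors on each treatment's success rate. The state is the count of successes and failures observed on each treatment, and the value of a state depends only on neighboring states that differ by one observation. The sketch solves it exactly by memoized recursion. The paper's designs and parallel algorithm are more general; this is only a small serial illustration, and the function name is hypothetical.

```python
from fractions import Fraction
from functools import lru_cache


def two_arm_bandit_value(n):
    """Maximum expected number of successes over n patients for a two-armed
    Bernoulli bandit with independent uniform Beta(1, 1) priors."""

    @lru_cache(maxsize=None)
    def v(s1, f1, s2, f2):
        remaining = n - (s1 + f1 + s2 + f2)
        if remaining == 0:
            return Fraction(0)
        # Posterior mean success probability of each arm under Beta(1 + s, 1 + f).
        p1 = Fraction(s1 + 1, s1 + f1 + 2)
        p2 = Fraction(s2 + 1, s2 + f2 + 2)
        # Each option's value depends on the two neighboring states it can lead to.
        arm1 = p1 * (1 + v(s1 + 1, f1, s2, f2)) + (1 - p1) * v(s1, f1 + 1, s2, f2)
        arm2 = p2 * (1 + v(s1, f1, s2 + 1, f2)) + (1 - p2) * v(s1, f1, s2, f2 + 1)
        return max(arm1, arm2)

    return v(0, 0, 0, 0)
```

For two patients the optimal value is 13/12: treat the first patient with either arm, keep that arm after a success (posterior mean 2/3), and switch after a failure (1/2 on the untried arm). The number of states with at most n observations grows as O(n^4), which hints at the memory requirements the abstract mentions.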

Scalable Molecular Dynamics for Large Biomolecular Systems

ACM/IEEE SC 2000 Conference (SC'00)

Pub Date : 2000-08-01 DOI: 10.1155/2000/750827

R. Brunner, James C. Phillips, L. Kalé

We present an optimized parallelization scheme for molecular dynamics simulations of large biomolecular systems, implemented in the production-quality molecular dynamics program NAMD. With an object-based hybrid force and spatial decomposition scheme, and an aggressive measurement-based predictive load balancing framework, we have attained speeds and speedups that are much higher than any reported in the literature so far. The paper first summarizes the broad methodology we are pursuing and the basic parallelization scheme we used. It then describes the optimizations that were instrumental in increasing performance, and presents performance results on benchmark simulations.

Citations: 35
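
A hedged sketch of a hybrid spatial and force decomposition in the spirit of the NAMD abstract above: atoms are binned into cubic patches at least one cutoff distance wide (spatial decomposition), and one compute object is created for each patch and for each pair of neighboring patches (force decomposition), which gives a load balancer many independent units to place. Non-periodic boundaries, the function names, and the data layout are assumptions for illustration and do not reflect NAMD's actual code.

```python
import itertools


def assign_to_patches(positions, box, cutoff):
    """Bin atoms into a grid of cubic patches whose side is at least `cutoff`."""
    dims = tuple(max(1, int(box[d] // cutoff)) for d in range(3))
    patches = {}
    for atom, pos in enumerate(positions):
        idx = tuple(min(int(pos[d] / box[d] * dims[d]), dims[d] - 1) for d in range(3))
        patches.setdefault(idx, []).append(atom)
    return dims, patches


def pair_computes(dims):
    """One compute per patch (self) and per unordered pair of adjacent patches."""
    cells = list(itertools.product(*(range(n) for n in dims)))
    computes = []
    for a in cells:
        for off in itertools.product((-1, 0, 1), repeat=3):
            b = tuple(a[d] + off[d] for d in range(3))
            # Keep in-range neighbors, and count each unordered pair only once.
            if all(0 <= b[d] < dims[d] for d in range(3)) and a <= b:
                computes.append((a, b))
    return computes
```

Because patches are at least one cutoff wide, every interacting atom pair lies in the same patch or in adjacent patches, so these compute objects cover all short-range interactions; a 3 x 3 x 3 patch grid yields 185 of them.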

Self-Consistent Langevin Simulation of Coulomb Collisions in Charged-Particle Beams

ACM/IEEE SC 2000 Conference (SC'00)

Pub Date : 2000-05-01 DOI: 10.1109/SC.2000.10047

J. Qiang, R. Ryne, S. Habib

In many plasma physics and charged-particle beam dynamics problems, Coulomb collisions are modeled by a Fokker-Planck equation. In order to incorporate these collisions, we present a three-dimensional parallel Langevin simulation method using a Particle-In-Cell (PIC) approach implemented on high-performance parallel computers. We perform, for the first time, a fully self-consistent simulation, in which the friction and diffusion coefficients are computed from first principles. We employ a two-dimensional domain decomposition approach within a message-passing programming paradigm, along with dynamic load balancing. Object-oriented programming is used to encapsulate details of the communication syntax as well as to enhance reusability and extensibility. Performance tests on the SGI Origin 2000, IBM SP RS/6000, and Cray T3E-900 have demonstrated good scalability. As a test example, we demonstrate the collisional relaxation to a final thermal equilibrium of a beam with an initially anisotropic velocity distribution.

Citations: 10
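
In a Langevin approach, each particle velocity receives a deterministic friction term plus a random kick whose strength is set by the diffusion coefficient. The sketch below applies an Euler-Maruyama update to two velocity components with constant friction `nu` and diffusion `diff`. Holding these constant is a simplifying assumption: the paper above computes the coefficients self-consistently from the evolving distribution. With constant coefficients, each component relaxes to a variance of about diff/nu, so an initially anisotropic distribution becomes isotropic. The function names are hypothetical.

```python
import math
import random


def langevin_relax(vx, vy, nu, diff, dt, steps, seed=0):
    """Euler-Maruyama update dv = -nu * v * dt + sqrt(2 * diff * dt) * xi
    applied independently to each velocity component (constant coefficients)."""
    rng = random.Random(seed)
    kick = math.sqrt(2.0 * diff * dt)
    vx, vy = list(vx), list(vy)
    for _ in range(steps):
        for i in range(len(vx)):
            vx[i] += -nu * vx[i] * dt + kick * rng.gauss(0.0, 1.0)
            vy[i] += -nu * vy[i] * dt + kick * rng.gauss(0.0, 1.0)
    return vx, vy


def variance(v):
    """Population variance of a list of velocities."""
    mu = sum(v) / len(v)
    return sum((x - mu) ** 2 for x in v) / len(v)
```

For instance, starting from vx with variance 4 and vy with variance 0.25 and using nu = diff = 1 and dt = 0.01, both sample variances end close to 1 after 600 steps (the discrete update settles slightly above diff/nu, by a factor of about 1 / (1 - nu * dt / 2)).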

Using High-Speed WANs and Network Data Caches to Enable Remote and Distributed Visualization

ACM/IEEE SC 2000 Conference (SC'00)

Pub Date : 2000-04-18 DOI: 10.1109/SC.2000.10002

W. Bethel, B. Tierney, Jason R. Lee, D. Gunter, Stephen Lau

Visapult is a prototype application and framework for remote visualization of large scientific datasets. We approach the technical challenges of tera-scale visualization with a unique architecture that employs high speed WANs and network data caches for data staging and transmission. This architecture allows for the use of available cache and compute resources at arbitrary locations on the network. High data throughput rates and network utilization are achieved by parallelizing I/O at each stage in the application, and by pipelining the visualization process. On the desktop, the graphics interactivity is effectively decoupled from the latency inherent in network applications. We present a detailed performance analysis of the application, and improvements resulting from field-test analysis conducted as part of the DOE Combustion Corridor project.

Citations: 141
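
The Visapult abstract above credits part of its throughput to pipelining the visualization process, so that data for the next block is being fetched while the previous block is rendered. The minimal two-stage sketch below overlaps a loading stage and a rendering stage through a bounded queue. It is a hypothetical illustration of pipelining only; `load`, `render`, and the queue depth are assumptions, and Visapult's actual back-end and viewer components are not modeled.

```python
import queue
import threading


def run_pipeline(blocks, load, render, depth=4):
    """Overlap a loading stage with a rendering stage through a bounded queue."""
    staged = queue.Queue(maxsize=depth)  # bounded buffer between the two stages
    sentinel = object()

    def loader():
        try:
            for block in blocks:
                staged.put(load(block))  # waits while `depth` items are queued
        finally:
            staged.put(sentinel)  # always tell the renderer to stop

    worker = threading.Thread(target=loader, daemon=True)
    worker.start()
    frames = []
    while True:
        item = staged.get()
        if item is sentinel:
            break
        frames.append(render(item))  # renders block k while block k + 1 loads
    worker.join()
    return frames
```

The bounded queue keeps the loader from running arbitrarily far ahead of the renderer. In this sketch an exception in `load` simply ends the stream early; a fuller version would forward the error to the caller.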