ACM/IEEE SC 2002 Conference (SC'02): Latest Papers

Collaborative Simulation Grid: Multiscale Quantum-Mechanical/Classical Atomistic Simulations on Distributed PC Clusters in the US and Japan

ACM/IEEE SC 2002 Conference (SC'02)

Pub Date : 2002-11-16 DOI: 10.1109/SC.2002.10013

H. Kikuchi, R. Kalia, A. Nakano, P. Vashishta, H. Iyetomi, S. Ogata, T. Kouno, F. Shimojo, K. Tsuruta, S. Saini

A multidisciplinary, collaborative simulation has been performed on a Grid of geographically distributed PC clusters. The multiscale simulation approach seamlessly combines i) atomistic simulation based on the molecular dynamics (MD) method and ii) quantum mechanical (QM) calculation based on the density functional theory (DFT), so that accurate but less scalable computations are performed only where they are needed. The multiscale MD/QM simulation code has been Grid-enabled using i) a modular, additive hybridization scheme, ii) multiple QM clustering, and iii) computation/communication overlapping. The Gridified MD/QM simulation code has been used to study environmental effects of water molecules on fracture in silicon. A preliminary run of the code has achieved a parallel efficiency of 94% on 25 PCs distributed over 3 PC clusters in the US and Japan, and a larger test involving 154 processors on 5 distributed PC clusters is in progress.
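
The additive hybridization idea can be illustrated with a minimal sketch. It assumes the common additive form, in which the whole system is treated classically and each QM cluster contributes the difference between its QM and classical energies; the abstract does not spell out the formula, so this form is an assumption. The names `hybrid_energy`, `toy_md`, and `toy_qm` are hypothetical stand-ins, not part of the authors' code.

```python
from typing import Callable, Iterable, Sequence

Atoms = Sequence[int]  # hypothetical: atoms identified by integer index


def hybrid_energy(
    all_atoms: Atoms,
    qm_clusters: Iterable[Atoms],
    e_md: Callable[[Atoms], float],
    e_qm: Callable[[Atoms], float],
) -> float:
    """Additive hybrid energy: MD everywhere, corrected by QM on each cluster."""
    total = e_md(all_atoms)
    for cluster in qm_clusters:
        # Swap the classical description of this cluster for the QM one.
        total += e_qm(cluster) - e_md(cluster)
    return total


# Toy stand-ins for the real solvers (assumptions for illustration only).
def toy_md(atoms: Atoms) -> float:
    return -1.0 * len(atoms)


def toy_qm(atoms: Atoms) -> float:
    return -1.5 * len(atoms)


system = list(range(10))
clusters = [[0, 1], [7, 8, 9]]  # two independent QM regions
energy = hybrid_energy(system, clusters, toy_md, toy_qm)
print(energy)
```

Because each cluster enters only through its own correction term, the cluster calculations are independent of one another, which is what would allow them to run on separate PC clusters.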

Citations: 14

Distributed Dynamic Hash Tables Using IBM LAPI

ACM/IEEE SC 2002 Conference (SC'02)

Pub Date : 2002-11-16 DOI: 10.1109/SC.2002.10041

J. Malard, R. Stewart

An asynchronous communication library for accessing and managing dynamic hash tables over a network of Symmetric Multiprocessors (SMP) is presented. A blocking factor is shown experimentally to reduce the variance of the wall clock time. It is also shown that remote accesses to a distributed hash table can be as effective and scalable as the one-sided operations of the low-level communication middleware on an IBM SP.

Citations: 6

Scalable Analysis Techniques for Microprocessor Performance Counter Metrics

ACM/IEEE SC 2002 Conference (SC'02)

Pub Date : 2002-11-16 DOI: 10.1109/SC.2002.10066

D. Ahn, J. Vetter

Contemporary microprocessors provide a rich set of integrated performance counters that allow application developers and system architects alike the opportunity to gather important information about workload behaviors. Current techniques for analyzing data produced from these counters use raw counts, ratios, and visualization techniques to help users make decisions about their application performance. While these techniques are appropriate for analyzing data from one process, they do not scale easily to new levels demanded by contemporary computing systems. Very simply, this paper addresses these concerns by evaluating several multivariate statistical techniques on these datasets. We find that several techniques, such as statistical clustering, can automatically extract important features from the data. These derived results can, in turn, be fed directly back to an application developer, or used as input to a more comprehensive performance analysis environment, such as a visualization or an expert system.
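
As an illustration of statistical clustering on counter data, the sketch below runs plain k-means over per-process metric vectors. The abstract does not say which clustering method the authors use, so k-means is a swapped-in example; the metric names and values are invented for illustration.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def nearest(point: Vector, centers: List[List[float]]) -> int:
    """Index of the center closest to point (Euclidean distance)."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))


def kmeans(points: List[Vector], k: int, iterations: int = 20) -> List[int]:
    """Plain k-means; the first k points seed the centers (deterministic)."""
    centers = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [nearest(p, centers) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical per-process metrics: (cache misses per 1k instructions,
# fraction of floating-point instructions). Values are made up.
procs = [(2.0, 0.80), (40.0, 0.10), (2.5, 0.78),
         (41.0, 0.12), (1.8, 0.82), (39.5, 0.11)]
labels = kmeans(procs, k=2)
print(labels)
```

With thousands of processes, a grouping like this reduces the data to a handful of behavior classes that a developer can inspect one at a time. In practice the metrics would normally be normalized first so that no single counter dominates the distance.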

Citations: 67

NAMD: Biomolecular Simulation on Thousands of Processors

ACM/IEEE SC 2002 Conference (SC'02)

Pub Date : 2002-11-16 DOI: 10.1109/SC.2002.10019

James C. Phillips, G. Zheng, Sameer Kumar, L. Kalé

NAMD is a fully featured, production molecular dynamics program for high performance simulation of large biomolecular systems. We have previously, at SC2000, presented scaling results for simulations with cutoff electrostatics on up to 2048 processors of the ASCI Red machine, achieved with an object-based hybrid force and spatial decomposition scheme and an aggressive measurement-based predictive load balancing framework. We extend this work by demonstrating similar scaling on the much faster processors of the PSC Lemieux Alpha cluster, and for simulations employing efficient (order N log N) particle mesh Ewald full electrostatics. This unprecedented scalability in a biomolecular simulation code has been attained through latency tolerance, adaptation to multiprocessor nodes, and the direct use of the Quadrics Elan library in place of MPI by the Charm++/Converse parallel runtime system.

Citations: 284

Early Evaluation of the IBM p690

ACM/IEEE SC 2002 Conference (SC'02)

Pub Date : 2002-11-16 DOI: 10.1109/SC.2002.10000

P. Worley, T. Dunigan, M. Fahey, James B. White, Arthur S. Bland

Oak Ridge National Laboratory recently received 27 32-way IBM pSeries 690 SMP nodes. In this paper, we describe our initial evaluation of the p690 architecture, focusing on the performance of benchmarks and applications that are representative of the expected production workload.

Citations: 9

A 26.58 Tflops Global Atmospheric Simulation with the Spectral Transform Method on the Earth Simulator

ACM/IEEE SC 2002 Conference (SC'02)

Pub Date : 2002-11-16 DOI: 10.1109/SC.2002.10053

S. Shingu, H. Takahara, H. Fuchigami, M. Yamada, Yoshinori Tsuda, W. Ohfuchi, Yuji Sasaki, Kazuo Kobayashi, Takashi Hagiwara, S. Habata, M. Yokokawa, Hiroyuki Itoh, K. Otsuka

A spectral atmospheric general circulation model called AFES (AGCM for Earth Simulator) was developed and optimized for the architecture of the Earth Simulator (ES). The ES is a massively parallel vector supercomputer that consists of 640 processor nodes interconnected by a single-stage crossbar network, with a total peak performance of 40.96 Tflops. A sustained performance of 26.58 Tflops was achieved for a high resolution simulation (T1279L96) with AFES by utilizing the full 640-node configuration of the ES. The resulting computing efficiency is 64.9% of the peak performance, well surpassing that of conventional weather/climate applications having just 25-50% efficiency even on vector parallel computers. This remarkable performance proves the effectiveness of the ES as a viable means for practical applications.
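
The efficiency figure follows from the numbers quoted for this paper. A short check, using the 26.58 Tflops from the title and the 40.96 Tflops peak and 640 nodes from the abstract (the variable names are ours):

```python
peak_tflops = 40.96        # total peak of the 640-node ES, from the abstract
sustained_tflops = 26.58   # sustained figure, from the paper title
nodes = 640

efficiency = sustained_tflops / peak_tflops
peak_per_node_gflops = peak_tflops * 1000 / nodes

print(f"efficiency: {efficiency:.1%}")
print(f"peak per node: {peak_per_node_gflops:.1f} Gflops")
```

The ratio rounds to 64.9%, matching the abstract, and the peak works out to 64 Gflops per node.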

Citations: 87

Implementation and Evaluation of A QoS-Capable Cluster-Based IP Router

ACM/IEEE SC 2002 Conference (SC'02)

Pub Date : 2002-11-16 DOI: 10.1109/SC.2002.10026

P. Pradhan, T. Chiueh

A major challenge in Internet edge router design is to support both high packet forwarding performance and versatile and efficient packet processing capabilities. The thesis of this research project is that a cluster of PCs connected by a high-speed system area network provides an effective hardware platform for building routers to be used at the edges of the Internet. This paper describes a scalable and extensible edge router architecture called Panama, which supports a novel aggregate route caching scheme, a real-time link scheduling algorithm whose performance overhead is independent of the number of real-time flows, a highly efficient kernel extension mechanism to safely load networking software extensions dynamically, and an integrated resource scheduler which ensures that real-time flows with additional packet processing requirements still meet their end-to-end performance requirements. This paper describes the implementation and evaluation of the first Panama prototype based on a cluster of PCs and Myrinet.

Citations: 6

On Increasing Architecture Awareness in Program Optimizations to Bridge the Gap between Peak and Sustained Processor Performance: Matrix-Multiply Revisited

ACM/IEEE SC 2002 Conference (SC'02)

Pub Date : 2002-11-16 DOI: 10.1109/SC.2002.10054

David Parello, O. Temam, J. Verdun

As the complexity of processor architectures increases, there is a widening gap between peak processor performance and sustained processor performance so that programs now tend to exploit only a fraction of available performance. While there is a tremendous amount of literature on program optimizations, compiler optimizations lack efficiency because they are plagued by three flaws: (1) they often implicitly use simplified, if not simplistic, models of processor architecture, (2) they usually focus on a single processor component (e.g., cache) and ignore the interactions among multiple components, (3) the most heavily investigated components (e.g., caches) sometimes have only a small impact on overall performance. Through the in-depth analysis of a simple program kernel, we want to show that understanding the complex interactions between programs and the numerous processor architecture components is both feasible and critical to design efficient program optimizations.

Citations: 28

Parallel Multiscale Gauss-Newton-Krylov Methods for Inverse Wave Propagation

ACM/IEEE SC 2002 Conference (SC'02)

Pub Date : 2002-11-16 DOI: 10.5555/762761.762827

V. Akçelik, G. Biros, O. Ghattas

One of the outstanding challenges of computational science and engineering is large-scale nonlinear parameter estimation of systems governed by partial differential equations. These are known as inverse problems, in contradistinction to the forward problems that usually characterize large-scale simulation. Inverse problems are significantly more difficult to solve than forward problems, due to ill-posedness, large dense ill-conditioned operators, multiple minima, space-time coupling, and the need to solve the forward problem repeatedly. We present a parallel algorithm for inverse problems governed by time-dependent PDEs, and scalability results for an inverse wave propagation problem of determining the material field of an acoustic medium. The difficulties mentioned above are addressed through a combination of total variation regularization, preconditioned matrix-free Gauss-Newton-Krylov iteration, algorithmic checkpointing, and multiscale continuation. We are able to solve a synthetic inverse wave propagation problem through a pelvic bone geometry involving 2.1 million inversion parameters in 3 hours on 256 processors of the Terascale Computing System at the Pittsburgh Supercomputing Center.
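
To make the Gauss-Newton step concrete, here is a toy sketch on a one-parameter curve fit rather than a PDE-constrained problem. It assumes a hypothetical forward model y = exp(a x) and noise-free synthetic data. With a single unknown, the Gauss-Newton linear system reduces to a scalar division; in a large-scale setting such as the one in the abstract, that system would instead be solved iteratively by a Krylov method, and regularization would be added. Neither is shown here.

```python
import math
from typing import List


def gauss_newton_fit(xs: List[float], ys: List[float],
                     a0: float = 0.0, iterations: int = 20) -> float:
    """Estimate a in y = exp(a * x) by Gauss-Newton on the least-squares misfit."""
    a = a0
    for _ in range(iterations):
        # Residuals of the forward model and their derivative w.r.t. a.
        residuals = [math.exp(a * x) - y for x, y in zip(xs, ys)]
        jacobian = [x * math.exp(a * x) for x in xs]
        jtj = sum(j * j for j in jacobian)  # Gauss-Newton Hessian approximation
        jtr = sum(j * r for j, r in zip(jacobian, residuals))  # gradient
        if jtj == 0.0:
            break
        a -= jtr / jtj  # solve (J^T J) da = -J^T r, here a scalar equation
    return a


# Synthetic "observations" generated with a known parameter (assumption).
true_a = 0.5
xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [math.exp(true_a * x) for x in xs]
estimate = gauss_newton_fit(xs, ys)
print(estimate)
```

Each iteration re-evaluates the forward model, which is the toy counterpart of the repeated forward solves the abstract lists among the difficulties of inverse problems.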

Citations: 163

Owner Prediction for Accelerating Cache-to-Cache Transfer Misses in a cc-NUMA Architecture

ACM/IEEE SC 2002 Conference (SC'02)

Pub Date : 2002-11-16 DOI: 10.5555/762761.762762

M. Acacio, José González, José M. García, J. Duato

Cache misses for which data must be obtained from a remote cache (cache-to-cache transfer misses) account for an important fraction of the total miss rate. Unfortunately, cc-NUMA designs put the access to the directory information into the critical path of 3-hop misses, which significantly penalizes them compared to SMP designs. This work studies the use of owner prediction as a means of providing cc-NUMA multiprocessors with more efficient support for cache-to-cache transfer misses. Our proposal comprises an effective prediction scheme as well as a coherence protocol designed to support the use of prediction. Results indicate that owner prediction can significantly reduce the latency of cache-to-cache transfer misses, which translates into speed-ups in application performance of up to 12%. In order to also accelerate most of those 3-hop misses that are either not predicted or mispredicted, the inclusion of a small and fast directory cache in every node is evaluated, leading to improvements of up to 16% in final performance.
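
The effect of owner prediction on hop count can be sketched with a toy model. The abstract does not describe the authors' prediction scheme, so the last-owner table below is a hypothetical stand-in, and the cost model is a simplifying assumption: a correct guess lets the requester contact the owner directly (2 hops), while any other case goes through the home directory (3 hops). A real protocol would typically charge extra for a misprediction.

```python
from typing import Dict, Optional


class LastOwnerPredictor:
    """Hypothetical predictor: guess that a block's owner is whoever owned it last."""

    def __init__(self) -> None:
        self._last_owner: Dict[int, int] = {}

    def predict(self, block: int) -> Optional[int]:
        return self._last_owner.get(block)

    def update(self, block: int, owner: int) -> None:
        self._last_owner[block] = owner


def hops_for_miss(predictor: LastOwnerPredictor, block: int,
                  actual_owner: int) -> int:
    """Hops for one cache-to-cache miss under the toy cost model.

    Assumption: correct guess costs 2 hops (requester, owner, requester);
    no guess or a wrong guess costs 3 hops via the home directory.
    """
    guess = predictor.predict(block)
    predictor.update(block, actual_owner)  # learn the true owner for next time
    return 2 if guess == actual_owner else 3


predictor = LastOwnerPredictor()
# (block address, node that actually owns the block); an invented trace.
trace = [(0x40, 1), (0x40, 1), (0x80, 2), (0x40, 3), (0x80, 2)]
hops = [hops_for_miss(predictor, block, owner) for block, owner in trace]
print(hops, sum(hops))
```

In this trace two of the five misses are served in 2 hops; the first touch of each block and the ownership change on block 0x40 still pay the full 3 hops, which is the case the directory cache in the abstract is meant to speed up.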

Citations: 63
