Proceedings. 15th Symposium on Computer Architecture and High Performance Computing: Latest Articles


Finite difference simulations of the Navier-Stokes equations using parallel distributed computing

Proceedings. 15th Symposium on Computer Architecture and High Performance Computing

Pub Date : 2003-11-10 DOI: 10.1109/CAHPC.2003.1250333

J. P. B. Angeli, A. Valli, N. C. Reis, A. D. Souza

We discuss the implementation of a numerical algorithm for simulating incompressible fluid flows that is based on the finite difference method and designed for distributed-memory parallel computing platforms, particularly clusters of workstations. The solution algorithm for the Navier-Stokes equations uses an explicit scheme for pressure and an implicit scheme for velocities, i.e., the velocity field at a new time step can be computed once the corresponding pressure is known. The parallel implementation is based on domain decomposition: the original computational domain is decomposed into several blocks, each of which is assigned to a separate processing node. All nodes then execute computations in parallel, each on its associated subdomain. The parallel computations include initialization, coefficient generation, linear solution on the subdomain, and inter-node communication. Information is exchanged across subdomains (that is, across processors) using the Message Passing Interface standard, MPI, which ensures portability across computing platforms ranging from massively parallel machines to clusters of workstations. Execution time and speed-up are evaluated by comparing performance across different numbers of processors. The results indicate that the parallel code can significantly improve prediction capability and efficiency for large-scale simulations.
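
The domain-decomposition idea can be illustrated with a small serial sketch, assuming a 1-D grid and an explicit diffusion stencil as a stand-in for the paper's Navier-Stokes discretization (its implicit velocity solve is not reproduced here). Each hypothetical "rank" owns one contiguous block plus a ghost cell on each side; copying a neighbor's edge value into those ghost cells plays the role of the MPI messages exchanged between adjacent subdomains. The names `split_domain`, `serial_step`, `decomposed_step` and `run` are illustrative, not taken from the paper.

```python
def split_domain(n, nranks):
    """Partition indices 0..n-1 into nranks contiguous (start, stop) blocks."""
    base, extra = divmod(n, nranks)
    bounds, start = [], 0
    for r in range(nranks):
        size = base + (1 if r < extra else 0)
        bounds.append((start, start + size))
        start += size
    return bounds


def serial_step(u, alpha):
    """One explicit step of u_t = u_xx (alpha = dt/dx^2); end points held fixed."""
    new = list(u)
    for i in range(1, len(u) - 1):
        new[i] = u[i] + alpha * (u[i - 1] - 2 * u[i] + u[i + 1])
    return new


def decomposed_step(blocks, bounds, n, alpha):
    """Same step computed block by block after a ghost-cell (halo) exchange.

    Assumes every block is non-empty (n >= number of ranks).
    """
    nranks = len(blocks)
    padded = []
    for r in range(nranks):
        # Halo exchange: copy each neighbor's edge value into a ghost cell.
        # At the global boundary the ghost value is never read.
        left = blocks[r - 1][-1] if r > 0 else 0.0
        right = blocks[r + 1][0] if r < nranks - 1 else 0.0
        padded.append([left] + blocks[r] + [right])
    new_blocks = []
    for r, (start, _) in enumerate(bounds):
        loc = padded[r]
        out = []
        for k in range(1, len(loc) - 1):
            g = start + k - 1  # global index of this owned point
            if g == 0 or g == n - 1:
                out.append(loc[k])  # fixed (Dirichlet) boundary value
            else:
                out.append(loc[k] + alpha * (loc[k - 1] - 2 * loc[k] + loc[k + 1]))
        new_blocks.append(out)
    return new_blocks


def run(n=20, nranks=4, alpha=0.25, steps=10):
    """Advance the same initial pulse serially and in blocks; return both."""
    u = [0.0] * n
    u[n // 2] = 1.0
    bounds = split_domain(n, nranks)
    blocks = [u[a:b] for a, b in bounds]
    for _ in range(steps):
        u = serial_step(u, alpha)
        blocks = decomposed_step(blocks, bounds, n, alpha)
    return u, [x for block in blocks for x in block]
```

Because every rank sees exactly the neighbor values a serial sweep would see, the merged blocks match the serial result; in an MPI code the ghost-cell copy would become a send/receive pair with each neighboring rank.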

Citations: 9

Fast parallel FFT on a reconfigurable computation platform

Proceedings. 15th Symposium on Computer Architecture and High Performance Computing

Pub Date : 2003-11-10 DOI: 10.1109/CAHPC.2003.1250345

A. Kamalizad, Chengzhi Pan, N. Bagherzadeh

We present an implementation of a very fast parallel complex FFT on M2, the second generation of the MorphoSys reconfigurable computation platform, which targets streamed applications such as multimedia and DSP. The proposed mapping comprises fast presorting, cascaded radix-2 stages, and post-reordering. Data and twiddle factors are represented as 16-bit real and 16-bit imaginary parts in two's complement format, and scaling is performed to avoid overflow. The mapping was tested on our cycle-accurate simulator, "mulate", and its performance is encouragingly better than that of other architectures such as Imagine and VIRAM. Moreover, performance scales with FFT size. Since the architecture has no functionality specifically tailored to the FFT, the results demonstrate the capability of the MorphoSys architecture to extract parallelism from streamed applications. Further rationale is given based on the concepts of scalar operand networks and memory hierarchy.
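
The presorting and cascaded radix-2 stages can be sketched in floating point as below, assuming the textbook iterative decimation-in-time FFT; it does not reproduce the M2 mapping, its post-reordering step, or 16-bit fixed-point arithmetic. The factor of 1/2 applied in every butterfly mimics per-stage scaling against overflow: after log2(N) stages the output equals the DFT divided by N, and no output magnitude can exceed the largest input magnitude. The function names are hypothetical.

```python
import cmath


def reverse_bits(i, bits):
    """Reverse the lowest `bits` bits of i (used by the presorting step)."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def scaled_radix2_fft(x):
    """Iterative radix-2 DIT FFT with a 1/2 scaling in every stage.

    Returns DFT(x) / N, where N = len(x) must be a power of two.
    """
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    bits = n.bit_length() - 1
    # Presort: place the input in bit-reversed index order.
    a = [complex(x[reverse_bits(i, bits)]) for i in range(n)]
    size = 2
    while size <= n:  # cascaded radix-2 stages
        half = size // 2
        for start in range(0, n, size):
            for k in range(half):
                w = cmath.exp(-2j * cmath.pi * k / size)  # twiddle factor
                u = a[start + k]
                t = w * a[start + k + half]
                # Scaled butterfly: halving stops magnitudes growing per stage.
                a[start + k] = (u + t) / 2
                a[start + k + half] = (u - t) / 2
        size *= 2
    return a


def naive_dft(x):
    """Direct O(N^2) DFT, used only as a reference."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]
```

In a 16-bit fixed-point version the same halving would typically be a right shift after each stage, trading one bit of precision per stage for a guarantee that intermediate values stay in range.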

Citations: 42

Optimizing packet capture on symmetric multiprocessing machines

Proceedings. 15th Symposium on Computer Architecture and High Performance Computing

Pub Date : 2003-11-10 DOI: 10.1109/CAHPC.2003.1250328

Gianluca Varenni, M. Baldi, Loris Degioanni, Fulvio Risso

Traffic monitoring and analysis based on general-purpose systems with high-speed interfaces, such as Gigabit Ethernet and 10 Gigabit Ethernet, requires carefully designed software to achieve the needed performance. One approach to attaining such performance relies on deploying multiple processors. This work analyses some general issues in multiprocessor systems that are particularly critical in the context of packet capture and network monitoring applications. More importantly, a new algorithm is proposed to coordinate multiple producers concurrently accessing a shared buffer, which is instrumental in packet capture on symmetric multiprocessor machines.
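
The abstract does not describe the proposed coordination algorithm, so the sketch below shows only a generic reserve/commit pattern for several producers sharing one ring buffer; it is an illustration under stated assumptions, not the authors' algorithm. Producers hold a lock just long enough to reserve a slot, copy their data outside the lock, and then publish the slot; the consumer drains committed slots in reservation order and stops at the first slot whose producer is still copying. A real capture driver would use atomic instructions rather than a Python lock, and the class and function names here are hypothetical.

```python
import threading


class MultiProducerBuffer:
    """Hypothetical ring buffer shared by several producers and one consumer."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.slots = [None] * capacity
        self.committed = [False] * capacity
        self.head = 0  # total slots reserved so far
        self.tail = 0  # total slots consumed so far
        self.lock = threading.Lock()

    def reserve(self):
        """Reserve the next slot; return its sequence number, or None if full."""
        with self.lock:
            if self.head - self.tail >= self.capacity:
                return None  # full: a capture system would drop the packet
            seq = self.head
            self.head += 1
            return seq

    def commit(self, seq, item):
        """Copy item into a reserved slot outside the lock, then publish it."""
        idx = seq % self.capacity
        self.slots[idx] = item  # only the reserving producer writes this slot
        with self.lock:
            self.committed[idx] = True

    def consume(self):
        """Drain committed items in reservation order, stopping at the first gap."""
        out = []
        with self.lock:
            while self.tail < self.head:
                idx = self.tail % self.capacity
                if not self.committed[idx]:
                    break  # reserved, but its producer is still copying
                out.append(self.slots[idx])
                self.slots[idx] = None
                self.committed[idx] = False
                self.tail += 1
        return out


def producer(buf, tid, count, dropped):
    """Hypothetical per-CPU producer: reserve, copy, commit; record drops."""
    for i in range(count):
        seq = buf.reserve()
        if seq is None:
            dropped.append((tid, i))
            continue
        buf.commit(seq, (tid, i))
```

Keeping the copy outside the critical section is the point of this design: the expensive part (copying packet data) can proceed on all processors at once, while the serialized part shrinks to advancing a counter.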

Citations: 13

On the implementation of SPMD applications using Haskell#

Proceedings. 15th Symposium on Computer Architecture and High Performance Computing

Pub Date : 2003-11-10 DOI: 10.1109/CAHPC.2003.1250321

Francisco Heron de Carvalho Junior, R. Lins, N. Quental

Commodity-built clusters, a low-cost alternative for distributed parallel processing, have brought high-performance computing to a wide range of users. However, the widespread existing tools for distributed parallel programming, such as message-passing libraries, do not meet the new software engineering requirements that have emerged as applications grow in complexity. Haskell# is a parallel programming language that aims to reconcile higher abstraction and modularity with scalable performance. We demonstrate the use of Haskell# in programming three SPMD benchmark programs for which lower-level MPI implementations are available.

Citations: 2

Performance analysis of DECK collective communication service

Proceedings. 15th Symposium on Computer Architecture and High Performance Computing

Pub Date : 2003-11-10 DOI: 10.1109/CAHPC.2003.1250322

Rafael Ennes Silva, Delcino Picinin, Marcos E. Barreto, R. Ávila, T. A. Diverio, P. Navaux

Collective communication is very useful for parallel applications, especially those in which matrix and vector data structures must be manipulated by a group of processes. We present a performance analysis of the collective communication primitives designed for the DECK parallel programming environment, using several numerical methods for solving hydrodynamics and mass transport models.

Citations: 1

ProGrid: a proxy-based architecture for grid operation and management

Proceedings. 15th Symposium on Computer Architecture and High Performance Computing

Pub Date : 2003-11-10 DOI: 10.1109/CAHPC.2003.1250327

Paulo Vicente Capellotto Costa, S. Zorzo, H. Guardia

We introduce the ProGrid system, an architecture for computational grids whose communication and resource management infrastructure is used transparently by applications. Unlike other grid approaches, whether application-centric or system-centric, the model relies on proxy servers that perform additional communication and authentication procedures on behalf of client applications. The purpose of this mechanism is to enable parallel applications to execute in geographically distributed environments interconnected by an open communication network, such as the Internet, while meeting the security requirements of computational grids. Among the common services of a grid, we focus on secure communication and the controlled sharing of available resources. To identify resources, we consider standards under development for specifying objects in grids. We also discuss extending the functionality of proxy servers to support standardized management of the grid and of the available objects.

Citations: 8

Hybrid task scheduling: integrating static and dynamic heuristics

Proceedings. 15th Symposium on Computer Architecture and High Performance Computing

Pub Date : 2003-11-10 DOI: 10.1109/CAHPC.2003.1250339

Cristina Boeres, Alexandre A. B. Lima, Vinod E. F. Rebello

Researchers are constantly looking for ways to reduce the execution time of parallel applications on distributed systems. Although compile-time static scheduling heuristics employ complex mechanisms, the quality of their schedules is handicapped by their reliance on estimated run-time costs. Dynamic schedulers, on the other hand, use actual run-time costs, but they must have low complexity to keep the scheduling overhead small. We investigate the viability of integrating these two approaches into a hybrid scheduling framework and examine the relationship between static schedulers, dynamic heuristics, and scheduling events. The results show that a hybrid scheduler can indeed improve the schedules produced by good traditional static list scheduling algorithms.
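
For reference, the static half of such a framework can be sketched as a basic list scheduler, assuming a task DAG, identical processors, no communication costs, and bottom-level priorities computed from estimated costs. These are simplifying assumptions; the paper's hybrid framework and its dynamic heuristics are not reproduced. `replay` then executes the fixed static schedule with actual costs, showing how an inaccurate estimate feeds directly into the achieved makespan. All names and numbers are hypothetical.

```python
def predecessors(tasks, succs):
    """Invert a successor map into a predecessor map."""
    preds = {t: [] for t in tasks}
    for t, children in succs.items():
        for c in children:
            preds[c].append(t)
    return preds


def bottom_levels(costs, succs):
    """Longest cost path from each task to an exit task, including itself."""
    memo = {}

    def bl(t):
        if t not in memo:
            memo[t] = costs[t] + max((bl(c) for c in succs.get(t, [])), default=0)
        return memo[t]

    return {t: bl(t) for t in costs}


def list_schedule(costs, succs, nprocs):
    """Static list scheduling: highest bottom level first, earliest-start processor.

    Returns the per-processor task order and the predicted makespan.
    """
    preds = predecessors(costs, succs)
    prio = bottom_levels(costs, succs)
    proc_free = [0] * nprocs
    finish, order, scheduled = {}, [[] for _ in range(nprocs)], set()
    while len(scheduled) < len(costs):
        ready = [t for t in costs
                 if t not in scheduled and all(p in scheduled for p in preds[t])]
        task = max(ready, key=lambda t: prio[t])
        data_ready = max((finish[p] for p in preds[task]), default=0)
        proc = min(range(nprocs), key=lambda q: max(proc_free[q], data_ready))
        finish[task] = max(proc_free[proc], data_ready) + costs[task]
        proc_free[proc] = finish[task]
        order[proc].append(task)
        scheduled.add(task)
    return order, max(finish.values())


def replay(order, actual, succs):
    """Run a fixed static schedule with actual costs; return the real makespan."""
    preds = predecessors(actual, succs)
    finish, pos, proc_free = {}, [0] * len(order), [0] * len(order)
    remaining = sum(len(tasks) for tasks in order)
    while remaining:
        progressed = False
        for q, tasks in enumerate(order):
            if pos[q] < len(tasks) and all(p in finish for p in preds[tasks[pos[q]]]):
                t = tasks[pos[q]]
                start = max([proc_free[q]] + [finish[p] for p in preds[t]])
                finish[t] = start + actual[t]
                proc_free[q] = finish[t]
                pos[q] += 1
                remaining -= 1
                progressed = True
        if not progressed:
            raise ValueError("schedule order violates precedence constraints")
    return max(finish.values())


# Hypothetical example: A -> C, A -> D, B -> D on two processors.
SUCCS = {"A": ["C", "D"], "B": ["D"]}
ESTIMATED = {"A": 2, "B": 3, "C": 1, "D": 2}
ACTUAL = {"A": 4, "B": 1, "C": 1, "D": 2}
```

With these numbers the schedule built from estimates places B and D on one processor and A and C on the other and predicts a makespan of 5; replaying it with the actual costs yields 6, because A takes twice its estimate and delays both of its successors.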

Citations: 41

Enabling dual-core mode in BlueGene/L: challenges and solutions

Proceedings. 15th Symposium on Computer Architecture and High Performance Computing

Pub Date : 2003-11-10 DOI: 10.1109/CAHPC.2003.1250317

G. Almási, Leonardo R. Bachega, S. Chatterjee, Manish Gupta, D. Lieber, X. Martorell, J. Moreira

BlueGene/L is a massively parallel computer system with 65536 dual-processor compute nodes. The peak performance of BlueGene/L is in excess of 360 TFLOP/s if both processor cores in a node are used for computation. The main challenge of deploying this dual-core mode of operation is that the L1 caches in each core are not hardware coherent. This forces a software-based approach to cache coherence and guides our design of a programming model for dual-core mode. We describe the design, implementation, and performance evaluation of system software for enabling the use of dual-core mode on BlueGene/L. Our preliminary performance results show that our approach to dual-core mode is effective for key numerical kernels.
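
As a back-of-the-envelope check derived only from the figures above, and assuming the peak is spread evenly over both cores of every node, the quoted peak implies a per-core rate of roughly 2.75 GFLOP/s; under the same assumption, computing on only one core per node would halve the peak to about 180 TFLOP/s.

$$
\frac{360 \times 10^{12}\ \text{FLOP/s}}{65{,}536\ \text{nodes} \times 2\ \text{cores/node}} = \frac{360 \times 10^{12}}{131{,}072}\ \text{FLOP/s} \approx 2.75 \times 10^{9}\ \text{FLOP/s per core}
$$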

Citations: 2

Applying scheduling by edge reversal to constraint partitioning

Proceedings. 15th Symposium on Computer Architecture and High Performance Computing

Pub Date : 2003-11-10 DOI: 10.1109/CAHPC.2003.1250331

M. Pereira, P. Vargas, F. França, M. D. Castro, I. Dutra

Scheduling by edge reversal (SER) is a fully distributed scheduling mechanism based on manipulating acyclic orientations of a graph. This work uses SER to partition the constraints of constraint satisfaction problems (CSPs). To apply the SER mechanism, the graph representing the constraints must first be given an acyclic orientation. Since obtaining an optimal acyclic orientation is NP-hard, we study three nondeterministic strategies known in the literature: Alg-Neigh, Alg-Edges, and Alg-Colour. We implemented the three algorithms and the SER scheduling mechanism and applied them to CSP constraint networks generated from three applications. Our results show that SER has great potential for producing good partitions of constraint graphs.
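
The edge-reversal rule itself is compact enough to simulate directly. The sketch below assumes an undirected constraint graph and builds the initial acyclic orientation from a simple total order of the nodes (every edge points toward the node listed later), a hypothetical stand-in for Alg-Neigh, Alg-Edges or Alg-Colour. At each step every sink (a node whose incident edges all point toward it) operates, then reverses all its edges and becomes a source. Two sinks can never be adjacent, so the nodes operating in the same step always form an independent set. The function name `ser_schedule` is illustrative.

```python
def ser_schedule(nodes, edges, steps):
    """Simulate scheduling by edge reversal (SER) for a number of steps.

    The initial orientation points each edge toward the endpoint that comes
    later in `nodes`, which is acyclic by construction. Returns, for every
    step, the list of nodes that operate (the current sinks).
    """
    rank = {n: i for i, n in enumerate(nodes)}
    head = {}  # edge -> node the edge currently points to
    incident = {n: [] for n in nodes}
    for u, v in edges:
        e = frozenset((u, v))
        head[e] = v if rank[u] < rank[v] else u
        incident[u].append((e, v))
        incident[v].append((e, u))
    history = []
    for _ in range(steps):
        sinks = [n for n in nodes if all(head[e] == n for e, _ in incident[n])]
        for n in sinks:
            for e, other in incident[n]:
                head[e] = other  # reverse: the edge now points away from n
        history.append(sinks)
    return history
```

On the path a-b-c the schedule runs {c}, {b}, {a, c}, {b}, ..., so the two endpoints share a step; on a triangle, where every pair of nodes is constrained, the nodes take turns one at a time.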

Citations: 6

X4CP32: a new parallel/reconfigurable general-purpose processor

Proceedings. 15th Symposium on Computer Architecture and High Performance Computing

Pub Date : 2003-11-10 DOI: 10.1109/CAHPC.2003.1250346

R. Soares, A. Azevedo, Ivan Saraiva Silva

The X4CP32 is a parallel/reconfigurable microprocessor with two programming levels. Although it is a general-purpose microprocessor, it offers the reliable performance of a reconfigurable architecture. We present its architecture and programming levels and discuss the powerful interaction between parallel programming and reconfiguration. We also show two performance-optimized implementations of matrix multiplication, using both the parallel and the reconfigurable paradigms, and a parallel implementation of miner intelligent agents.

Citations: 3
