A compiler-directed cache coherence scheme using data prefetching
Pub Date: 1997-04-01 | DOI: 10.1109/IPPS.1997.580970
Hock-Beng Lim, P. Yew
Enforcing cache coherence and reducing or hiding memory latency are key problems in the design of large-scale shared-memory multiprocessors. The authors propose a compiler-directed cache coherence scheme that makes use of data prefetching. The cache coherence with data prefetching (CCDP) scheme uses compiler analysis to identify potentially stale data references, i.e., references that may access invalid copies of cached data. The key idea of the CCDP scheme is to enforce cache coherence by prefetching the up-to-date data corresponding to these potentially stale references from main memory. Application case studies were conducted to quantify the performance potential of the CCDP scheme on a real system: the authors applied the scheme to four benchmark programs from the SPEC CFP95 and CFP92 suites and executed them on the Cray T3D. The experimental results show that, for the programs studied, the scheme provides significant performance improvements by caching shared data and reducing the remote shared-memory access penalty incurred by the programs.
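The transformation at the heart of the scheme can be pictured as follows. This is a minimal sketch of our own, assuming a hypothetical prefetch_coherent() intrinsic that stands in for the Cray T3D's remote-access primitives; it is not the authors' compiler output.

```c
/* Sketch of a CCDP-style transformation (illustrative, not the authors'
 * compiler output). prefetch_coherent() is a hypothetical intrinsic that
 * fetches the up-to-date value from main memory, bypassing any stale
 * cached copy; on the Cray T3D it would map onto remote-read primitives. */
#define N    1024
#define DIST 8                        /* prefetch distance, tuned to latency */

static double a[N], b[N];

static inline void prefetch_coherent(volatile const double *p)
{
    (void)*p;                         /* stub standing in for the intrinsic */
}

void ccdp_loop(int lo, int hi)
{
    /* Compiler analysis has marked reads of a[] as potentially stale:
     * another processor may have updated a[] since it was cached here. */
    for (int i = lo; i < hi; i++) {
        if (i + DIST < hi)
            prefetch_coherent(&a[i + DIST]);  /* fetch coherent data early */
        b[i] = 2.0 * a[i];            /* the read now sees up-to-date data */
    }
}
```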
{"title":"A compiler-directed cache coherence scheme using data prefetching","authors":"Hock-Beng Lim, P. Yew","doi":"10.1109/IPPS.1997.580970","DOIUrl":"https://doi.org/10.1109/IPPS.1997.580970","url":null,"abstract":"Cache coherence enforcement and memory latency reduction and hiding are very important problems in the design of large-scale shared-memory multiprocessors. The authors propose a compiler-directed cache coherence scheme which makes use of data prefetching. The cache coherence with data prefetching (CCDP) scheme uses compiler analysis techniques to identify potentially-stale data references, which are references to invalid copies of cached data. The key idea of the CCDP scheme is to enforce cache coherence by prefetching the up-to-date data corresponding to these potentially-stale references from the main memory. Application case studies were conducted to gain a quantitative idea of the performance potential of the CCDP scheme on a real system. They applied the CCDP scheme on four benchmark programs from the SPEC CFP95 and CFP92 suites, and executed them on the Cray T3D. The experimental results show that for the programs studied, the scheme provides significant performance improvements by caching shared data and reducing the remote shared-memory access penalty incurred by the programs.","PeriodicalId":145892,"journal":{"name":"Proceedings 11th International Parallel Processing Symposium","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129655744","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Broadcasting and multicasting in cut-through routed networks
Pub Date: 1997-04-01 | DOI: 10.1109/IPPS.1997.580989
Johanne Cohen, P. Fraigniaud, J. König, A. Raspaud
This paper addresses the one-to-all and one-to-many broadcasting problems, usually called broadcasting and multicasting, respectively. We study these problems under both the line model and the cut-through model. The former assumes long-distance calls between non-neighboring processors; the latter extends the line model by taking into account the use of a routing function. It is known that time-optimal broadcast and multicast protocols in the line model can be found in polynomial time. We present a new time-optimal broadcasting and multicasting algorithm in the line model that uses the bandwidth of the network efficiently. Moreover, it also applies to the cut-through model provided the routing function generates only shortest paths.
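For intuition: in the line model an informed node can call any other node in one round, so the number of informed nodes can double each round and broadcasting completes in ceil(log2 n) rounds. The sketch below (ours, not the paper's algorithm) prints such a recursive-doubling schedule; the paper's contribution lies in choosing the calls so that the underlying paths use network bandwidth efficiently, which this naive schedule ignores.

```c
/* Naive time-optimal broadcast schedule in the line model: every informed
 * node calls one uninformed node per round (recursive doubling). */
#include <stdio.h>

void schedule_broadcast(int n)            /* nodes 0..n-1, source is node 0 */
{
    int informed = 1;
    for (int round = 0; informed < n; round++) {
        int newly = 0;
        for (int s = 0; s < informed && informed + newly < n; s++, newly++)
            printf("round %d: node %d -> node %d\n",
                   round, s, informed + newly);
        informed += newly;                /* doubles until all are informed */
    }
}

int main(void) { schedule_broadcast(10); return 0; }  /* 4 = ceil(log2 10) rounds */
```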
{"title":"Broadcasting and multicasting in cut-through routed networks","authors":"Johanne Cohen, P. Fraigniaud, J. König, A. Raspaud","doi":"10.1109/IPPS.1997.580989","DOIUrl":"https://doi.org/10.1109/IPPS.1997.580989","url":null,"abstract":"This paper addresses the one-to-all broadcasting problem, and the one-to-many broadcasting problem, usually simply called broadcasting and multicasting, respectively. In this paper, we study these problems under both line model, and cut-through model. The former assumes long distance calls between non neighboring processors. The latter completes the line model by taking into account the use of a routing function. It is known that one can find time optimal broadcast and multicast protocols in the line model in polynomial time. We present a new time optimal broadcasting and multicasting algorithm in the line model. This algorithm efficiently uses the bandwidth of the network. Moreover, it also applies to the cut-through model as soon as the routing function generates shortest paths only.","PeriodicalId":145892,"journal":{"name":"Proceedings 11th International Parallel Processing Symposium","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127547804","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Optimizing parallel bitonic sort
Pub Date: 1997-04-01 | DOI: 10.1109/IPPS.1997.580914
M. Ionescu, K. Schauser
Sorting is an important component of many applications, and parallel sorting algorithms have been studied extensively over the last three decades. One of the earliest parallel sorting algorithms is bitonic sort, represented by a sorting network consisting of multiple butterfly stages. The paper studies bitonic sort on modern parallel machines, which are relatively coarse-grained and consist of only a modest number of nodes, thus requiring many data elements to be mapped to each processor. In such a setting, optimizing bitonic sort becomes a question of mapping data elements to processing nodes (data layout) so that communication is minimized. The authors developed a bitonic sort algorithm that minimizes the number of communication steps and optimizes the local computation. The resulting algorithm is faster than previous implementations, as experimental results collected on a 64-node Meiko CS-2 show.
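In the coarse-grained setting, each compare-exchange of the bitonic network acts on whole blocks of n/p keys: the two partner processors merge their sorted blocks, and one keeps the lower half while the other keeps the upper half ("merge-split"). Below is a sketch of that building block, under our simplifying assumption that both blocks are locally visible; in the real algorithm one block arrives via a message exchange.

```c
/* Merge-split: merge two sorted blocks of 'len' keys; block a keeps the
 * smaller half, block b the larger half. How often this step must cross
 * the network is exactly the data-layout question the paper optimizes. */
#include <stdlib.h>
#include <string.h>

void merge_split(int *a, int *b, int len)
{
    int *tmp = malloc(2 * len * sizeof *tmp);
    int i = 0, j = 0;
    for (int k = 0; k < 2 * len; k++)       /* standard two-way merge */
        tmp[k] = (j >= len || (i < len && a[i] <= b[j])) ? a[i++] : b[j++];
    memcpy(a, tmp,       len * sizeof *a);  /* keep-low side  */
    memcpy(b, tmp + len, len * sizeof *b);  /* keep-high side */
    free(tmp);
}
```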
{"title":"Optimizing parallel bitonic sort","authors":"M. Ionescu, K. Schauser","doi":"10.1109/IPPS.1997.580914","DOIUrl":"https://doi.org/10.1109/IPPS.1997.580914","url":null,"abstract":"Sorting is an important component of many applications, and parallel sorting algorithms have been studied extensively in the last three decades. One of the earliest parallel sorting algorithms is bitonic sort, which is represented by a sorting network consisting of multiple butterfly stages. The paper studies bitonic sort on modern parallel machines which are relatively coarse grained and consist of only a modest number of nodes, thus requiring the mapping of many data elements to each processor. Under such a setting optimizing the bitonic sort algorithm becomes a question of mapping the data elements to processing nodes (data layout) such that communication is minimized. The authors developed a bitonic sort algorithm which minimizes the number of communication steps and optimizes the local computation. The resulting algorithm is faster than previous implementations, as experimental results collected on a 64 node Meiko CS-2 show.","PeriodicalId":145892,"journal":{"name":"Proceedings 11th International Parallel Processing Symposium","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122350813","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Low latency MPI for Meiko CS/2 and ATM clusters
Pub Date: 1997-04-01 | DOI: 10.1109/IPPS.1997.580929
Chris R. Jones, Ambuj K. Singh, D. Agrawal
MPI (Message Passing Interface) is a proposed message-passing standard for the development of efficient and portable parallel programs. An implementation of MPI is presented and evaluated for the Meiko CS/2, a 64-node parallel computer, and a network of 8 SGI workstations connected by an ATM switch and an Ethernet.
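Latency figures for such an implementation are typically obtained with a ping-pong microbenchmark of the following shape (our sketch using only standard MPI calls; it is not the authors' measurement harness):

```c
/* Two-rank ping-pong: halve the round-trip time to estimate one-way latency. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, reps = 1000;
    char byte = 0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int i = 0; i < reps; i++) {
        if (rank == 0) {
            MPI_Send(&byte, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(&byte, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(&byte, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(&byte, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    double t1 = MPI_Wtime();
    if (rank == 0)
        printf("one-way latency: %.2f us\n", (t1 - t0) / (2.0 * reps) * 1e6);
    MPI_Finalize();
    return 0;
}
```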
{"title":"Low latency MPI for Meiko CS/2 and ATM clusters","authors":"Chris R. Jones, Ambuj K. Singh, D. Agrawal","doi":"10.1109/IPPS.1997.580929","DOIUrl":"https://doi.org/10.1109/IPPS.1997.580929","url":null,"abstract":"MPI (Message Passing Interface) is a proposed message-passing standard for the development of efficient and portable parallel programs. An implementation of MPI is presented and evaluated for the Meiko CS/2, a 64-node parallel computer, and a network of 8 SGI workstations connected by an ATM switch and an Ethernet.","PeriodicalId":145892,"journal":{"name":"Proceedings 11th International Parallel Processing Symposium","volume":"275 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133268019","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Semantics and implementation of a generalized forall statement for parallel languages
Pub Date: 1997-04-01 | DOI: 10.1109/IPPS.1997.580953
P. Dechering, L. Breebaart, F. Kuijlman, K. V. Reeuwijk, H. Sips
In this paper we present a generalized forall statement for parallel languages. The forall statement occurs in many (data-)parallel languages and specifies which computations can be performed independently. Many different definitions of such a construct can be found in the literature, with different conditions and execution models. We show how the forall constructs of a wide class of parallel languages can be mapped onto this generalized forall statement. In addition, the forall statement we propose can spawn more complex independent activities than those found in these languages. Denotational semantics are used to define the meaning of the forall and to ensure it admits only one possible program state change. We show that the construct is easy to use and that it can be implemented efficiently.
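The independence requirement pins down the semantics: every iteration must observe the state as it was before the forall began, so the statement effects a single well-defined state change regardless of execution order. A minimal sketch of that semantics in C, realized by double buffering (our illustration, not the paper's implementation):

```c
/* forall (i in 1..N-2)  a[i] = a[i-1] + a[i+1];
 * Each iteration reads the pre-state, so iterations are independent and
 * may execute in any order or in parallel. */
#include <string.h>

#define N 100

void forall_stencil(double a[N])
{
    double old[N];
    memcpy(old, a, sizeof old);          /* snapshot the pre-state */
    for (int i = 1; i < N - 1; i++)
        a[i] = old[i - 1] + old[i + 1];  /* all reads hit the snapshot */
}
```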
{"title":"Semantics and implementation of a generalized forall statement for parallel languages","authors":"P. Dechering, L. Breebaart, F. Kuijlman, K. V. Reeuwijk, H. Sips","doi":"10.1109/IPPS.1997.580953","DOIUrl":"https://doi.org/10.1109/IPPS.1997.580953","url":null,"abstract":"In this paper we present a generalized forall statement for parallel languages. The forall statement occurs in many (data) parallel languages and specifies which computations can be performed independently. Many different definitions of such a construct can be found in literature, with different conditions and execution models. We will show how forall constructs of a wide class of parallel languages can be mapped to this generalized forall statement. In addition, the forall statement we propose has the ability to spawn more complex independent activities than can be found in these languages. Denotational semantics are used to define the meaning of the forall and define only one possible program state change. It is shown that it is easy to use and that it is feasible to implement this forall efficiently.","PeriodicalId":145892,"journal":{"name":"Proceedings 11th International Parallel Processing Symposium","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131772276","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Comparing gang scheduling with dynamic space sharing on symmetric multiprocessors using automatic self-allocating threads (ASAT)
Pub Date: 1997-04-01 | DOI: 10.1109/IPPS.1997.580911
C. Severance, R. Enbody
The work considers how best to handle a diverse mix of multi-threaded and single-threaded jobs running on a single symmetric multiprocessing system. The traditional approaches to this problem are free scheduling, gang scheduling, and space sharing. The paper examines a less common technique called dynamic space sharing. One approach to dynamic space sharing, automatic self-allocating threads (ASAT), is compared to the traditional approaches to scheduling a mixed load of jobs. Performance results for ASAT scheduling, gang scheduling, and free scheduling are presented; ASAT scheduling is shown to be the superior approach to mixing multi-threaded and single-threaded work.
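As we read the abstract, the core of ASAT is that each job periodically resizes its own thread team against the observed system load, so a mixed workload converges toward one runnable thread per processor without central coordination. A hedged sketch of that check; runnable_threads() and set_team_size() are hypothetical hooks, not an actual ASAT API:

```c
/* Self-allocation check, called at convenient points such as the top of a
 * parallel loop. A real implementation would query the OS run queue and
 * resize the team via the threads library. */
extern int  runnable_threads(void);   /* hypothetical: system-wide runnable count */
extern void set_team_size(int n);     /* hypothetical: resize this job's team     */

void asat_adjust(int nprocs, int my_team)
{
    int load = runnable_threads();
    if (load > nprocs && my_team > 1)
        set_team_size(my_team - 1);   /* system oversubscribed: give a CPU back */
    else if (load < nprocs)
        set_team_size(my_team + 1);   /* idle processors: claim one more */
}
```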
{"title":"Comparing gang scheduling with dynamic space sharing on symmetric multiprocessors using automatic self-allocating threads (ASAT)","authors":"C. Severance, R. Enbody","doi":"10.1109/IPPS.1997.580911","DOIUrl":"https://doi.org/10.1109/IPPS.1997.580911","url":null,"abstract":"The work considers the best way to handle a diverse mix of multi-threaded and single-threaded jobs running on a single symmetric parallel processing system. The traditional approaches to this problem are free scheduling, gang scheduling, or space sharing. The paper examines a less common technique called dynamic space sharing. One approach to dynamic space sharing, automatic self allocating threads (ASAT), is compared to all of the traditional approaches to scheduling a mixed load of jobs. Performance results for ASAT scheduling, gang scheduling, and free scheduling are presented. ASAT scheduling is shown to be the superior approach to mixing multi-threaded work with single threaded work.","PeriodicalId":145892,"journal":{"name":"Proceedings 11th International Parallel Processing Symposium","volume":"82 1-2","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114121537","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
On privatization of variables for data-parallel execution
Pub Date: 1997-04-01 | DOI: 10.1109/IPPS.1997.580952
Manish Gupta
Privatization of data is an important technique that has been used by compilers to parallelize loops by eliminating storage-related dependences. When a compiler partitions computations based on the ownership of data, selecting a proper mapping of privatizable data is crucial to obtaining the benefits of privatization. This paper presents a novel framework for privatizing scalar and array variables in the context of a data-driven approach to parallelization. We show that there are numerous alternatives available for mapping privatized variables and the choice of mapping can significantly affect the performance of the program. We present an algorithm that attempts to preserve parallelism and minimize communication overheads. We also introduce the concept of partial privatization of arrays that combines data partitioning and privatization, and enables efficient handling of a class of codes with multi-dimensional data distribution that was not previously possible. Finally, we show how the ideas of privatization apply to the execution of control flow statements as well. An implementation of these ideas in the pHPF prototype compiler for High Performance Fortran on the IBM SP2 machine has shown impressive results.
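The kind of dependence that privatization removes can be seen in a loop that uses a temporary array wholly within each outer iteration; once each processor holds a private copy, the outer loop parallelizes. This is a generic illustration of the technique, not pHPF output; how the private copies are then mapped across processors is the choice the paper's algorithm addresses.

```c
/* t[] and s are written before being read in every outer iteration, so
 * they are privatizable: with one shared copy the outer loop would be
 * serialized by storage reuse; with per-processor copies it is parallel. */
#define N 64

void scale_rows(double a[N][N], double rowsum[N])
{
    double t[N];                      /* privatizable temporary */
    for (int i = 0; i < N; i++) {     /* parallel after privatization */
        double s = 0.0;               /* privatizable scalar */
        for (int j = 0; j < N; j++) {
            t[j] = 2.0 * a[i][j];     /* write t[] ... */
            s += t[j];
        }
        rowsum[i] = s;                /* ... and read it in the same iteration */
    }
}
```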
{"title":"On privatization of variables for data-parallel execution","authors":"Manish Gupta","doi":"10.1109/IPPS.1997.580952","DOIUrl":"https://doi.org/10.1109/IPPS.1997.580952","url":null,"abstract":"Privatization of data is an important technique that has been used by compilers to parallelize loops by eliminating storage-related dependences. When a compiler partitions computations based on the ownership of data, selecting a proper mapping of privatizable data is crucial to obtaining the benefits of privatization. This paper presents a novel framework for privatizing scalar and array variables in the context of a data-driven approach to parallelization. We show that there are numerous alternatives available for mapping privatized variables and the choice of mapping can significantly affect the performance of the program. We present an algorithm that attempts to preserve parallelism and minimize communication overheads. We also introduce the concept of partial privatization of arrays that combines data partitioning and privatization, and enables efficient handling of a class of codes with multi-dimensional data distribution that was not previously possible. Finally, we show how the ideas of privatization apply to the execution of control flow statements as well. An implementation of these ideas in the pHPF prototype compiler for High Performance Fortran on the IBM SP2 machine has shown impressive results.","PeriodicalId":145892,"journal":{"name":"Proceedings 11th International Parallel Processing Symposium","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114272923","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A hybrid interconnection network for integrated communication services
Pub Date: 1997-04-01 | DOI: 10.1109/IPPS.1997.580924
Yilong Chen, Jyh-Charn S. Liu
This paper presents a hybrid interconnection network architecture to support integrated communication services for multicomputer-based database and multimedia systems. Our study shows that existing wormhole-routing networks are inefficient at transferring long files. We demonstrate the feasibility of integrating different network techniques based on virtual channels and flexible routing mechanisms.
{"title":"A hybrid interconnection network for integrated communication services","authors":"Yilong Chen, Jyh-Charn S. Liu","doi":"10.1109/IPPS.1997.580924","DOIUrl":"https://doi.org/10.1109/IPPS.1997.580924","url":null,"abstract":"This paper presents a hybrid interconnection network architecture to support integrated communication services for multicomputer-based database and multimedia systems. Our study shows that existing wormhole routing networks are inefficient in transfer of long files. We demonstrate the feasibility of integrating different network techniques based on virtual channels and flexible routing mechanisms.","PeriodicalId":145892,"journal":{"name":"Proceedings 11th International Parallel Processing Symposium","volume":"521 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124483496","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A tool for on-line visualization and interactive steering of parallel HPC applications
Pub Date: 1997-04-01 | DOI: 10.1109/IPPS.1997.580882
S. Rathmayer
Tools for parallel systems today range from specification through debugging to performance analysis and beyond. Typically, they help the programmers of parallel algorithms from the early development stages up to a certain level of program optimization. In HPC (High Performance Computing), however, the end-users of massively parallel CFD (Computational Fluid Dynamics) programs have little or no support in their work. The scientific engineer who runs an application on a parallel computer somewhere in the WAN (Wide Area Network) and visualizes the enormous amounts of simulation data on a graphical workstation in the LAN (Local Area Network) has needs that are far from covered by state-of-the-art visualization systems. The tool proposed here departs completely from the existing batch-oriented, strictly sequential working process in the application cycle of parallel HPC applications: it allows both on-line visualization and interactive steering of massively parallel CFD applications. The parameters of the mathematical model and the numerical methods form objects of a database that can be accessed by an object-oriented graphical user interface via visualization and modification operators. Experiences with this new tool concept, VIPER (VIsualization of Parallel numerical simulation algorithms for Extended Research), applied to a real-world industrial scientific application are reported.
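The steering pattern the abstract describes can be sketched as follows: the simulation registers its steerable parameters, polls for GUI modifications between timesteps, and exports fields for on-line visualization. All names here (steer_publish, steer_poll, steer_export) are hypothetical stand-ins, not VIPER's actual interface.

```c
#include <stdbool.h>

/* Hypothetical steering hooks; a real system backs these with a database
 * of parameter objects shared with the graphical front end. */
extern void steer_publish(const char *name, double *value);
extern bool steer_poll(void);   /* apply pending GUI edits; true if any */
extern void steer_export(const char *name, const double *field, int n);

void simulate(double *field, int n, int steps)
{
    double relax = 1.2;                       /* a steerable solver knob */
    steer_publish("relaxation", &relax);
    for (int t = 0; t < steps; t++) {
        steer_poll();                         /* pick up interactive changes */
        /* ... one solver timestep using 'relax' ... */
        steer_export("field", field, n);      /* feed on-line visualization */
    }
}
```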
{"title":"A tool for on-line visualization and interactive steering of parallel HPC applications","authors":"S. Rathmayer","doi":"10.1109/IPPS.1997.580882","DOIUrl":"https://doi.org/10.1109/IPPS.1997.580882","url":null,"abstract":"Tools for parallel systems today range from specification over debugging to performance analysis and more. Typically, they help the programmers of parallel algorithms from the early development stages to a certain level of program optimization. However in HPC (High Performance Computing) today the end-user of massively parallel CFD (Computational Fluid Dynamics)-programs has little or no support in his work. The scientific engineer who often runs his application on a parallel computer somewhere in the WAN (Wide Area Network) and visualizes the enormous amounts of simulation data on a graphical workstation in his LAN (Local Area Network) has needs which are by far not covered by state of the art visualization systems. The tool proposed here follows a strategy which differs completely from existing, batch-oriented and strictly sequential methods of the working process in the application cycle of parallel HPC applications. It allows both on-line visualization and interactive program steering of massively parallel CFD-applications. The parameters of the mathematical model and the numerical methods build objects of a database which can be accessed by an object-oriented graphical user interface via visualization and modification operators. Experiences with this new tool concept VIPER (VIsualization of Parallel numerical simulation algorithms for Extended Research) applied on a real-world and industrial scientific application will be shown.","PeriodicalId":145892,"journal":{"name":"Proceedings 11th International Parallel Processing Symposium","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123529551","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Alias analysis for Fortran90 array slices
Pub Date: 1997-04-01 | DOI: 10.1109/IPPS.1997.580967
K. Gopinath, R. Seshadri
Most alias analyses produce approximate results in the presence of array slices. This can lead to inefficient code, which is of particular concern in languages like Fortran90. The authors present an overview of a static alias analysis that gives accurate results in the presence of array slices in Fortran90.
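For single-strided slices the underlying question is whether two arithmetic progressions intersect, which can be answered exactly rather than approximately. A sketch of the classic GCD-based disjointness test for positive strides (our illustration of the flavor of analysis involved, not the authors' full algorithm):

```c
/* Slices a(l1:u1:s1) and a(l2:u2:s2), strides positive. Returns 0 when the
 * slices are provably disjoint; 1 when a common element may exist. The GCD
 * condition is necessary for l1 + i*s1 == l2 + j*s2 to have a solution. */
static int gcd(int a, int b) { while (b) { int t = a % b; a = b; b = t; } return a; }

int slices_may_alias(int l1, int u1, int s1, int l2, int u2, int s2)
{
    int lo = l1 > l2 ? l1 : l2;       /* the index ranges must meet at all */
    int hi = u1 < u2 ? u1 : u2;
    if (lo > hi)
        return 0;
    if ((l2 - l1) % gcd(s1, s2) != 0) /* no integer solution exists */
        return 0;
    return 1;
}
```

For example, a(1:9:2) and a(2:10:2) touch odd and even indices respectively: gcd(2,2) = 2 does not divide 2 - 1 = 1, so the test proves them disjoint where a coarser analysis would conservatively merge them.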
{"title":"Alias analysis for Fortran90 array slices","authors":"K. Gopinath, R. Seshadri","doi":"10.1109/IPPS.1997.580967","DOIUrl":"https://doi.org/10.1109/IPPS.1997.580967","url":null,"abstract":"Most alias analyses produce approximate results in the presence of array slices. This may lead to inefficient code which is of concern, especially, in languages like Fortran90. The authors present an overview of a static alias analysis that gives accurate results in the presence of array slices in Fortran90.","PeriodicalId":145892,"journal":{"name":"Proceedings 11th International Parallel Processing Symposium","volume":"301 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129315292","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}