
ACM/IEEE SC 2000 Conference (SC'00): Latest Publications

MPICH-GQ: Quality-of-Service for Message Passing Programs
Pub Date : 2000-11-01 DOI: 10.1109/SC.2000.10017
A. Roy, Ian T Foster, W. Gropp, N. Karonis, V. Sander, B. Toonen
Parallel programmers typically assume that all resources required for a program’s execution are dedicated to that purpose. However, in local and wide area networks, contention for shared networks, CPUs, and I/O systems can result in significant variations in availability, with consequent adverse effects on overall performance. We describe a new message-passing architecture, MPICH-GQ, that uses quality of service (QoS) mechanisms to manage contention and hence improve performance of message passing interface (MPI) applications. MPICH-GQ combines new QoS specification, traffic shaping, QoS reservation, and QoS implementation techniques to deliver QoS capabilities to the high-bandwidth bursty flows, complex structures, and reliable protocols used in high-performance applications, characteristics very different from the low-bandwidth, constant bit-rate media flows and unreliable protocols for which QoS mechanisms were designed. Results obtained on a differentiated services testbed demonstrate our ability to maintain application performance in the face of heavy network contention.
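The abstract mentions a differentiated services testbed; the sketch below is not MPICH-GQ source, only a minimal illustration of how a flow can be marked for DiffServ treatment by setting the DSCP bits of a socket through the standard IP_TOS option. The choice of DSCP 46 (Expedited Forwarding) and the helper name are illustrative assumptions.

```c
/* Minimal illustration (not MPICH-GQ source): mark a TCP flow for
 * differentiated services by setting the DSCP bits through the standard
 * IP_TOS socket option. DSCP 46 (Expedited Forwarding) is an
 * illustrative choice; the helper name is made up. */
#include <stdio.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/ip.h>

static int mark_flow_for_qos(int sockfd, int dscp)
{
    int tos = dscp << 2;   /* DSCP occupies the upper 6 bits of the TOS byte */
    if (setsockopt(sockfd, IPPROTO_IP, IP_TOS, &tos, sizeof tos) < 0) {
        perror("setsockopt(IP_TOS)");
        return -1;
    }
    return 0;
}

int main(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) { perror("socket"); return 1; }
    if (mark_flow_for_qos(fd, 46) != 0) return 1;   /* 46 = Expedited Forwarding */
    printf("socket %d marked with DSCP 46\n", fd);
    return 0;
}
```

On a DiffServ path, routers then apply the per-hop behavior configured for that code point to the flow's packets, which is the kind of network-level mechanism a QoS-aware MPI layer can reserve and exploit.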
Citations: 39
An Object-Oriented Job Execution Environment
Pub Date : 2000-11-01 DOI: 10.1109/SC.2000.10036
L. Smith, R. Fatoohi
This is a project for developing a distributed job execution environment for highly iterative jobs. An iterative job is one where the same binary code is run hundreds of times with incremental changes in the input values for each run. An execution environment is a set of resources on a computing platform that can be made available to run the job and hold the output until it is collected. The goal is to design a complete, object-oriented scheduling system that will run a variety of jobs with minimal changes. Areas of code that are unique to one specific type of job are decoupled from the rest. The system allows for fine-grained job control, timely status notification and dynamic registration and deregistration of execution platforms depending on resources available. Several object-oriented technologies are employed: Java, CORBA, UML, and software design patterns. The environment has been tested using a CFD code, INS2D.
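As a toy illustration of the "highly iterative job" the abstract defines (one binary run hundreds of times with incrementally changed inputs), the sketch below drives such a sweep; the binary name, flag, and output layout are made-up assumptions, and this is not the paper's Java/CORBA environment.

```c
/* Toy sketch of a "highly iterative job" (not the paper's Java/CORBA
 * system): the same binary is run many times, one input value changing
 * incrementally per run, each run's output held in its own file.
 * The binary name, flag, and out/ directory are made-up assumptions. */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    const int runs = 200;              /* "hundreds of times" */
    for (int i = 0; i < runs; i++) {
        double angle = 0.5 * i;        /* incrementally changed input */
        char cmd[256];
        snprintf(cmd, sizeof cmd,
                 "./solver --angle=%.2f > out/run_%03d.txt", angle, i);
        if (system(cmd) != 0)          /* assumes ./solver and out/ exist */
            fprintf(stderr, "run %d failed\n", i);
    }
    return 0;
}
```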
Citations: 1
ESP: A System Utilization Benchmark
Pub Date : 2000-11-01 DOI: 10.1109/SC.2000.10056
A. Wong, L. Oliker, W. Kramer, T. Kaltz, D. Bailey
This article describes a new benchmark, called the Effective System Performance (ESP) test, which is designed to measure system-level performance, including such factors as job scheduling efficiency, handling of large jobs and shutdown-reboot times. In particular, this test can be used to study the effects of various scheduling policies and parameters. We present here some results that we have obtained so far on the Cray T3E and IBM SP systems, together with insights obtained from simulations.
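The abstract does not give the formula ESP uses, so the sketch below only illustrates one common way to quantify system utilization for a job mix: useful work (processors times runtime, summed over jobs) divided by delivered capacity (machine size times elapsed time). The job mix and machine size are made up, and the metric is an assumption rather than ESP's exact definition.

```c
/* Illustrative utilization calculation, an assumption rather than ESP's
 * published metric: useful work (procs * runtime summed over jobs)
 * divided by delivered capacity (machine size * elapsed wall-clock time).
 * The job mix and machine size below are made up. */
#include <stdio.h>

struct job { int procs; double runtime; };

static double utilization(const struct job *jobs, int njobs,
                          int machine_procs, double elapsed)
{
    double work = 0.0;
    for (int i = 0; i < njobs; i++)
        work += jobs[i].procs * jobs[i].runtime;
    return work / (machine_procs * elapsed);
}

int main(void)
{
    struct job mix[] = { { 64, 1200.0 }, { 128, 600.0 }, { 256, 300.0 } };
    /* A 512-processor system observed for 900 s, scheduling gaps included:
     * 230400 processor-seconds of work / 460800 available = 0.50. */
    printf("utilization = %.2f\n", utilization(mix, 3, 512, 900.0));
    return 0;
}
```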
Citations: 32
Dynamic Software Testing of MPI Applications with Umpire
Pub Date : 2000-11-01 DOI: 10.1109/SC.2000.10055
J. Vetter, B. Supinski
As evidenced by the popularity of MPI (Message Passing Interface), message passing is an effective programming technique for managing coarse-grained concurrency on distributed computers. Unfortunately, debugging message-passing applications can be difficult. Software complexity, data races, and scheduling dependencies can make programming errors challenging to locate with manual, interactive debugging techniques. This article describes Umpire, a new tool for detecting programming errors at runtime in message passing applications. Umpire monitors the MPI operations of an application by interposing itself between the application and the MPI runtime system using the MPI profiling layer. Umpire then checks the application’s MPI behavior for specific errors. Our initial collection of programming errors includes deadlock detection, mismatched collective operations, and resource exhaustion. We present an evaluation on a variety of applications that demonstrates the effectiveness of this approach.
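The MPI profiling layer mentioned in the abstract works by letting a tool define the MPI_ entry points and forward each call to its PMPI_ counterpart; the sketch below shows that interposition pattern for MPI_Send with a placeholder argument check (the check is an illustrative stand-in, not Umpire's deadlock, collective-matching, or resource analysis).

```c
/* Sketch of MPI profiling-layer interposition, the mechanism the abstract
 * relies on: the tool defines MPI_Send itself, inspects the call, then
 * forwards it to the real implementation via PMPI_Send. The check below is
 * a placeholder, not Umpire's deadlock/collective/resource analysis. */
#include <stdio.h>
#include <mpi.h>

int MPI_Send(const void *buf, int count, MPI_Datatype datatype,
             int dest, int tag, MPI_Comm comm)
{
    int rank;
    PMPI_Comm_rank(comm, &rank);

    /* Placeholder check: flag an obviously invalid argument. A real tool
     * would also record the event for later matching against receives. */
    if (count < 0)
        fprintf(stderr, "[checker] rank %d: negative count in MPI_Send\n", rank);

    return PMPI_Send(buf, count, datatype, dest, tag, comm);
}
```

Built into a library that is linked ahead of the MPI implementation (or compiled into the application), this wrapper is reached by every MPI_Send call without any change to the application source, which is what allows this style of tool to monitor unmodified programs at runtime.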
Citations: 223
Is Data Distribution Necessary in OpenMP?
Pub Date : 2000-11-01 DOI: 10.1109/SC.2000.10025
Dimitrios S. Nikolopoulos, T. Papatheodorou, C. Polychronopoulos, Jesús Labarta, E. Ayguadé
This paper investigates the performance implications of data placement in OpenMP programs running on modern ccNUMA multiprocessors. Data locality and minimization of the rate of remote memory accesses are critical for sustaining high performance on these systems. We show that due to the low remote-to-local memory access latency ratio of state-of-the-art ccNUMA architectures, reasonably balanced page placement schemes, such as round-robin or random distribution of pages, incur modest performance losses. We also show that performance leaks stemming from suboptimal page placement schemes can be remedied with a smart user-level page migration engine. The main body of the paper describes how the OpenMP runtime environment can use page migration for implementing implicit data distribution and redistribution schemes without programmer intervention. Our experimental results support the effectiveness of these mechanisms and provide a proof of concept that there is no need to introduce data distribution directives in OpenMP and warrant the portability of the programming model.
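A common way to see the page-placement issue the paper studies is first-touch allocation on ccNUMA systems: pages are physically placed on the node of the thread that first writes them, so initializing data with the same loop schedule as the compute phase keeps later accesses local. The sketch below shows that general practice; it is not the paper's page-migration engine, and the array size and schedule are illustrative.

```c
/* First-touch placement sketch for ccNUMA systems (general practice, not
 * the paper's page-migration engine): pages are physically allocated on
 * the node of the thread that first writes them, so initializing with the
 * same static schedule as the compute loop keeps later accesses local. */
#include <stdio.h>
#include <stdlib.h>

#define N (1 << 22)

int main(void)
{
    double *a = malloc(N * sizeof *a);
    double *b = malloc(N * sizeof *b);
    if (!a || !b) return 1;

    /* Parallel first touch: each thread's chunk lands in its local memory. */
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < N; i++) { a[i] = 0.0; b[i] = (double)i; }

    /* Compute phase with the same schedule: remote accesses are avoided. */
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < N; i++) a[i] += 2.0 * b[i];

    printf("a[N-1] = %.1f\n", a[N - 1]);
    free(a); free(b);
    return 0;
}
```

Using schedule(static) in both loops matters: it guarantees the same iteration-to-thread mapping in the initialization and compute phases, which is what makes the first-touch placement pay off.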
Citations: 69
Performance and Interoperability Issues in Incorporating Cluster Management Systems within a Wide-Area Network-Computing Environment
Pub Date : 2000-11-01 DOI: 10.1109/SC.2000.10021
S. Adabala, N. Kapadia, J. Fortes
This paper describes the performance and interoperability issues that arise in the process of integrating cluster management systems into a wide-area network-computing environment, and provides solutions in the context of the Purdue University Network Computing Hubs (PUNCH). The described solution provides users with a single point of access to resources spread across administrative domains, and an intelligent translation process makes it possible for users to submit jobs to different types of cluster management systems in a transparent manner. The approach does not require any modifications to the cluster management software; however, call-back and caching capabilities that would improve performance and make such systems more interoperable with wide-area computing systems are discussed.
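The "intelligent translation process" amounts to mapping one abstract job request onto whatever syntax each cluster management system expects. The sketch below is a toy version of that idea, emitting a PBS-style and an LSF-style script from the same request; the request fields and the particular directives are illustrative assumptions, not PUNCH's actual format.

```c
/* Toy sketch of translating one generic job request into scripts for two
 * different cluster management systems. The request fields and the exact
 * directives are illustrative assumptions; PUNCH's real translation layer
 * is considerably richer. */
#include <stdio.h>

struct job_request { const char *exe; int nodes; int minutes; };

static void emit_pbs(const struct job_request *j, FILE *out)
{
    fprintf(out, "#PBS -l nodes=%d\n", j->nodes);
    fprintf(out, "#PBS -l walltime=00:%02d:00\n", j->minutes);
    fprintf(out, "%s\n", j->exe);
}

static void emit_lsf(const struct job_request *j, FILE *out)
{
    fprintf(out, "#BSUB -n %d\n", j->nodes);
    fprintf(out, "#BSUB -W %d\n", j->minutes);    /* runtime limit in minutes */
    fprintf(out, "%s\n", j->exe);
}

int main(void)
{
    struct job_request req = { "./simulate", 16, 30 };
    puts("--- PBS-style script ---");
    emit_pbs(&req, stdout);
    puts("--- LSF-style script ---");
    emit_lsf(&req, stdout);
    return 0;
}
```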
Citations: 7
A Tool Framework for Static and Dynamic Analysis of Object-Oriented Software with Templates
Pub Date : 2000-11-01 DOI: 10.1109/SC.2000.10052
K. Lindlan, J. Cuny, A. Malony, S. Shende, B. Mohr, R. Rivenburgh, C. Rasmussen
The developers of high-performance scientific applications often work in complex computing environments that place heavy demands on program analysis tools. The developers need tools that interoperate, are portable across machine architectures, and provide source-level feedback. In this paper, we describe a tool framework, the Program Database Toolkit (PDT), that supports the development of program analysis tools meeting these requirements. PDT uses compile-time information to create a complete database of high-level program information that is structured for well-defined and uniform access by tools and applications. PDT’s current applications make heavy use of advanced features of C++, in particular, templates. We describe the toolkit, focussing on its most important contribution -- its handling of templates -- as well as its use in existing applications.
Citations: 107
MPI versus MPI+OpenMP on the IBM SP for the NAS Benchmarks
Pub Date : 2000-11-01 DOI: 10.1109/SC.2000.10001
F. Cappello, D. Etiemble
The hybrid memory model of clusters of multiprocessors raises two issues: programming model and performance. Many parallel programs have been written by using the MPI standard. To evaluate the pertinence of hybrid models for existing MPI codes, we compare a unified model (MPI) and a hybrid one (OpenMP fine grain parallelization after profiling) for the NAS 2.3 benchmarks on two IBM SP systems. The superiority of one model depends on 1) the level of shared memory model parallelization, 2) the communication patterns and 3) the memory access patterns. The relative speeds of the main architecture components (CPU, memory, and network) are of tremendous importance for selecting one model. With the hybrid model used here, our results show that a unified MPI approach is better for most of the benchmarks. The hybrid approach becomes better only when fast processors make the communication performance significant and the level of parallelization is sufficient.
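The hybrid model compared here typically means MPI between processes (for example, one process per SMP node) with OpenMP threads parallelizing the loops inside each process. The sketch below shows that structure with a placeholder vector-sum workload; it is not one of the NAS benchmarks.

```c
/* Minimal hybrid MPI+OpenMP sketch: MPI between processes (for example one
 * process per SMP node), OpenMP threads for the loops inside each process.
 * The local vector sum is a placeholder workload, not a NAS benchmark. */
#include <stdio.h>
#include <mpi.h>

#define LOCAL_N 1000000

static double x[LOCAL_N];

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double local_sum = 0.0, global_sum = 0.0;

    /* Fine-grain, shared-memory parallelism within the process. */
    #pragma omp parallel for reduction(+ : local_sum)
    for (int i = 0; i < LOCAL_N; i++) {
        x[i] = 1.0;
        local_sum += x[i];
    }

    /* Coarse-grain, message-passing parallelism between processes. */
    MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("global sum = %.0f\n", global_sum);

    MPI_Finalize();
    return 0;
}
```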
Citations: 303
Tiling Optimizations for 3D Scientific Computations
Pub Date : 2000-11-01 DOI: 10.1109/SC.2000.10015
Gabriel Rivera, C. Tseng
Compiler transformations can significantly improve data locality for many scientific programs. In this paper, we show that iterative solvers for partial differential equations (PDEs) in three dimensions require new compiler optimizations not needed for 2D codes, since reuse along the third dimension cannot fit in cache for larger problem sizes. Tiling is a program transformation compilers can apply to capture this reuse, but successful application of tiling requires selection of non-conflicting tiles and/or padding array dimensions to eliminate conflicts. We present new algorithms and cost models for selecting tiling shapes and array pads. We explain why tiling is rarely needed for 2D PDE solvers, but can be helpful for 3D stencil codes. Experimental results show tiling 3D codes can reduce miss rates and achieve performance improvements of 17-121% for key scientific kernels, including a 27% average improvement for the key computational loop nest in the SPEC/NAS benchmark MGRID.
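The reuse at issue runs along the outer dimension of a 3D sweep: each source plane is needed by three consecutive outer iterations, but for large problems a whole plane no longer fits in cache. The sketch below tiles the two inner dimensions of a Jacobi-style stencil so only a tile-sized working set must stay resident between outer iterations; the tile sizes and the stencil itself are illustrative, not the paper's cost-model choices.

```c
/* Tiling sketch for a 3D Jacobi-style stencil (tile sizes and stencil are
 * illustrative, not the paper's cost-model choices). Blocking the j and k
 * loops shrinks the per-iteration footprint so the three b-planes a tile
 * needs stay in cache and are reused across the outer i sweep. */
#include <stdio.h>

#define N  128
#define TJ 16          /* tile size in j (illustrative) */
#define TK 64          /* tile size in k (illustrative) */

static double a[N][N][N], b[N][N][N];

int main(void)
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            for (int k = 0; k < N; k++)
                b[i][j][k] = (double)(i + j + k);

    for (int jj = 1; jj < N - 1; jj += TJ) {
        int jmax = jj + TJ < N - 1 ? jj + TJ : N - 1;
        for (int kk = 1; kk < N - 1; kk += TK) {
            int kmax = kk + TK < N - 1 ? kk + TK : N - 1;
            for (int i = 1; i < N - 1; i++)          /* untiled outer sweep */
                for (int j = jj; j < jmax; j++)
                    for (int k = kk; k < kmax; k++)
                        a[i][j][k] = (b[i-1][j][k] + b[i+1][j][k] +
                                      b[i][j-1][k] + b[i][j+1][k] +
                                      b[i][j][k-1] + b[i][j][k+1]) / 6.0;
        }
    }

    printf("a[64][64][64] = %.1f\n", a[64][64][64]);
    return 0;
}
```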
Citations: 259
PM2: High Performance Communication Middleware for Heterogeneous Network Environments
Pub Date : 2000-11-01 DOI: 10.1109/SC.2000.10013
Toshiyuki Takahashi, S. Sumimoto, A. Hori, H. Harada, Y. Ishikawa
This paper introduces a high performance communication middle layer, called PM2, for heterogeneous network environments. PM2 currently supports Myrinet, Ethernet, and SMP. Binary code written in PM2 or written in a communication library, such as MPICH-SCore on top of PM2, may run on any combination of those networks without re-compilation. According to a set of NAS parallel benchmark results, MPICH-SCore performance is better than dedicated communication libraries such as MPICH-BIP/SMP and MPICH-GM when running some benchmark programs.
Citations: 67