
2016 PGAS Applications Workshop (PAW): Latest Publications

OpenSHMEM Non-blocking Data Movement Operations with MVAPICH2-X: Early Experiences
Pub Date: 2016-11-13 | DOI: 10.1109/PAW.2016.7
Khaled Hamidouche, Jie Zhang, D. Panda, K. Tomko
PGAS models, with their lightweight synchronization and shared-memory abstraction, are seen as a good alternative to the message-passing model for irregular communication patterns. OpenSHMEM is a library-based PGAS model. OpenSHMEM 1.3 introduced non-blocking data movement operations to provide better asynchronous progress and overlap. In this paper, we present our experiences in designing non-blocking Put and Get operations on InfiniBand systems. Using the MVAPICH2-X runtime, we present alternative designs for intra-node and inter-node operations. We also present a set of new benchmarks to analyze latency, message-rate performance, and communication/computation overlap benefits. The performance evaluation shows a 7X improvement in message rate. Furthermore, using a 3D-stencil-based application kernel, we assess the benefits of the OpenSHMEM non-blocking extensions. We show 50% and 28% improvement on 27 and 64 processes, respectively.
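The 1.3 non-blocking calls the paper builds on follow a post/overlap/complete pattern. A minimal sketch against the standard OpenSHMEM 1.3 API (the buffer size, peer choice, and compute step are illustrative, not the paper's benchmark):

```cpp
#include <shmem.h>

int main(void) {
    shmem_init();
    int me = shmem_my_pe();
    int npes = shmem_n_pes();

    const size_t n = 1 << 20;
    // Symmetric heap allocation: the same address is valid on every PE.
    double *src = (double *)shmem_malloc(n * sizeof(double));
    double *dst = (double *)shmem_malloc(n * sizeof(double));
    for (size_t i = 0; i < n; ++i) src[i] = (double)me;

    int peer = (me + 1) % npes;

    // Post the transfer without waiting for completion (OpenSHMEM 1.3).
    shmem_double_put_nbi(dst, src, n, peer);

    // ... independent computation overlaps with the transfer here ...

    // shmem_quiet() completes all outstanding non-blocking operations
    // issued by this PE.
    shmem_quiet();
    shmem_barrier_all();

    shmem_free(src);
    shmem_free(dst);
    shmem_finalize();
    return 0;
}
```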
Citations: 0
Application of PGAS Programming to Power Grid Simulation
Pub Date: 2016-11-13 | DOI: 10.1109/PAW.2016.10
B. Palmer
This paper describes the application of the PGAS Global Arrays (GA) library to power grid simulations. The GridPACK™ framework is designed to enable power grid engineers to develop parallel simulations of the power grid by providing a set of templates and libraries that encapsulate most of the details of parallel programming in higher-level abstractions. The communication portions of the framework are implemented using a combination of message passing (MPI) and one-sided communication (GA). The paper provides a brief overview of GA and describes in detail the implementation of collective hash tables, which are used in many power grid applications to match data with a previously distributed network.
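The collective hash table is the paper's own implementation, not part of the public GA API; as an illustration of the one-sided GA primitives such a structure sits on, here is a minimal put/get sketch (the array name and sizes are made up):

```cpp
#include <mpi.h>
#include <ga.h>
#include <macdecls.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    GA_Initialize();

    // Create a 1D double array distributed across all processes;
    // a NULL chunk argument lets GA pick the distribution.
    int dims[1] = {1000};
    int g_a = NGA_Create(C_DBL, 1, dims, (char *)"buses", NULL);

    // One-sided put into a global index range; no receive is posted
    // on the owning process.
    int lo[1] = {0}, hi[1] = {9}, ld[1] = {10};
    double local[10];
    for (int i = 0; i < 10; ++i) local[i] = 1.0;
    NGA_Put(g_a, lo, hi, local, ld);
    GA_Sync();  // collective completion of outstanding operations

    // One-sided get from the same range, regardless of which
    // process owns the data.
    double readback[10];
    NGA_Get(g_a, lo, hi, readback, ld);

    GA_Destroy(g_a);
    GA_Terminate();
    MPI_Finalize();
    return 0;
}
```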
Citations: 0
Optimizing PGAS Overhead in a Multi-locale Chapel Implementation of CoMD
Pub Date: 2016-11-13 | DOI: 10.1109/PAW.2016.9
Riyaz Haque, D. Richards
Chapel supports distributed computing with an underlying PGAS memory address space. While it provides abstractions for writing simple and elegant distributed code, the type system currently lacks a notion of locality, i.e., a description of an object's access behavior in relation to its actual location. This often necessitates programmer intervention to avoid redundant non-local data access. Moreover, due to insufficient locality information, the compiler ends up using "wide" pointers (which can point to non-local data) for objects referenced in an otherwise completely local manner, adding to the runtime overhead. In this work we describe CoMD-Chapel, our distributed Chapel implementation of the CoMD benchmark. We demonstrate that optimizing data access through replication and localization is crucial for achieving performance comparable to the reference implementation. We discuss limitations of existing scope-based locality optimizations and argue instead for a more general (and robust) type-based approach. Lastly, we evaluate code performance and scaling characteristics. The fully optimized version of CoMD-Chapel performs to within 62%–87% of the reference implementation.
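The replication/localization fix is Chapel-specific, but the underlying pattern is language-neutral: rather than dereferencing a potentially wide (remote-capable) pointer on every iteration of a hot loop, fetch the remote block into a plain local buffer once and run the loop on local memory. A hypothetical sketch of that pattern, written here in UPC++ rather than the authors' Chapel:

```cpp
#include <upcxx/upcxx.hpp>
#include <vector>

// Hypothetical helper; assumes upcxx::init() has already been called.
// The naive version would issue one fine-grained remote read per
// iteration through the global pointer; localizing first replaces
// n remote reads with a single bulk copy.
double sum_block(upcxx::global_ptr<double> remote, std::size_t n) {
    std::vector<double> local(n);
    upcxx::rget(remote, local.data(), n).wait();  // one bulk transfer
    double s = 0.0;
    for (std::size_t i = 0; i < n; ++i) s += local[i];  // all-local loop
    return s;
}
```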
Citations: 11
Multi-scale CAFE Framework for Simulating Fracture in Heterogeneous Materials Implemented in Fortran Co-arrays and MPI
Pub Date: 2016-11-13 | DOI: 10.1109/PAW.2016.6
A. Shterenlikht, L. Margetts, J. D. Arregui-Mena, L. Cebamanos
Fortran coarrays have been used as an extension to the standard for over 20 years, mostly on Cray systems. Their appeal to users increased substantially when they were standardised in 2010. In this work we show that coarrays offer simple and intuitive data structures for 3D cellular automata (CA) modelling of material microstructures. We show how coarrays can be used together with an MPI finite element (FE) library to create a two-way concurrent, hierarchical, and scalable multi-scale CAFE deformation and fracture framework. The design of CGPACK, a coarray cellular-automata microstructure-evolution library, is described. ParaFEM, a highly portable MPI FE library, was used in this work. We show that, independently, CGPACK and ParaFEM programs can scale up well into tens of thousands of cores. Strong scaling of a hybrid ParaFEM/CGPACK MPI/coarray multi-scale framework was measured on an important practical solid-mechanics example: the fracture of a steel round bar under tension. That program did not scale beyond 7,000 cores. Excessive synchronisation might be one contributing factor to the relatively poor scaling. We therefore conclude with a comparative analysis of synchronisation requirements in MPI and coarray programs. Specific challenges of synchronising a coarray library are discussed.
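A Fortran coarray example is out of scope for this listing, but the remote-access-plus-global-synchronisation pattern the paper weighs against MPI can be approximated with MPI-3 one-sided operations. The sketch below is a rough analogue (not the CGPACK code) of a coarray halo read such as `halo = cells(1)[neighbour]` followed by `sync all`:

```cpp
#include <mpi.h>
#include <vector>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int me, npes;
    MPI_Comm_rank(MPI_COMM_WORLD, &me);
    MPI_Comm_size(MPI_COMM_WORLD, &npes);

    // Each rank exposes its slab of CA cells in a window
    // (the analogue of declaring the grid as a coarray).
    const int ncells = 64;
    std::vector<int> cells(ncells, me);
    MPI_Win win;
    MPI_Win_create(cells.data(), ncells * sizeof(int), sizeof(int),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    int halo = -1;
    int neighbour = (me + 1) % npes;

    MPI_Win_fence(0, win);                 // ~ "sync all"
    MPI_Get(&halo, 1, MPI_INT, neighbour,  // ~ halo = cells(1)[neighbour]
            0, 1, MPI_INT, win);
    MPI_Win_fence(0, win);                 // closes the epoch; halo is valid

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}
```

The two fences bracketing a single get illustrate the synchronisation cost the abstract points to: every remote access epoch carries a collective operation unless a finer-grained scheme is used.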
Citations: 7
Experiences of Applying One-Sided Communication to Nearest-Neighbor Communication
Pub Date: 2016-11-13 | DOI: 10.1109/PAW.2016.8
H. Shan, Samuel Williams, Yili Zheng, Weiqun Zhang, Bei Wang, S. Ethier, Zhengji Zhao
Nearest-neighbor communication is one of the most important communication patterns appearing in many scientific applications. In this paper, we discuss the results of applying UPC++, a library-based partitioned global address space (PGAS) programming extension to C++, to an adaptive mesh framework (BoxLib) and a full scientific application, GTC-P, whose communications are dominated by nearest-neighbor exchange. Results on a Cray XC40 system show that, compared with highly tuned MPI two-sided implementations, UPC++ improves communication performance by up to 60% and 90% for BoxLib and GTC-P, respectively. We also implement the nearest-neighbor communication using MPI one-sided messages. The performance comparison demonstrates that the MPI one-sided implementation can also improve communication performance over the two-sided version, but not as significantly as UPC++ does.
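A minimal sketch of a one-sided neighbor exchange in UPC++. One hedge: this uses the current UPC++ 1.0 API (dist_object plus rput), which post-dates the library version evaluated in the paper, and the ghost-region size is illustrative:

```cpp
#include <upcxx/upcxx.hpp>
#include <vector>

int main() {
    upcxx::init();
    int me = upcxx::rank_me();
    int npes = upcxx::rank_n();

    const std::size_t ghost = 128;
    // Each rank allocates its receive buffer in the shared segment and
    // publishes the pointer so neighbors can write into it directly.
    upcxx::dist_object<upcxx::global_ptr<double>> inbox(
        upcxx::new_array<double>(ghost));

    int right = (me + 1) % npes;
    upcxx::global_ptr<double> dst = inbox.fetch(right).wait();

    std::vector<double> send(ghost, double(me));
    // One-sided put: no matching receive is posted on the target rank.
    upcxx::future<> done = upcxx::rput(send.data(), dst, ghost);

    // ... interior computation can overlap with the transfer here ...

    done.wait();       // this rank's put is complete at the target
    upcxx::barrier();  // so after the barrier, every inbox is filled

    upcxx::delete_array(*inbox);
    upcxx::finalize();
    return 0;
}
```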
Citations: 3