Investigating the performance and productivity of DASH using the Cowichan problems

Proceedings of Workshops of HPC Asia. Pub Date: 2018-01-31. DOI: 10.1145/3176364.3176366

K. Fürlinger, R. Kowalewski, Tobias Fuchs, Benedikt Lehmann


Citations: 1

Abstract

DASH is a new realization of the PGAS (Partitioned Global Address Space) programming model in the form of a C++ template library. Instead of using a custom compiler, DASH provides expressive programming constructs using C++ abstraction mechanisms and offers distributed data structures and parallel algorithms that follow the concepts employed by the C++ standard template library (STL). In this paper we evaluate the performance and productivity of DASH by comparing our implementation of a set of benchmark programs with those developed by expert programmers in Intel Cilk, Intel TBB (Threading Building Blocks), Go and Chapel. We perform a comparison on shared memory multiprocessor systems ranging from moderately parallel multicore systems to a 64-core manycore system. We additionally perform a scalability study on a distributed memory system on up to 20 nodes (800 cores). Our results demonstrate that DASH offers productivity that is comparable with the best established programming systems for shared memory and also achieves comparable or better performance. Our results on multi-node systems show that DASH scales well and achieves excellent performance.
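The core PGAS idea mentioned in the abstract, a single global index space whose elements are partitioned across processes, can be illustrated with a small standalone sketch. The `BlockedPattern` type and its member names below are hypothetical and are not part of the DASH API; the sketch uses only the C++ standard library and assumes a simple blocked distribution, whereas a real PGAS library would likely support more general layouts.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not DASH code): maps a global index onto the
// process ("unit") that owns it and the offset within that unit's
// local memory, assuming a blocked distribution.
struct BlockedPattern {
    std::size_t global_size;  // total number of elements
    std::size_t num_units;    // number of participating units

    // Elements per unit, rounded up so every element has an owner.
    std::size_t block_size() const {
        assert(num_units > 0);
        return (global_size + num_units - 1) / num_units;
    }

    // Unit that stores the element at the given global index.
    std::size_t owner(std::size_t global_index) const {
        assert(global_index < global_size);
        return global_index / block_size();
    }

    // Position of the element inside its owner's local block.
    std::size_t local_offset(std::size_t global_index) const {
        assert(global_index < global_size);
        return global_index % block_size();
    }

    // Number of elements held by a unit; trailing units may hold fewer.
    std::size_t local_size(std::size_t unit) const {
        const std::size_t begin = unit * block_size();
        if (begin >= global_size) return 0;
        const std::size_t end = begin + block_size();
        return (end > global_size ? global_size : end) - begin;
    }
};
```

With 10 elements spread over 4 units, for instance, the block size is 3, global index 7 lives on unit 2 at local offset 1, and the last unit holds a single element. Accesses whose owner is the calling unit are plain local memory operations, while all others require communication, which is why the distribution of data matters for the scalability results reported in the paper.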

