Deterministic scale-free pipeline parallelism with hyperqueues

2013 SC - International Conference for High Performance Computing, Networking, Storage and Analysis (SC). Pub Date: 2013-11-17. DOI: 10.1145/2503210.2503233

H. Vandierendonck, Kallia Chronaki, Dimitrios S. Nikolopoulos


Citations: 12

Abstract

Ubiquitous parallel computing aims to make parallel programming accessible to a wide variety of programming areas using deterministic and scale-free programming models built on a task abstraction. However, it remains hard to reconcile these attributes with pipeline parallelism, where the number of pipeline stages is typically hard-coded in the program and defines the degree of parallelism. This paper introduces hyperqueues, a programming abstraction that enables the construction of deterministic and scale-free pipeline-parallel programs. Hyperqueues extend the concept of Cilk++ hyperobjects to provide thread-local views on a shared data structure. While hyperobjects are organized around private local views, hyperqueues require shared concurrent views on the underlying data structure. We define the semantics of hyperqueues and describe their implementation in a work-stealing scheduler. We demonstrate scalable performance on pipeline-parallel PARSEC benchmarks and find that hyperqueues deliver performance comparable to, or up to 30% better than, POSIX threads and Intel's Threading Building Blocks. The latter are highly tuned to the number of available processing cores, while programs using hyperqueues are scale-free.
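
To make the two properties named in the abstract concrete, here is a minimal sketch in Python. It is not the paper's hyperqueue implementation or API, which this page does not show; it uses an ordinary thread pool and a FIFO queue from the standard library. The names `run_pipeline`, `produce`, `consume`, and `n_workers` are hypothetical and introduced only for illustration. The worker count is a runtime parameter rather than something fixed by the program's structure (scale-free), and the consumer always sees results in serial program order, so the output matches a sequential run for any worker count (deterministic).

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor

_DONE = object()  # sentinel marking the end of the stream


def run_pipeline(items, produce, consume, n_workers):
    """Two-stage pipeline whose output does not depend on n_workers.

    Illustrative only: a plain FIFO queue of futures stands in for the
    ordered stream between stages; this is an assumption of the sketch,
    not how hyperqueues are implemented.
    """
    q = queue.Queue()
    results = []

    def consumer():
        # Futures are dequeued in the order they were enqueued, i.e. in
        # serial program order, regardless of which one finishes first.
        while True:
            fut = q.get()
            if fut is _DONE:
                break
            results.append(consume(fut.result()))

    t = threading.Thread(target=consumer)
    t.start()
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for item in items:
            # The producer stage runs in parallel on up to n_workers threads.
            q.put(pool.submit(produce, item))
    q.put(_DONE)
    t.join()
    return results


def run_serial(items, produce, consume):
    """Sequential reference: the result the pipeline must reproduce."""
    return [consume(produce(x)) for x in items]


if __name__ == "__main__":
    data = list(range(10))
    reference = run_serial(data, lambda x: x * x, lambda y: y + 1)
    for workers in (1, 2, 8):
        assert run_pipeline(data, lambda x: x * x, lambda y: y + 1, workers) == reference
```

Changing `n_workers` changes only how much of the producer stage overlaps in time, not the result, which is the sense in which the sketch is both scale-free and deterministic.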



