Declarative Tuning for Locality in Parallel Programs

S. Chatterjee, Nick Vrvilo, Zoran Budimlic, K. Knobe, Vivek Sarkar
Published in: 2016 45th International Conference on Parallel Processing (ICPP), 2016-08-01
DOI: 10.1109/ICPP.2016.58 (https://doi.org/10.1109/ICPP.2016.58)
Citations: 7

Abstract

Optimized placement of data and computation for locality is critical for improving performance and reducing energy consumption on modern computing systems. However, for most programming models, modifying data and computation placements typically requires rewriting large portions of the application, which poses a major performance-portability challenge in today's rapidly evolving architecture landscape. In this paper we present TunedCnC, a novel, declarative, and flexible CnC tuning framework for controlling the spatial and temporal placement of data and computation by specifying hierarchical affinity groups and distribution functions. TunedCnC emphasizes a separation of concerns: the domain expert specifies a parallel application by defining data and control dependences, while the tuning expert specifies how the application should be executed on a given architecture, that is, when and where data and computation are placed. The application remains unchanged when tuned for a different platform or towards different performance goals. We evaluate the utility of TunedCnC on several applications, and demonstrate that varying the tuning specification can have a significant impact on an application's performance. Our evaluation is performed using an implementation of the Concurrent Collections (CnC) declarative parallel programming model, but our results should be applicable to tuning of other data-flow task-parallel programming models as well.
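The separation of concerns described in the abstract can be illustrated with a minimal Python sketch. This is not the TunedCnC API or its syntax. The names `tile_step`, `tags`, `row_distribution`, `block_distribution`, and `run` are hypothetical and chosen for illustration only. The sketch models only the "where" side of tuning, through distribution functions that map a step tag to a node, and does not model hierarchical affinity groups or temporal placement. The application part stays the same while the tuning part is swapped.

```python
from collections import defaultdict

# --- Application specification (domain expert) ---------------------------
# Hypothetical example: each tile depends on its left and upper neighbours.
def tile_step(i, j, items):
    """Compute one tile from the items it depends on."""
    left = items.get((i, j - 1), 0)
    up = items.get((i - 1, j), 0)
    return left + up + 1

def tags(n):
    """All step tags; row-major order respects the left/up dependences."""
    return [(i, j) for i in range(n) for j in range(n)]

# --- Tuning specifications (tuning expert) -------------------------------
# Hypothetical distribution functions: map a tag to a node id.
def row_distribution(tag, num_nodes):
    i, _ = tag
    return i % num_nodes

def block_distribution(tag, num_nodes, block=2):
    i, j = tag
    return ((i // block) + (j // block)) % num_nodes

# --- Toy sequential "runtime" --------------------------------------------
def run(n, distribution, num_nodes):
    """Execute every step and record which node the tuning assigns it to."""
    items = {}
    placement = defaultdict(list)
    for tag in tags(n):
        placement[distribution(tag, num_nodes)].append(tag)
        items[tag] = tile_step(*tag, items)
    return items, dict(placement)

if __name__ == "__main__":
    items_a, place_a = run(4, row_distribution, num_nodes=2)
    items_b, place_b = run(4, block_distribution, num_nodes=2)
    print(items_a == items_b)   # same application results
    print(place_a != place_b)   # different placement of the same steps
```

In this toy setting, changing the distribution function changes only the recorded placement and leaves the computed items untouched. The abstract claims the same property: the application is unchanged when the tuning specification changes.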