Improving Performance of Structured-Memory, Data-Intensive Applications on Multi-core Platforms via a Space-Filling Curve Memory Layout

E. W. Bethel, David Camp, D. Donofrio, Mark Howison
{"title":"Improving Performance of Structured-Memory, Data-Intensive Applications on Multi-core Platforms via a Space-Filling Curve Memory Layout","authors":"E. W. Bethel, David Camp, D. Donofrio, Mark Howison","doi":"10.1109/IPDPSW.2015.71","DOIUrl":null,"url":null,"abstract":"Many data-intensive algorithms -- particularly in visualization, image processing, and data analysis -- operate on structured data, that is, data organized in multidimensional arrays. While many of these algorithms are quite numerically intensive, by and large, their performance is limited by the cost of memory accesses. As we move towards the exascale regime of computing, one central research challenge is finding ways to minimize data movement through the memory hierarchy, particularly within a node in a shared-memory parallel setting. We study the effects that an alternative in-memory data layout format has in terms of runtime performance gains resulting from reducing the amount of data moved through the memory hierarchy. We focus the study on shared-memory parallel implementations of two algorithms common in visualization and analysis: a stencil-based convolution kernel, which uses a structured memory access pattern, and ray casting volume rendering, which uses a semi-structured memory access pattern. The question we study is to better understand to what degree an alternative memory layout, when used by these key algorithms, will result in improved runtime performance and memory system utilization. Our approach uses a layout based on a Z-order (Morton-order) space-filling curve data organization, and we measure and report runtime and various metrics and counters associated with memory system utilization. Our results show nearly uniform improved runtime performance and improved utilization of the memory hierarchy across varying levels of concurrency the applications we tested. This approach is complementary to other memory optimization strategies like cache blocking, but may also be more general and widely applicable to a diverse set of applications.","PeriodicalId":340697,"journal":{"name":"2015 IEEE International Parallel and Distributed Processing Symposium Workshop","volume":"49 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-05-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2015 IEEE International Parallel and Distributed Processing Symposium Workshop","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IPDPSW.2015.71","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 3

Abstract

Many data-intensive algorithms -- particularly in visualization, image processing, and data analysis -- operate on structured data, that is, data organized in multidimensional arrays. While many of these algorithms are quite numerically intensive, by and large, their performance is limited by the cost of memory accesses. As we move towards the exascale regime of computing, one central research challenge is finding ways to minimize data movement through the memory hierarchy, particularly within a node in a shared-memory parallel setting. We study the effects that an alternative in-memory data layout format has in terms of runtime performance gains resulting from reducing the amount of data moved through the memory hierarchy. We focus the study on shared-memory parallel implementations of two algorithms common in visualization and analysis: a stencil-based convolution kernel, which uses a structured memory access pattern, and ray casting volume rendering, which uses a semi-structured memory access pattern. The question we study is to better understand to what degree an alternative memory layout, when used by these key algorithms, will result in improved runtime performance and memory system utilization. Our approach uses a layout based on a Z-order (Morton-order) space-filling curve data organization, and we measure and report runtime and various metrics and counters associated with memory system utilization. Our results show nearly uniform improved runtime performance and improved utilization of the memory hierarchy across varying levels of concurrency the applications we tested. This approach is complementary to other memory optimization strategies like cache blocking, but may also be more general and widely applicable to a diverse set of applications.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
通过空间填充曲线内存布局提高多核平台上结构化内存、数据密集型应用程序的性能
许多数据密集型算法——特别是在可视化、图像处理和数据分析方面——操作结构化数据,即组织在多维数组中的数据。虽然这些算法中的许多都是数字密集型的,但总的来说,它们的性能受到内存访问成本的限制。随着我们向百亿级计算体系迈进,一个核心的研究挑战是找到最小化内存层次结构中的数据移动的方法,特别是在共享内存并行设置中的节点内。我们研究了另一种内存数据布局格式在运行时性能提升方面的影响,因为它减少了通过内存层次结构移动的数据量。我们重点研究了可视化和分析中常见的两种算法的共享内存并行实现:基于模板的卷积核,它使用结构化内存访问模式,以及射线投射体渲染,它使用半结构化内存访问模式。我们研究的问题是更好地理解,当这些关键算法使用替代内存布局时,将在多大程度上提高运行时性能和内存系统利用率。我们的方法使用基于Z-order (Morton-order)空间填充曲线数据组织的布局,我们测量和报告运行时以及与内存系统利用率相关的各种指标和计数器。我们的结果显示,在我们测试的应用程序的不同并发级别上,运行时性能和内存层次结构的利用率几乎一致地得到了改进。这种方法是对其他内存优化策略(如缓存阻塞)的补充,但也可能更通用,更广泛地适用于各种应用程序。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Accelerating Large-Scale Single-Source Shortest Path on FPGA Relocation-Aware Floorplanning for Partially-Reconfigurable FPGA-Based Systems iWAPT Introduction and Committees Computing the Pseudo-Inverse of a Graph's Laplacian Using GPUs Optimizing Defensive Investments in Energy-Based Cyber-Physical Systems
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1