Accelerating multicore reuse distance analysis with sampling and parallelization

Derek L. Schuff, Milind Kulkarni, Vijay S. Pai
{"title":"Accelerating multicore reuse distance analysis with sampling and parallelization","authors":"Derek L. Schuff, Milind Kulkarni, Vijay S. Pai","doi":"10.1145/1854273.1854286","DOIUrl":null,"url":null,"abstract":"Reuse distance analysis is a well-established tool for predicting cache performance, driving compiler optimizations, and assisting visualization and manual optimization of programs. Existing reuse distance analysis methods either do not account for the effects of multithreading, or suffer severe performance penalties. This paper presents a sampled, parallelized method of measuring reuse distance proiles for multithreaded programs, modeling private and shared cache configurations. The sampling technique allows it to spend much of its execution in a fast low-overhead mode, and allows the use of a new measurement method since sampled analysis does not need to consider the full state of the reuse stack. This measurement method uses O(1) data structures that may be made thread-private, allowing parallelization to reduce overhead in analysis mode. The performance of the resulting system is analyzed for a diverse set of parallel benchmarks and shown to generate accurate output compared to non-sampled full analysis as well as good results for the common application of locating low-locality code in the benchmarks, all with a performance overhead comparable to the best single-threaded analysis techniques.","PeriodicalId":422461,"journal":{"name":"2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT)","volume":"42 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2010-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"113","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/1854273.1854286","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 113

Abstract

Reuse distance analysis is a well-established tool for predicting cache performance, driving compiler optimizations, and assisting visualization and manual optimization of programs. Existing reuse distance analysis methods either do not account for the effects of multithreading, or suffer severe performance penalties. This paper presents a sampled, parallelized method of measuring reuse distance proiles for multithreaded programs, modeling private and shared cache configurations. The sampling technique allows it to spend much of its execution in a fast low-overhead mode, and allows the use of a new measurement method since sampled analysis does not need to consider the full state of the reuse stack. This measurement method uses O(1) data structures that may be made thread-private, allowing parallelization to reduce overhead in analysis mode. The performance of the resulting system is analyzed for a diverse set of parallel benchmarks and shown to generate accurate output compared to non-sampled full analysis as well as good results for the common application of locating low-locality code in the benchmarks, all with a performance overhead comparable to the best single-threaded analysis techniques.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
利用采样和并行化加速多核重用距离分析
重用距离分析是一种完善的工具,用于预测缓存性能、驱动编译器优化以及协助程序的可视化和手动优化。现有的重用距离分析方法要么没有考虑到多线程的影响,要么会遭受严重的性能损失。本文提出了一种采样、并行的方法来测量多线程程序的重用距离概要,并对私有和共享缓存配置进行建模。采样技术允许它在快速低开销模式下执行大部分时间,并且允许使用一种新的测量方法,因为采样分析不需要考虑重用堆栈的完整状态。这种测量方法使用O(1)个数据结构,可以将其设置为线程私有,从而允许并行化以减少分析模式中的开销。结果系统的性能在不同的并行基准测试中进行了分析,并显示与非采样的完整分析相比,生成了准确的输出,以及在基准测试中定位低局域代码的常见应用程序的良好结果,所有这些都具有与最佳单线程分析技术相当的性能开销。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Reducing task creation and termination overhead in explicitly parallel programs An intra-tile cache set balancing scheme NUcache: A multicore cache organization based on Next-Use distance Towards a science of parallel programming Discovering and understanding performance bottlenecks in transactional applications
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1