GPUMech: GPU Performance Modeling Technique Based on Interval Analysis

Jen-Cheng Huang, Joo Hwan Lee, Hyesoon Kim, H. Lee
Published in: 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture, pp. 268-279
Publication date: 2014-12-13
DOI: 10.1109/MICRO.2014.59 (https://doi.org/10.1109/MICRO.2014.59)
Citations: 44

Abstract

The GPU has become a first-order computing platform. Nonetheless, few performance modeling techniques have been developed for GPU architecture studies. Several GPU analytical performance models have been proposed, but they mostly target application optimization rather than the study of different architecture design options. Interval analysis is a relatively accurate performance modeling technique that traverses the instruction trace and uses functional simulators, e.g., a cache simulator, to track the stall events that cause performance loss. It offers a speedup on the order of a hundred times over detailed timing simulation and better accuracy than pure analytical models. However, previous interval analysis techniques are limited to CPUs and are not applicable to multithreaded architectures. In this work, we propose GPUMech, an interval analysis-based performance modeling technique for GPU architectures. GPUMech models multithreading and the resource contention caused by memory divergence. We compare GPUMech with a detailed timing simulator and show that, on average, GPUMech has 13.2% error when modeling the round-robin scheduling policy and 14.0% error when modeling the greedy-then-oldest policy, while achieving a 97x faster simulation speed. In addition, GPUMech generates CPI stacks, which help hardware and software developers visualize the performance bottlenecks of a kernel.
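To make the idea of interval analysis and a CPI stack concrete, here is a minimal sketch in Python. It is not GPUMech's model: it covers a single instruction stream only and ignores the multithreading overlap and memory-divergence contention that the paper addresses. The interval format, the function name `cpi_stack`, the `issue_rate` parameter, and the stall labels are all assumptions made for illustration. Each interval is a run of instructions issued at a steady rate, ended by one stall event whose penalty would come from a functional simulator such as a cache simulator.

```python
from collections import defaultdict


def cpi_stack(intervals, issue_rate=1.0):
    """Build a CPI stack from a list of intervals.

    intervals: list of (num_instructions, stall_cause, stall_cycles).
               stall_cause is None when the interval ends without a stall.
    issue_rate: instructions issued per cycle in the absence of stalls
                (hypothetical parameter).
    Returns a dict mapping each component to its cycles per instruction.
    """
    cycles = defaultdict(float)
    total_insts = 0
    for num_insts, cause, stall_cycles in intervals:
        total_insts += num_insts
        # Steady-state issue cycles go to the "base" component.
        cycles["base"] += num_insts / issue_rate
        # The stall that terminates the interval is charged to its cause.
        if cause is not None:
            cycles[cause] += stall_cycles
    if total_insts == 0:
        raise ValueError("trace contains no instructions")
    return {name: c / total_insts for name, c in cycles.items()}


# Hypothetical trace: three intervals with made-up stall penalties.
example = [
    (10, "l1_miss", 30),   # 10 instructions, then a 30-cycle L1 miss
    (5, "dram", 200),      # 5 instructions, then a 200-cycle DRAM access
    (5, None, 0),          # 5 instructions, no stall
]
stack = cpi_stack(example)
total_cpi = sum(stack.values())
```

In this made-up trace the DRAM component dominates the stack, which is the kind of bottleneck a CPI stack is meant to expose. A multithreaded model would also have to account for stall cycles hidden by other warps, which this sketch does not attempt.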