动态经纱调整:高性能SIMT的分析与效益

Ahmad Lashgar, A. Baniasadi, A. Khonsari
{"title":"动态经纱调整:高性能SIMT的分析与效益","authors":"Ahmad Lashgar, A. Baniasadi, A. Khonsari","doi":"10.1109/ICCD.2012.6378694","DOIUrl":null,"url":null,"abstract":"Modern GPUs synchronize threads grouped in warps. The number of threads included in each warp (or warp size) affects divergence, synchronization overhead, and the efficiency of memory access coalescing. Small warps reduce the performance penalty associated with branch and memory divergence at the expense of a reduction in memory coalescing. Large warps enhance memory coalescing significantly but also increase branch and memory divergence. Dynamic workload behavior, including branch/memory divergence and coalescing, is an important factor in determining the warp size returning best performance. Based on this observation, we propose Dynamic Warp Resizing (DWR). DWR outperforms static warp size decisions, up to 2.28X.","PeriodicalId":313428,"journal":{"name":"2012 IEEE 30th International Conference on Computer Design (ICCD)","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2012-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"12","resultStr":"{\"title\":\"Dynamic warp resizing: Analysis and benefits in high-performance SIMT\",\"authors\":\"Ahmad Lashgar, A. Baniasadi, A. Khonsari\",\"doi\":\"10.1109/ICCD.2012.6378694\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Modern GPUs synchronize threads grouped in warps. The number of threads included in each warp (or warp size) affects divergence, synchronization overhead, and the efficiency of memory access coalescing. Small warps reduce the performance penalty associated with branch and memory divergence at the expense of a reduction in memory coalescing. Large warps enhance memory coalescing significantly but also increase branch and memory divergence. Dynamic workload behavior, including branch/memory divergence and coalescing, is an important factor in determining the warp size returning best performance. Based on this observation, we propose Dynamic Warp Resizing (DWR). DWR outperforms static warp size decisions, up to 2.28X.\",\"PeriodicalId\":313428,\"journal\":{\"name\":\"2012 IEEE 30th International Conference on Computer Design (ICCD)\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2012-09-30\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"12\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2012 IEEE 30th International Conference on Computer Design (ICCD)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICCD.2012.6378694\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2012 IEEE 30th International Conference on Computer Design (ICCD)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICCD.2012.6378694","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 12

摘要

现代gpu同步线程分组在扭曲。每个warp中包含的线程数(或warp大小)会影响散度、同步开销和内存访问合并的效率。较小的翘曲减少了与分支和内存发散相关的性能损失,但代价是内存合并的减少。大翘曲显著增强了内存聚合,但也增加了分支和内存的发散。动态工作负载行为,包括分支/内存发散和合并,是决定翘曲大小以获得最佳性能的重要因素。基于这一观察,我们提出动态调整经纱大小(DWR)。DWR优于静态经纱大小决定,高达2.28倍。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Dynamic warp resizing: Analysis and benefits in high-performance SIMT
Modern GPUs synchronize threads grouped in warps. The number of threads included in each warp (or warp size) affects divergence, synchronization overhead, and the efficiency of memory access coalescing. Small warps reduce the performance penalty associated with branch and memory divergence at the expense of a reduction in memory coalescing. Large warps enhance memory coalescing significantly but also increase branch and memory divergence. Dynamic workload behavior, including branch/memory divergence and coalescing, is an important factor in determining the warp size returning best performance. Based on this observation, we propose Dynamic Warp Resizing (DWR). DWR outperforms static warp size decisions, up to 2.28X.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Oblivious routing design for mesh networks to achieve a new worst-case throughput bound WaveSync: A low-latency source synchronous bypass network-on-chip architecture Integration of correct-by-construction BIP models into the MetroII design space exploration flow Dynamic phase-based tuning for embedded systems using phase distance mapping A comparative study of wearout mechanisms in state-of-art microprocessors
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1