DRAW: investigating benefits of adaptive fetch group size on GPU

M. Yoon, Yunho Oh, Sangpil Lee, Seung-Hun Kim, Deokho Kim, W. Ro
{"title":"DRAW: investigating benefits of adaptive fetch group size on GPU","authors":"M. Yoon, Yunho Oh, Sangpil Lee, Seung-Hun Kim, Deokho Kim, W. Ro","doi":"10.1109/ISPASS.2015.7095804","DOIUrl":null,"url":null,"abstract":"Previously, hiding operation stalls is one of the important issues to suppress performance degradation of Graphics Processing Units (GPUs). In this paper, we first conduct a detailed study of factors affecting the operation stalls in terms of the fetch group size on the warp scheduler. Throughout this paper, we find that the size of fetch group is highly involved in hiding various types of operation stalls. The short latency stalls can be hidden by issuing other available warps from the same fetch group. Therefore, the short latency stalls may not be hidden well under small sized fetch group since the group has the limited number of issuable warps to hide stalls. On the contrary, the long latency stalls can be hidden by dividing warps into multiple fetch groups. The scheduler switches the fetch groups when the warps in each fetch group reach the long latency memory operation point. Therefore, the stalls may not be hidden well at the large sized fetch group. Increasing the size of fetch group reduces the number of fetch groups to hide the stalls. In addition, the load/store unit stalls are caused by the limited hardware resources to handle the memory operations. To hide all these stalls effectively, we propose a Dynamic Resizing on Active Warps (DRAW) scheduler which adjusts the size of active fetch group. From the evaluation results, DRAW scheduler reduces an average of 16.3% of stall cycles and improves an average performance of 11.3% compared to the conventional two-level warp scheduler.","PeriodicalId":189378,"journal":{"name":"2015 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)","volume":"100 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-03-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2015 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ISPASS.2015.7095804","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 6

Abstract

Previously, hiding operation stalls is one of the important issues to suppress performance degradation of Graphics Processing Units (GPUs). In this paper, we first conduct a detailed study of factors affecting the operation stalls in terms of the fetch group size on the warp scheduler. Throughout this paper, we find that the size of fetch group is highly involved in hiding various types of operation stalls. The short latency stalls can be hidden by issuing other available warps from the same fetch group. Therefore, the short latency stalls may not be hidden well under small sized fetch group since the group has the limited number of issuable warps to hide stalls. On the contrary, the long latency stalls can be hidden by dividing warps into multiple fetch groups. The scheduler switches the fetch groups when the warps in each fetch group reach the long latency memory operation point. Therefore, the stalls may not be hidden well at the large sized fetch group. Increasing the size of fetch group reduces the number of fetch groups to hide the stalls. In addition, the load/store unit stalls are caused by the limited hardware resources to handle the memory operations. To hide all these stalls effectively, we propose a Dynamic Resizing on Active Warps (DRAW) scheduler which adjusts the size of active fetch group. From the evaluation results, DRAW scheduler reduces an average of 16.3% of stall cycles and improves an average performance of 11.3% compared to the conventional two-level warp scheduler.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
DRAW:在GPU上研究自适应获取组大小的好处
隐藏操作停顿是抑制图形处理器性能下降的重要问题之一。在本文中,我们首先详细研究了影响经纱调度程序上读取组大小的操作延迟因素。在本文中,我们发现获取组的大小与隐藏各种类型的操作停顿高度相关。可以通过发出来自同一获取组的其他可用翘曲来隐藏短延迟延迟。因此,在较小的fetch组中,由于组中可发布的warp数量有限,因此可能无法很好地隐藏短延迟摊位。相反,可以通过将warp划分为多个fetch组来隐藏长时间的延迟。当每个提取组中的翘曲到达长延迟内存操作点时,调度器切换提取组。因此,摊位可能不会隐藏在大型取回组。增加取物组的大小可以减少取物组的数量以隐藏摊位。此外,处理内存操作的硬件资源有限导致了加载/存储单元的停顿。为了有效地隐藏所有这些延迟,我们提出了一个动态调整主动抓取(DRAW)调度程序,它可以调整活动抓取组的大小。从评估结果来看,与传统的两级翘曲调度器相比,DRAW调度器平均减少了16.3%的失速周期,平均性能提高了11.3%。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Graph Processing Platforms at Scale: Practices and Experiences Self-monitoring overhead of the Linux perf_ event performance counter interface Analyzing communication models for distributed thread-collaborative processors in terms of energy and time A full-system approach to analyze the impact of next-generation mobile flash storage Graph-matching-based simulation-region selection for multiple binaries
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1