DRAW: investigating benefits of adaptive fetch group size on GPU

2015 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) | Pub Date: 2015-03-29 | DOI: 10.1109/ISPASS.2015.7095804

M. Yoon, Yunho Oh, Sangpil Lee, Seung-Hun Kim, Deokho Kim, W. Ro


Citations: 6

Abstract

Hiding operation stalls is one of the important issues in suppressing performance degradation of Graphics Processing Units (GPUs). In this paper, we first conduct a detailed study of the factors affecting operation stalls in terms of the fetch group size of the warp scheduler. We find that the fetch group size strongly affects how well various types of operation stalls are hidden. Short-latency stalls can be hidden by issuing other available warps from the same fetch group. Therefore, short-latency stalls may not be hidden well with a small fetch group, since the group has only a limited number of issuable warps to hide them. In contrast, long-latency stalls can be hidden by dividing warps into multiple fetch groups: the scheduler switches fetch groups when the warps in a fetch group reach a long-latency memory operation. Therefore, long-latency stalls may not be hidden well with a large fetch group, because increasing the fetch group size reduces the number of fetch groups available to hide them. In addition, load/store unit stalls are caused by the limited hardware resources available to handle memory operations. To hide all of these stalls effectively, we propose a Dynamic Resizing on Active Warps (DRAW) scheduler, which adjusts the size of the active fetch group. In our evaluation, the DRAW scheduler reduces stall cycles by 16.3% on average and improves performance by 11.3% on average compared to the conventional two-level warp scheduler.
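
The trade-off described in the abstract can be made concrete with a small counting sketch. This is not the DRAW scheduler or any part of the authors' implementation; it only shows how the fetch group size changes the number of alternatives available for each kind of stall. The function names and the figure of 48 resident warps are assumptions chosen for illustration.

```python
def partition_into_fetch_groups(num_warps, group_size):
    """Split warp IDs 0..num_warps-1 into consecutive fetch groups.

    Hypothetical helper: a real two-level scheduler tracks groups in hardware.
    """
    if num_warps <= 0 or group_size <= 0:
        raise ValueError("num_warps and group_size must be positive")
    return [
        list(range(start, min(start + group_size, num_warps)))
        for start in range(0, num_warps, group_size)
    ]


def tradeoff(num_warps, group_size):
    """Count the alternatives available for hiding each kind of stall."""
    groups = partition_into_fetch_groups(num_warps, group_size)
    return {
        "group_size": group_size,
        "num_groups": len(groups),
        # Other warps in the first group that could issue while one warp
        # waits on a short-latency stall.
        "alternatives_within_group": len(groups[0]) - 1,
        # Other groups the scheduler could switch to when the current
        # group reaches a long-latency memory operation.
        "alternative_groups": len(groups) - 1,
    }


# Assumed example: 48 resident warps, several candidate group sizes.
for size in (4, 8, 16, 48):
    print(tradeoff(48, size))
```

As the group size grows, the count of in-group alternatives rises while the count of alternative groups falls, which is the tension an adaptive group size aims to balance.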


