Cache Capacity Aware Thread Scheduling for Irregular Memory Access on many-core GPGPUs

Hsien-Kai Kuo, Ta-Kan Yen, B. Lai, Jing-Yang Jou
2013 18th Asia and South Pacific Design Automation Conference (ASP-DAC), published 2013-04-29
DOI: 10.1109/ASPDAC.2013.6509618
Citations: 12

Abstract

On-chip shared cache is an effective way to alleviate the memory bottleneck in modern many-core systems such as GPGPUs. However, when numerous concurrent threads are scheduled on a GPGPU, a cache-capacity-agnostic scheduling scheme can cause severe cache contention among threads and, in turn, significant performance degradation. The diverse working sets of irregular applications make this contention even more severe. Taking cache capacity into account has therefore become a critical scheduling issue for GPGPUs. This paper formulates a Cache Capacity Aware Thread Scheduling Problem that captures the impact of cache capacity as well as various architectural considerations. After proving the problem NP-hard, the paper proposes two algorithms for cache capacity aware thread scheduling. Simulation results on Nvidia's Fermi configuration show that the proposed scheduling scheme effectively avoids cache contention, achieving an average cache miss reduction of 44.7% and an average runtime improvement of 28.5%. For more complex applications, the runtime improvement reaches up to 62.5%.
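The abstract does not describe the two proposed algorithms, so the following is only a generic illustration of what "cache capacity aware" grouping can mean. It is not the authors' method. It assumes, hypothetically, that each thread block has a known working-set estimate, that blocks in the same scheduling "round" share one cache, and that a round should not exceed the cache capacity. Under those assumptions, grouping resembles bin packing, so the sketch uses the standard first-fit decreasing heuristic. The names `ThreadBlock`, `working_set_kb`, and `schedule_rounds` are illustrative.

```python
from dataclasses import dataclass


@dataclass
class ThreadBlock:
    block_id: int
    working_set_kb: int  # hypothetical per-block working-set estimate


def schedule_rounds(blocks, cache_kb):
    """Group thread blocks into rounds whose combined working set fits in cache.

    Illustrative first-fit decreasing heuristic (NOT the paper's algorithm):
    sort blocks by working set, largest first, and place each block in the
    first round that still has enough free capacity; otherwise open a new round.
    """
    rounds = []     # block ids assigned to each round
    remaining = []  # free cache capacity (KB) left in each round
    for b in sorted(blocks, key=lambda blk: blk.working_set_kb, reverse=True):
        if b.working_set_kb > cache_kb:
            # A block larger than the cache cannot avoid misses; isolate it.
            rounds.append([b.block_id])
            remaining.append(0)
            continue
        for i, free in enumerate(remaining):
            if b.working_set_kb <= free:
                rounds[i].append(b.block_id)
                remaining[i] -= b.working_set_kb
                break
        else:
            # No existing round has room: start a new one.
            rounds.append([b.block_id])
            remaining.append(cache_kb - b.working_set_kb)
    return rounds


if __name__ == "__main__":
    # Arbitrary example working sets (KB) and an arbitrary 48 KB cache.
    sizes = [40, 10, 30, 20, 16, 8]
    blocks = [ThreadBlock(i, s) for i, s in enumerate(sizes)]
    print(schedule_rounds(blocks, cache_kb=48))
```

With the example values, the 124 KB of total working set is split into three rounds, `[[0, 5], [2, 4], [3, 1]]`, each within 48 KB. A capacity-agnostic scheduler might instead co-schedule blocks 0 and 2 (70 KB), which would overflow the cache. First fit decreasing was chosen only because it is a simple, well-known packing heuristic. The paper's formulation also accounts for other architectural factors that this sketch ignores.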