On the Feasibility of Advanced Cache Indexing for High-Performance and Energy-Efficient GPGPU Computing

Proceedings of the 3rd International Workshop on Many-core Embedded Systems. Pub Date: 2015-06-13. DOI: 10.1145/2768177.2768179

Kyu Yeun Kim, Seunghoe Kim, Woongki Baek


Citations: 1

Abstract

To achieve higher performance and energy efficiency, GPGPU architectures have recently begun to employ hardware caches. Adding hardware caches to GPGPUs, however, does not automatically guarantee improved performance and energy efficiency due to the thrashing in small hardware caches shared by thousands of threads. While prior work has proposed warp scheduling and cache bypassing techniques to address this issue, relatively little work has been done in the context of advanced cache indexing. To bridge this gap, this work investigates the feasibility of advanced cache indexing for high-performance and energy-efficient GPGPU computing. We first discuss the design and implementation of static and adaptive cache indexing schemes for GPGPUs. We then quantify the effectiveness of the advanced indexing schemes using GPGPU benchmarks. Our quantitative evaluation demonstrates that the advanced cache indexing schemes are promising in that they significantly outperform the conventional cache indexing scheme. In addition, for a subset of cache-sensitive benchmarks, the adaptive indexing scheme substantially outperforms the static indexing scheme by effectively identifying and utilizing high-quality indexing bits based on runtime information. Finally, our evaluation shows that the effectiveness of advanced cache indexing is sensitive to different warp schedulers, motivating further research on coordinated cache indexing and warp scheduling techniques.
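The abstract does not spell out the indexing schemes themselves, so the following is only a minimal sketch of the general idea, not the authors' design. It assumes a cache with 32 sets and 128-byte lines, uses XOR-based hashing as a stand-in for a static scheme, and uses a hypothetical bit-selection heuristic (`pick_index_bits`) as a stand-in for an adaptive scheme that chooses index bits from observed addresses. All names and parameters are illustrative assumptions.

```python
from collections import Counter

# Assumed cache geometry (illustrative values, not taken from the paper).
LINE_BYTES = 128
NUM_SETS = 32
OFFSET_BITS = LINE_BYTES.bit_length() - 1   # 7 offset bits
INDEX_BITS = NUM_SETS.bit_length() - 1      # 5 index bits


def conventional_index(addr):
    """Conventional scheme: use the bits directly above the line offset."""
    return (addr >> OFFSET_BITS) & (NUM_SETS - 1)


def xor_index(addr):
    """Static sketch: XOR the conventional index bits with the next-higher bits."""
    low = addr >> OFFSET_BITS
    high = addr >> (OFFSET_BITS + INDEX_BITS)
    return (low ^ high) & (NUM_SETS - 1)


def pick_index_bits(addrs, candidate_bits, count=INDEX_BITS):
    """Adaptive sketch (hypothetical heuristic): prefer address bits whose
    values are most evenly split between 0 and 1 in the observed sample.
    It ignores correlation between bits, which a real design would not."""
    def imbalance(bit):
        ones = sum((a >> bit) & 1 for a in addrs)
        return abs(2 * ones - len(addrs))
    return sorted(sorted(candidate_bits, key=imbalance)[:count])


def selected_bits_index(addr, bits):
    """Build a set index by concatenating the chosen address bits."""
    return sum(((addr >> b) & 1) << i for i, b in enumerate(bits))


def sets_used(addrs, index_fn):
    """Count how many distinct sets an access stream touches."""
    return len(Counter(index_fn(a) for a in addrs))


# A strided stream whose stride equals the span of one way:
# every line lands in the same set under conventional indexing.
stride = LINE_BYTES * NUM_SETS
addrs = [i * stride for i in range(NUM_SETS)]
bits = pick_index_bits(addrs, range(OFFSET_BITS, OFFSET_BITS + 16))

print(sets_used(addrs, conventional_index))                        # 1
print(sets_used(addrs, xor_index))                                 # 32
print(sets_used(addrs, lambda a: selected_bits_index(a, bits)))    # 32
```

With this stride the conventional index sends all 32 lines to a single set, which is the kind of conflict-driven thrashing the abstract describes, while both alternative index functions spread the same lines across all 32 sets. Real workloads are far less regular, so the benefit depends on the access pattern.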


