A case for Core-Assisted Bottleneck Acceleration in GPUs: Enabling flexible data compression with assist warps

Nandita Vijaykumar, Gennady Pekhimenko, Adwait Jog, A. Bhowmick, Rachata Ausavarungnirun, C. Das, M. Kandemir, T. Mowry, O. Mutlu
Published in: 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA), pp. 41-53. Publication date: 2015-06-13.
DOI: 10.1145/2749469.2750399 (https://doi.org/10.1145/2749469.2750399)
Citations: 106

Abstract

Modern Graphics Processing Units (GPUs) are well provisioned to support the concurrent execution of thousands of threads. Unfortunately, different bottlenecks during execution and heterogeneous application requirements create imbalances in the utilization of resources in the cores. For example, when a GPU is bottlenecked by the available off-chip memory bandwidth, its computational resources are often overwhelmingly idle, waiting for data from memory to arrive. This paper introduces the Core-Assisted Bottleneck Acceleration (CABA) framework, which employs idle on-chip resources to alleviate different bottlenecks in GPU execution. CABA provides flexible mechanisms to automatically generate "assist warps" that execute on GPU cores to perform specific tasks that can improve GPU performance and efficiency. CABA enables the use of idle computational units and pipelines to alleviate the memory bandwidth bottleneck, e.g., by using assist warps to perform data compression so that less data is transferred from memory. Conversely, the same framework can be employed to handle cases where the GPU is bottlenecked by the available computational units, in which case the memory pipelines are idle and can be used by CABA to speed up computation, e.g., by performing memoization using assist warps. We provide a comprehensive design and evaluation of CABA to perform effective and flexible data compression in the GPU memory hierarchy to alleviate the memory bandwidth bottleneck. Our extensive evaluations show that CABA, when used to implement data compression, provides an average performance improvement of 41.7% (as high as 2.6X) across a variety of memory-bandwidth-sensitive GPGPU applications.
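
To make the compression idea concrete, the following is a minimal Python sketch of base+delta encoding, one simple scheme of the kind an assist warp could run on otherwise idle compute units. The abstract does not specify a compression algorithm at this level of detail, so the scheme shown, the 128-byte line of 32-bit words, and all function names are illustrative assumptions rather than the authors' implementation.

```python
# Illustrative sketch only: base+delta encoding of one memory line.
# Assumptions (not from the abstract): 32-bit words, 4-byte base,
# 1-byte signed deltas, and every name defined below.

def compress_base_delta(words, delta_bytes=1):
    """Encode words as (base, deltas) if every delta fits in delta_bytes.

    Returns None when the line is not compressible with this scheme,
    in which case it would be transferred uncompressed.
    """
    base = words[0]
    limit = 1 << (8 * delta_bytes - 1)  # signed range is [-limit, limit)
    deltas = [w - base for w in words]
    if all(-limit <= d < limit for d in deltas):
        return base, deltas
    return None


def decompress_base_delta(base, deltas):
    """Rebuild the original words; the work a decompressing helper would do."""
    return [base + d for d in deltas]


def compressed_size_bytes(n_words, delta_bytes=1, base_bytes=4):
    """Bytes needed for one base plus one delta per word."""
    return base_bytes + n_words * delta_bytes


if __name__ == "__main__":
    # A hypothetical line of 32 words holding nearby values (e.g. addresses).
    line = [0x1000 + 4 * i for i in range(32)]
    encoded = compress_base_delta(line)
    if encoded is not None:
        original = 4 * len(line)
        packed = compressed_size_bytes(len(line))
        print(f"{original} bytes -> {packed} bytes")
        assert decompress_base_delta(*encoded) == line
```

The point of the sketch is the trade the abstract describes: a few extra arithmetic operations per line (subtractions to compress, additions to decompress) in exchange for fewer bytes crossing the bandwidth-limited memory interface. A line whose values are far apart fails the range check and is simply left uncompressed.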