Towards warp-scheduler friendly STT-RAM/SRAM hybrid GPGPU register file design

2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD) | Pub Date: 2017-11-13 | DOI: 10.1109/ICCAD.2017.8203850

Quan Deng, Youtao Zhang, Minxuan Zhang, Jun Yang

Citations: 5

Abstract

Modern graphics processing units (GPUs) widely adopt large SRAM-based register files (RFs) to enable fast context switching. A large SRAM RF may consume 20% to 40% of GPU power, which has become one of the major design challenges for GPUs. Recent studies mitigate the issue through hybrid RF designs that pair a large STT-RAM (spin-transfer torque magnetic memory) RF with a small SRAM buffer. However, the long STT-RAM write latency throttles data exchange between STT-RAM and SRAM, which penalizes warp schedulers that switch contexts frequently, e.g., round-robin schedulers. In this paper, we propose HC-RF, a warp-scheduler-friendly hybrid RF design built on a novel SRAM/STT-RAM hybrid cell (HC) structure. HC-RF exploits cell-level integration to improve the effective bandwidth between STT-RAM and SRAM. By enabling silent data transfer from SRAM to STT-RAM without blocking RF banks, HC-RF supports concurrent context switching and removes the design's dependence on the warp scheduler. Our experimental results show that, on average, HC-RF achieves a 50% performance improvement and a 44% reduction in energy consumption over the coarse-grained hybrid design when adopting the LRR (Loose Round Robin) warp scheduler.
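To make the bottleneck described in the abstract concrete, the following is a minimal toy model, not the authors' simulator or evaluation method. It assumes a round-robin schedule over a small SRAM buffer with FIFO eviction, and it charges the STT-RAM write-back on the critical path only when `blocking_writeback` is true. The function name `simulate_lrr`, the FIFO policy, and all latency values are hypothetical and chosen only for illustration.

```python
from collections import deque


def simulate_lrr(num_warps, buffer_slots, switches,
                 exec_cycles, read_lat, write_lat, blocking_writeback):
    """Count cycles for a round-robin warp schedule over a tiny SRAM buffer.

    Toy assumptions (not taken from the paper):
      * the SRAM buffer holds the registers of `buffer_slots` warps;
      * eviction is FIFO and every evicted warp must be written back;
      * a miss always pays `read_lat` to fetch registers from STT-RAM;
      * the write-back costs `write_lat` cycles only if it blocks the bank.
    """
    # Warps 0..k-1 start resident in the SRAM buffer.
    resident = deque(range(min(buffer_slots, num_warps)))
    cycles = 0
    for step in range(switches):
        warp = step % num_warps  # strict round-robin order
        if warp not in resident:
            if len(resident) == buffer_slots:
                resident.popleft()  # evict the oldest resident warp
                if blocking_writeback:
                    cycles += write_lat  # write-back stalls the bank
            cycles += read_lat  # fetch the incoming warp's registers
            resident.append(warp)
        cycles += exec_cycles  # the scheduled warp issues its work
    return cycles


if __name__ == "__main__":
    # Hypothetical parameters: 4 warps, 2 buffer slots, slow writes.
    args = dict(num_warps=4, buffer_slots=2, switches=8,
                exec_cycles=1, read_lat=1, write_lat=5)
    print("blocking write-back:  ", simulate_lrr(**args, blocking_writeback=True))
    print("background write-back:", simulate_lrr(**args, blocking_writeback=False))
```

Under these assumptions, every miss in the blocking case pays the full write latency, so a scheduler that cycles through more warps than the buffer can hold is penalized on almost every switch. The model ignores bank contention, partial write-backs, and any overlap between execution and data movement, so it indicates only the direction of the effect, not its magnitude.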
