Data Convection

Soheil Khadirsharbiyani, Jagadish B. Kotra, Karthik Rao, M. Kandemir
{"title":"Data Convection","authors":"Soheil Khadirsharbiyani, Jagadish B. Kotra, Karthik Rao, M. Kandemir","doi":"10.1145/3508027","DOIUrl":null,"url":null,"abstract":"Stacked DRAMs have been studied, evaluated in multiple scenarios, and even productized in the last decade. The large available bandwidth they offer make them an attractive choice, particularly, in high-performance computing (HPC) environments. Consequently, many prior research efforts have studied and evaluated 3D stacked DRAM-based designs. Despite offering high bandwidth, stacked DRAMs are severely constrained by the overall memory capacity offered. In this paper, we study and evaluate integrating stacked DRAM on top of a GPU in a 3D manner which in tandem with the 2.5D stacked DRAM increases the capacity and the bandwidth without increasing the package size. This integration of 3D stacked DRAMs aids in satisfying the capacity requirements of emerging workloads like deep learning. Though this vertical 3D integration of stacked DRAMs also increases the total available bandwidth, we observe that the bandwidth offered by these 3D stacked DRAMs is severely limited by the heat generated on the GPU. Based on our experiments on a cycle-level simulator, we make a key observation that the sections of the 3D stacked DRAM that are closer to the GPU have lower retention-times compared to the farther layers of stacked DRAM. This thermal-induced variable retention-times causes certain sections of 3D stacked DRAM to be refreshed more frequently compared to the others, thereby resulting in thermal-induced NUMA paradigms. To alleviate such thermal-induced NUMA behavior, we propose and experimentally evaluate three different incarnations of Data Convection, i.e., Intra-layer, Inter-layer, and Intra + Inter-layer, that aim at placing the most-frequently accessed data in a thermal-induced retention-aware fashion, taking into account both bank-level and channel-level parallelism. Our evaluations on a cycle-level GPU simulator indicate that, in a multi-application scenario, our Intra-layer, Inter-layer and Intra + Inter-layer algorithms improve the overall performance by 1.8%, 11.7%, and 14.4%, respectively, over a baseline that already encompasses 3D+2.5D stacked DRAMs.","PeriodicalId":426760,"journal":{"name":"Proceedings of the ACM on Measurement and Analysis of Computing Systems","volume":"24 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-02-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the ACM on Measurement and Analysis of Computing Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3508027","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

Stacked DRAMs have been studied, evaluated in multiple scenarios, and even productized in the last decade. The large bandwidth they offer makes them an attractive choice, particularly in high-performance computing (HPC) environments. Consequently, many prior research efforts have studied and evaluated 3D stacked DRAM-based designs. Despite offering high bandwidth, stacked DRAMs are severely constrained by the overall memory capacity they offer. In this paper, we study and evaluate integrating stacked DRAM on top of a GPU in a 3D manner, which, in tandem with the 2.5D stacked DRAM, increases both capacity and bandwidth without increasing the package size. This integration of 3D stacked DRAMs helps satisfy the capacity requirements of emerging workloads like deep learning. Though this vertical 3D integration of stacked DRAMs also increases the total available bandwidth, we observe that the bandwidth offered by these 3D stacked DRAMs is severely limited by the heat generated on the GPU. Based on our experiments on a cycle-level simulator, we make a key observation: the sections of the 3D stacked DRAM that are closer to the GPU have lower retention times than the farther layers. This thermal-induced variation in retention times causes certain sections of the 3D stacked DRAM to be refreshed more frequently than others, resulting in thermal-induced NUMA behavior. To alleviate this behavior, we propose and experimentally evaluate three incarnations of Data Convection, i.e., Intra-layer, Inter-layer, and Intra + Inter-layer, which place the most frequently accessed data in a thermal-induced, retention-aware fashion, taking into account both bank-level and channel-level parallelism. Our evaluations on a cycle-level GPU simulator indicate that, in a multi-application scenario, our Intra-layer, Inter-layer, and Intra + Inter-layer algorithms improve overall performance by 1.8%, 11.7%, and 14.4%, respectively, over a baseline that already encompasses 3D+2.5D stacked DRAMs.
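
To make the placement idea concrete, below is a minimal sketch of how a retention-aware, inter-layer placement policy could look. This is not the paper's implementation: the layer count, retention times, page granularity, and the greedy policy are all illustrative assumptions, and the bank-level and channel-level interleaving that the paper also exploits is omitted for brevity.

```python
"""Illustrative sketch in the spirit of the paper's Inter-layer Data
Convection. All names, data structures, and numbers are hypothetical."""

from dataclasses import dataclass, field

@dataclass
class Layer:
    """One 3D-stacked DRAM layer. Layers closer to the GPU run hotter
    and therefore have shorter retention times (assumed values)."""
    index: int                 # 0 = closest to the GPU (hottest)
    retention_ms: float        # thermally limited retention time
    capacity_pages: int
    pages: list = field(default_factory=list)

def refresh_rate(layer: Layer) -> float:
    """Refreshes per second: every cell must be refreshed at least once
    per retention window, so shorter retention means more refreshes and
    less bandwidth left over for demand accesses."""
    return 1000.0 / layer.retention_ms

def place_pages(access_counts: dict, layers: list) -> None:
    """Greedy inter-layer placement: the hottest pages go to the layers
    with the longest retention (fewest refresh interruptions)."""
    # Coolest layers (farthest from the GPU) first.
    by_retention = sorted(layers, key=lambda l: l.retention_ms, reverse=True)
    # Most frequently accessed pages first.
    hot_first = sorted(access_counts, key=access_counts.get, reverse=True)
    it = iter(by_retention)
    layer = next(it)
    for page in hot_first:
        while len(layer.pages) >= layer.capacity_pages:
            layer = next(it)   # current layer is full; spill to the next
        layer.pages.append(page)

# Hypothetical 4-layer stack: layer 0 sits on the GPU and is hottest.
stack = [Layer(0, 16.0, 2), Layer(1, 32.0, 2),
         Layer(2, 48.0, 2), Layer(3, 64.0, 2)]
counts = {"pA": 900, "pB": 700, "pC": 50, "pD": 40, "pE": 10, "pF": 5}
place_pages(counts, stack)
for l in stack:
    print(f"layer {l.index}: retention {l.retention_ms} ms, "
          f"~{refresh_rate(l):.1f} refreshes/s, pages {l.pages}")
```

The key intuition the sketch captures is that a layer with half the retention time must be refreshed twice as often, stealing bandwidth from demand accesses; steering the hottest pages toward long-retention layers therefore keeps the most bandwidth-critical data on the least refresh-interrupted layers.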