Automatic HBM Management: Models and Algorithms

Proceedings of the 34th ACM Symposium on Parallelism in Algorithms and Architectures. Pub Date: 2022-07-11. DOI: 10.1145/3490148.3538570

Daniel DeLayo, Kenny Zhang, Kunal Agrawal, M. A. Bender, Jonathan W. Berry, Rathish Das, Benjamin Moseley, C. Phillips


Citations: 2

Abstract

Some past and future supercomputer nodes incorporate High-Bandwidth Memory (HBM). Compared to standard DRAM, HBM has similar latency, higher bandwidth, and lower capacity. In this paper, we evaluate algorithms for managing High-Bandwidth Memory automatically. Previous work suggests that, in the worst case, performance is extremely sensitive to the policy for managing the channel to DRAM. Prior theory shows that a priority-based scheme (where there is a static strict priority order among p threads for channel access) is O(1)-competitive, but FIFO is not, and in the worst case is Ω(p)-competitive. Following this theoretical guidance would be a disruptive change for vendors, who currently use FIFO variants in their DRAM controller hardware. Our goal is to determine theoretically and empirically whether we can justify recommending investment in priority-based DRAM controller hardware. In order to experiment with DRAM channel protocols, we chose a theoretical model, validated it against real hardware, and implemented a basic simulator. We corroborated the previous theoretical results for the model, conducted a parameter sweep while running our simulator on address traces from memory bandwidth-bound codes (GNU sort and TACO sparse matrix-vector product), and designed better channel-access algorithms.
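To make the two channel policies named in the abstract concrete, here is a minimal sketch of a toy tick-based model. It is not the authors' simulator, and every modeling choice is an assumption made for illustration: the function name `simulate`, LRU replacement in HBM, a channel that moves exactly one block from DRAM into HBM per tick, and a thread consuming a fetched block on arrival. The only elements taken from the abstract are the two arbitration rules: FIFO (serve the longest-waiting thread) and a static strict priority order among the p threads (here, the lowest thread id wins).

```python
from collections import OrderedDict, deque


def simulate(traces, hbm_blocks, policy):
    """Return the makespan in ticks for per-thread block traces.

    Toy model (an assumption, not the paper's simulator): each tick, every
    thread whose next block is in HBM consumes it, and the DRAM channel
    brings one missing block into HBM for one waiting thread.
    """
    if policy not in ("fifo", "priority"):
        raise ValueError("policy must be 'fifo' or 'priority'")
    pos = [0] * len(traces)   # index of each thread's next request
    hbm = OrderedDict()       # resident blocks, least recently used first
    waiting = deque()         # thread ids queued for the channel, oldest first
    tick = 0
    while any(pos[t] < len(trace) for t, trace in enumerate(traces)):
        tick += 1
        # Step 1: idle threads either hit in HBM or queue for the channel.
        for t, trace in enumerate(traces):
            if pos[t] == len(trace) or t in waiting:
                continue
            block = trace[pos[t]]
            if block in hbm:
                hbm.move_to_end(block)   # refresh LRU position
                pos[t] += 1
            else:
                waiting.append(t)
        # Step 2: the channel serves exactly one waiting thread.
        if waiting:
            if policy == "fifo":
                t = waiting.popleft()    # longest-waiting thread first
            else:
                t = min(waiting)         # static priority: lowest id first
                waiting.remove(t)
            block = traces[t][pos[t]]
            if block not in hbm:
                if len(hbm) >= hbm_blocks:
                    hbm.popitem(last=False)   # evict least recently used
                hbm[block] = None
            hbm.move_to_end(block)
            pos[t] += 1                  # fetched block is consumed on arrival
    return tick


if __name__ == "__main__":
    # Hypothetical workload: 4 threads, each alternating between 2 private blocks.
    demo = [[10 * t + i % 2 for i in range(8)] for t in range(4)]
    for name in ("fifo", "priority"):
        print(name, simulate(demo, hbm_blocks=4, policy=name))
```

Comparing the two printed makespans for a given workload and HBM size gives a feel for how the arbitration rule alone can change running time. The toy model is far too coarse to reproduce the competitive-ratio results or the measurements reported in the paper.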


