Architecting Waferscale Processors - A GPU Case Study

Saptadeep Pal, Daniel Petrisko, Matthew Tomei, Puneet Gupta, S. Iyer, Rakesh Kumar
2019 IEEE International Symposium on High Performance Computer Architecture (HPCA), February 2019. DOI: 10.1109/HPCA.2019.00042
Citations: 26

Abstract

Increasing communication overheads are already threatening computer system scaling. One approach to dramatically reducing communication overheads is waferscale processing. However, waferscale processors [1], [2], [3] have historically been deemed impractical due to yield issues [1], [4] inherent to conventional integration technology. Emerging integration technologies such as Silicon-Interconnection Fabric (Si-IF) [5], [6], [7], where pre-manufactured dies are bonded directly onto a silicon wafer, may make it possible to build a waferscale system without the corresponding yield issues. As such, waferscale architectures need to be revisited. In this paper, we study whether it is feasible and useful to build today's architectures at waferscale. Using a waferscale GPU as a case study, we show that while a 300 mm wafer can house about 100 GPU modules (GPMs), only a much scaled-down GPU architecture with about 40 GPMs can be built once physical concerns are considered. We also study the performance and energy implications of waferscale architectures. We show that waferscale GPUs can provide significant performance and energy-efficiency advantages (up to 18.9x speedup and 143x energy-delay product (EDP) benefit compared against an equivalent multi-chip-module (MCM) GPU implementation on a printed circuit board (PCB)) without any change in the programming model. We also develop thread scheduling and data placement policies for waferscale GPU architectures. Our policies outperform state-of-the-art scheduling and data placement policies by up to 2.88x (average 1.4x) and 1.62x (average 1.11x) for the 24-GPM and 40-GPM cases, respectively. Finally, we build the first Si-IF prototype with interconnected dies. We observe that 100% of the inter-die interconnects in our prototype are successfully connected. Coupled with the high yield previously reported for bonding dies on Si-IF, this demonstrates the technological readiness for building a waferscale GPU architecture.

Keywords—Waferscale Processors, GPU, Silicon Interconnect Fabric
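Two of the abstract's headline figures rest on simple quantities: the gross area of a 300 mm wafer and the energy-delay product (EDP), defined as energy multiplied by execution time. The following Python sketch is a back-of-the-envelope illustration only, not the authors' model. It assumes the whole circular wafer is usable (no edge exclusion, no spacing between modules), does not model the physical concerns that, per the abstract, cut the buildable count to about 40 GPMs, and uses hypothetical energy and delay inputs; the function names are illustrative.

```python
import math


def wafer_area_mm2(diameter_mm: float) -> float:
    """Gross area of a circular wafer in mm^2 (assumes no edge exclusion)."""
    return math.pi * (diameter_mm / 2.0) ** 2


def max_avg_footprint_mm2(diameter_mm: float, num_modules: int) -> float:
    """Upper bound on the average area per module if num_modules tile the whole wafer."""
    return wafer_area_mm2(diameter_mm) / num_modules


def edp(energy_j: float, delay_s: float) -> float:
    """Energy-delay product; lower is better."""
    return energy_j * delay_s


def edp_benefit(base_energy_j: float, base_delay_s: float,
                new_energy_j: float, new_delay_s: float) -> float:
    """Baseline EDP divided by new EDP, i.e. speedup times energy reduction."""
    return edp(base_energy_j, base_delay_s) / edp(new_energy_j, new_delay_s)


if __name__ == "__main__":
    print(f"300 mm wafer area: {wafer_area_mm2(300):.0f} mm^2")
    for n in (100, 40):
        print(f"{n} modules: <= {max_avg_footprint_mm2(300, n):.0f} mm^2 each on average")
    # Hypothetical inputs: 2x faster and 2x less energy gives a 4x EDP benefit.
    print(f"EDP benefit: {edp_benefit(10.0, 2.0, 5.0, 1.0):.1f}x")
```

For example, `max_avg_footprint_mm2(300, 100)` is roughly 707 mm², an upper bound on the average area available to each module if about 100 GPMs share the wafer. Because EDP multiplies energy by delay, an EDP benefit equals the speedup times the energy reduction, which is why an EDP benefit can be much larger than the corresponding speedup.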