{"title":"Wafer-Scale Computing: Advancements, Challenges, and Future Perspectives [Feature]","authors":"Yang Hu, Xinhan Lin, Huizheng Wang, Zhen He, Xingmao Yu, Jiahao Zhang, Qize Yang, Zheng Xu, Sihan Guan, Jiahao Fang, Haoran Shang, Xinru Tang, Xu Dai, Shaojun Wei, Shouyi Yin","doi":"10.1109/mcas.2024.3349669","DOIUrl":null,"url":null,"abstract":"Nowadays, artificial intelligence (AI) technology with large models plays an increasingly important role in both academia and industry. It also brings a rapidly increasing demand for the computing power of the hardware. As the computing demand for AI continues to grow, the growth of hardware computing power has failed to keep up. This has become a significant factor restricting the development of AI. The augmentation of hardware computing power is mainly propelled by the escalation of transistor density and chip area. However, the former is impeded by the termination of the Moore’s Law and Dennard scaling, and the latter is significantly restricted by the challenge of disrupting the legacy fabrication equipment and process. In recent years, advanced packaging technologies that have gradually matured are increasingly used to implement bigger chips that integrate multiple chiplets, while still providing interconnections with chip-level density and bandwidth. This technique points out a new path of continuing the increase of computing power while leveraging the current fabrication process without significant disruption. Enabled by this technique, a chip can extend to a size of wafer-scale (over 10,000 mm\n<inline-formula xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" xmlns:xlink=\"http://www.w3.org/1999/xlink\"> <tex-math notation=\"LaTeX\">$^{2}$ </tex-math></inline-formula>\n), provisioning orders of magnitude more computing capabilities (several POPS within just one monolithic chip) and die-to-die bandwidth density (over 15 GB/s/mm) than regular chips, and emerges a new Wafer-scale Computing paradigm. Compared to conventional high-performance computing paradigms such as multi-accelerator and datacenter-scale computing, Wafer-scale Computing shows remarkable advantages in communication bandwidth, integration density, and programmability potential. Not surprisingly, disruptive Wafer-scale Computing also brings unprecedented design challenges for hardware architecture, design- <inline-formula xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" xmlns:xlink=\"http://www.w3.org/1999/xlink\"> <tex-math notation=\"LaTeX\">$\\backslash $ </tex-math></inline-formula>\nsystem- technology co-optimization, power and cooling systems, and compiler tool chain. At present, there are no comprehensive surveys summarizing the current state and design insights of Wafer-scale Computing. This article aims to take the first step to help academia and industry review existing wafer-scale chips and essential technologies in a one-stop manner. So that people can conveniently grasp the basic knowledge and key points, understand the achievements and shortcomings of existing research, and contribute to this promising research direction.","PeriodicalId":55038,"journal":{"name":"IEEE Circuits and Systems Magazine","volume":"2 1","pages":""},"PeriodicalIF":5.6000,"publicationDate":"2024-03-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Circuits and Systems Magazine","FirstCategoryId":"5","ListUrlMain":"https://doi.org/10.1109/mcas.2024.3349669","RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
引用次数: 0
Abstract
Nowadays, artificial intelligence (AI) technology with large models plays an increasingly important role in both academia and industry. It also brings a rapidly increasing demand for the computing power of the hardware. As the computing demand for AI continues to grow, the growth of hardware computing power has failed to keep up. This has become a significant factor restricting the development of AI. The augmentation of hardware computing power is mainly propelled by the escalation of transistor density and chip area. However, the former is impeded by the termination of the Moore’s Law and Dennard scaling, and the latter is significantly restricted by the challenge of disrupting the legacy fabrication equipment and process. In recent years, advanced packaging technologies that have gradually matured are increasingly used to implement bigger chips that integrate multiple chiplets, while still providing interconnections with chip-level density and bandwidth. This technique points out a new path of continuing the increase of computing power while leveraging the current fabrication process without significant disruption. Enabled by this technique, a chip can extend to a size of wafer-scale (over 10,000 mm
$^{2}$
), provisioning orders of magnitude more computing capabilities (several POPS within just one monolithic chip) and die-to-die bandwidth density (over 15 GB/s/mm) than regular chips, and emerges a new Wafer-scale Computing paradigm. Compared to conventional high-performance computing paradigms such as multi-accelerator and datacenter-scale computing, Wafer-scale Computing shows remarkable advantages in communication bandwidth, integration density, and programmability potential. Not surprisingly, disruptive Wafer-scale Computing also brings unprecedented design challenges for hardware architecture, design- $\backslash $
system- technology co-optimization, power and cooling systems, and compiler tool chain. At present, there are no comprehensive surveys summarizing the current state and design insights of Wafer-scale Computing. This article aims to take the first step to help academia and industry review existing wafer-scale chips and essential technologies in a one-stop manner. So that people can conveniently grasp the basic knowledge and key points, understand the achievements and shortcomings of existing research, and contribute to this promising research direction.
期刊介绍:
The IEEE Circuits and Systems Magazine covers the subject areas represented by the Society's transactions, including: analog, passive, switch capacitor, and digital filters; electronic circuits, networks, graph theory, and RF communication circuits; system theory; discrete, IC, and VLSI circuit design; multidimensional circuits and systems; large-scale systems and power networks; nonlinear circuits and systems, wavelets, filter banks, and applications; neural networks; and signal processing. Content also covers the areas represented by the Society technical committees: analog signal processing, cellular neural networks and array computing, circuits and systems for communications, computer-aided network design, digital signal processing, multimedia systems and applications, neural systems and applications, nonlinear circuits and systems, power systems and power electronics and circuits, sensors and micromaching, visual signal processing and communication, and VLSI systems and applications. Lastly, the magazine covers the interests represented by the widespread conference activity of the IEEE Circuits and Systems Society. In addition to the technical articles, the magazine also covers Society administrative activities, as for instance the meetings of the Board of Governors, Society People, as for instance the stories of award winners-fellows, medalists, and so forth, and Places reached by the Society, including readable reports from the Society's conferences around the world.