IEEE Journal on Emerging and Selected Topics in Circuits and Systems: Latest Articles

Iterative Layout-Aware Power, Thermal, and IR-Drop Co-Optimization: Ensuring Convergency in 3D-ICs
IF 3.8 | Zone 2, Engineering & Technology | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-07-22 | DOI: 10.1109/JETCAS.2025.3591727
Mohamed Naeim; Dwaipayan Biswas; Yun Dai; Odysseas Zografos; Herman Oprins; Geert Van der Plas; C. T. Kao; Pinhong Chen; Dragomir Milojevic
Continuous scaling of integrated circuits and the adoption of 3D integration have significantly increased power density, creating critical challenges for thermal and power integrity. Rising power densities can trigger thermal runaway, negatively impacting device reliability and performance. This paper presents an electrothermal coupling framework built upon commercial Electronic Design Automation (EDA) tools, designed for precise iterative analysis of power, thermal and IR-drop characteristics in 2D and 3D configurations of a many-core RISC-V SoC. The proposed framework automates iterative Power-Temperature (P-T) simulations to evaluate thermal convergence and potential thermal runaway scenarios in Memory-on-Logic (MoL) and Logic-on-Memory (LoM) stacked configurations. It identifies thermal hotspots, tracks their progression and evaluates various cooling strategies. Initial results indicate that the first P-T iteration provides up to 10% power savings in 3D due to an 11% wirelength reduction compared to 2D. However, by the fifth iteration, the MoL power saving drops to 4%, whereas LoM maintains the 10% saving. LoM exhibits a 6°C lower peak temperature compared to MoL under equivalent cooling conditions. Compared to 2D, the range of power densities ($110$ to $260~\mathrm{W/cm^{2}}$) results in temperature variations of $-1^{\circ}\mathrm{C}$ to $+3^{\circ}\mathrm{C}$ in LoM. A $10^{\circ}\mathrm{C}$ rise in temperature increases IR-drop by 11%; however, physical design-aware adjustments, such as a tighter Power Delivery Network (PDN) pitch, effectively reduce IR-drop by 54%, mitigating thermal impacts.
Published in: IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 15, no. 4, pp. 648-658. Journal Article.
Citations: 0
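The iterative Power-Temperature loop described in the abstract above can be illustrated with a toy fixed-point iteration. This is only a sketch under stated assumptions, not the authors' EDA flow: the single lumped thermal resistance, the exponential leakage model, and every parameter name and value (`r_th_k_per_w`, `leak_doubling_c`, the 150°C limit) are hypothetical choices made for illustration.

```python
def iterate_power_temperature(p_dyn_w, p_leak_ref_w, r_th_k_per_w,
                              t_amb_c=25.0, t_ref_c=25.0, leak_doubling_c=20.0,
                              max_iters=50, tol_c=0.01, t_limit_c=150.0):
    """Toy P-T loop. Returns (status, history) where history holds one
    (total_power_w, temperature_c) pair per iteration."""
    temp_c = t_amb_c
    history = []
    for _ in range(max_iters):
        # Assumed leakage model: doubles every `leak_doubling_c` degrees.
        p_leak = p_leak_ref_w * 2.0 ** ((temp_c - t_ref_c) / leak_doubling_c)
        power = p_dyn_w + p_leak
        # Assumed thermal model: one lumped resistance to ambient.
        new_temp = t_amb_c + r_th_k_per_w * power
        history.append((power, new_temp))
        if new_temp > t_limit_c:
            return "runaway", history        # temperature keeps escalating
        if abs(new_temp - temp_c) < tol_c:
            return "converged", history      # P and T have settled
        temp_c = new_temp
    return "not_converged", history


if __name__ == "__main__":
    for r_th in (2.0, 8.0):  # better vs. worse cooling (hypothetical values)
        status, hist = iterate_power_temperature(10.0, 2.0, r_th)
        print(r_th, status, len(hist))
```

The same power budget either settles or runs away depending only on the cooling path, which is the behaviour the iterative analysis is meant to expose.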
Bumpless Build Cube (BBCube) 3D: Heterogeneous 3D Integration Using WOW and COW
IF 3.8 | Zone 2, Engineering & Technology | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-07-22 | DOI: 10.1109/JETCAS.2025.3591677
Norio Chujo; Hiroyuki Ryoson; Koji Sakui; Shinji Sugatani; Masao Taguchi; Takayuki Ohba
We propose a novel technology called Bumpless Build Cube (BBCube™) 3D for AI and high-performance computing (HPC) applications that require high bandwidth and power efficiency. BBCube 3D is constructed through heterogeneous 3D integration, in which xPU (e.g., CPU, GPU, TPU) chiplets and DRAM dies are stacked using a combination of bumpless wafer-on-wafer (WoW) and chip-on-wafer (CoW) processes. The bumpless stacking process adopts a method similar to multilevel metallization in the back-end-of-line (BEOL), enabling BBCube to provide reliable and high-density interconnects between dies. Moreover, BBCube features low-capacitance and low-impedance through-silicon vias (TSVs) due to the use of thin silicon and slim TSV structures. To further enhance performance, a highly parallel DRAM architecture leveraging the bumpless WoW process is introduced. The high-density TSVs enable lower data transmission speeds without compromising bandwidth. Additionally, the adoption of four-phase shielded I/Os (FPS-I/O) allows for a reduction in power supply voltage. BBCube 3D has the potential to achieve a bandwidth 30 times higher than DDR5 and four times higher than HBM2E, while reducing bit access energy consumption to one-twentieth that of DDR5 and one-fifth that of HBM2E. The low-impedance TSVs in BBCube ensure robust power integrity for the xPU stacked on top of the layered DRAM. Furthermore, integrating an xPU on top of the Cube enables efficient cooling of high-power xPUs. BBCube can accommodate an xPU with a power density exceeding 50 W/cm², comparable to the latest GPUs, while maintaining the DRAM temperature below 95°C.
Published in: IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 15, no. 3, pp. 404-414. Journal Article.
Citations: 0
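The bandwidth and energy ratios quoted in the BBCube abstract above can be cross-checked with a little arithmetic. The sketch below uses only the four ratios stated in the abstract; the variable names and the `per_io_rate` helper are hypothetical, and no absolute bandwidth or energy figures are assumed.

```python
from fractions import Fraction

# Ratios quoted in the abstract (BBCube 3D relative to each baseline).
bw_vs_ddr5 = Fraction(30)          # 30x the bandwidth of DDR5
bw_vs_hbm2e = Fraction(4)          # 4x the bandwidth of HBM2E
energy_vs_ddr5 = Fraction(1, 20)   # 1/20 the bit access energy of DDR5
energy_vs_hbm2e = Fraction(1, 5)   # 1/5 the bit access energy of HBM2E

# Implied HBM2E-to-DDR5 ratios that make the two statements consistent.
implied_hbm2e_bw_vs_ddr5 = bw_vs_ddr5 / bw_vs_hbm2e
implied_hbm2e_energy_vs_ddr5 = energy_vs_ddr5 / energy_vs_hbm2e


def per_io_rate(total_bandwidth, n_ios):
    """Per-I/O data rate needed to reach a total bandwidth with n_ios lanes.
    Illustrates the 'many slow I/Os' trade-off: more lanes, lower rate each."""
    return Fraction(total_bandwidth) / n_ios


if __name__ == "__main__":
    print(implied_hbm2e_bw_vs_ddr5, implied_hbm2e_energy_vs_ddr5)
    print(per_io_rate(1024, 1024), per_io_rate(1024, 8192))
```

The last two calls show why dense TSVs permit lower per-I/O speeds: for a fixed total bandwidth, multiplying the I/O count by eight divides the required per-I/O rate by eight.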
Efficient Die-to-Die Communication: UCIe Link Simulation and Optimization in a Chiplet-Based System
IF 3.8 | Zone 2, Engineering & Technology | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-07-22 | DOI: 10.1109/JETCAS.2025.3590822
Kunyue Li; Shuaipeng Li; Xiaoyan Li; Zizheng Dong; Sai Gao; Jialei Sun; Naifeng Jing; Qin Wang; Guanghui He; Jianfei Jiang
With the increasing complexity of chiplet-based architectures and the growing adoption of heterogeneous computing, efficient and high-precision simulation frameworks are essential for evaluating chiplet interconnect performance. This paper presents an implementation of the UCIe protocol (Universal Chiplet Interconnect Express) within the gem5 simulation environment, providing a comprehensive system-level modeling framework for UCIe-based interconnects. The proposed UCIe link model supports full-stack protocol simulation, including transaction-level processing, die-to-die adaptation, and physical layer interactions. A flit-based packing mechanism is introduced to enable accurate transmission modeling, and an Ack/Nak-based retry mechanism is introduced to ensure robust data integrity. An optimized event-driven scheduling strategy is incorporated to address performance bottlenecks observed in traditional UCIe simulations. Experimental evaluations validate the accuracy and efficiency of the proposed UCIe link model by comparing its latency with theoretical values under different computational workloads. The results show that the model closely matches the theoretical predictions, with a latency deviation of less than 0.5% in most cases. Additionally, performance comparisons between UCIe interconnect, PCIe interconnect, and direct memory transfer reveal that UCIe incurs only a minor protocol overhead (less than 0.7%), making it a practical and scalable solution for multi-chipset interconnects. The proposed UCIe simulation framework provides a high-precision virtual verification platform for chiplet system design, enabling low-cost, high-accuracy performance evaluations, and lays the foundation for future optimizations in large-scale chiplet-based architectures.
Published in: IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 15, no. 4, pp. 599-608. Journal Article.
Citations: 0
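The flit-based packing and Ack/Nak retry mechanisms mentioned in the UCIe abstract above can be sketched generically. This is not the authors' gem5 model and does not follow the UCIe flit formats: the 64-byte payload size, the CRC-32 check, the tuple layout, and the `corrupt_attempts` fault-injection set are all assumptions chosen to keep the example small.

```python
import zlib

FLIT_PAYLOAD_BYTES = 64  # assumed size for illustration, not from the paper


def pack_flits(data: bytes, payload_bytes: int = FLIT_PAYLOAD_BYTES):
    """Split a byte stream into (sequence_number, payload, crc32) flits."""
    flits = []
    for seq, offset in enumerate(range(0, len(data), payload_bytes)):
        payload = data[offset:offset + payload_bytes]
        flits.append((seq, payload, zlib.crc32(payload)))
    return flits


def transmit(flits, corrupt_attempts, max_retries=3):
    """Send flits in order over a toy channel.

    The receiver Acks a flit whose CRC matches and Naks it otherwise; on a
    Nak the sender replays the flit from its retry buffer. `corrupt_attempts`
    is a set of (seq, attempt) pairs that the channel corrupts.
    Returns (received_bytes, number_of_naks)."""
    received = bytearray()
    naks = 0
    for seq, payload, crc in flits:
        for attempt in range(max_retries + 1):
            on_wire = payload
            if (seq, attempt) in corrupt_attempts:
                # Inject an error by flipping every bit of the first byte.
                on_wire = bytes([payload[0] ^ 0xFF]) + payload[1:]
            if zlib.crc32(on_wire) == crc:   # receiver: Ack
                received += on_wire
                break
            naks += 1                        # receiver: Nak, sender replays
        else:
            raise RuntimeError(f"flit {seq} undeliverable after {max_retries} retries")
    return bytes(received), naks


if __name__ == "__main__":
    message = bytes(range(200))
    out, naks = transmit(pack_flits(message), {(1, 0), (1, 1), (3, 0)})
    print(out == message, naks)
```

A real link layer would pipeline flits and track outstanding sequence numbers rather than stop and wait per flit; the stop-and-wait form is used here only because it makes the retry path easy to follow.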
An Implementation of Delay Testable Boundary Scan and Post-Bond Test Results in a 3D IC
IF 3.8 | Zone 2, Engineering & Technology | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-07-22 | DOI: 10.1109/JETCAS.2025.3591617
Hiroyuki Yotsuyanagi; Keigo Takami; Masaki Hashizume
A defective through-silicon via (TSV) may cause a small delay fault that is difficult to detect using conventional logic testing methods. Testing TSVs used for chip-to-chip interconnection in 3D stacked ICs is a challenging problem. We have proposed a delay testable boundary scan design that has an embedded time-to-digital converter that can measure the timing slack between the test clock and an incoming signal through a TSV. A prototype 3D stacked IC with this delay testable circuit was fabricated using TSVs of various diameters. The measurement results show that the proposed delay testable boundary scan can effectively identify both logic errors that occurred in TSVs with open defects due to a small diameter and outliers in delay through a TSV that have no logic errors.
Published in: IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 15, no. 3, pp. 469-477. Journal Article.
Citations: 0
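The idea in the abstract above, measuring timing slack through each TSV with a time-to-digital converter and then looking for delay outliers, can be sketched as follows. The abstract does not describe the converter's architecture or the outlier criterion, so the delay-line quantisation, the 10 ps tap delay, and the median-absolute-deviation rule below are all illustrative assumptions.

```python
from statistics import median


def tdc_code(slack_ps: float, tap_delay_ps: float, n_taps: int) -> int:
    """Quantise a timing slack into a count of delay-line taps (toy TDC)."""
    if slack_ps <= 0:
        return 0
    return min(n_taps, int(slack_ps // tap_delay_ps))


def delay_outliers(codes, threshold=3.0):
    """Return indices whose code is far from the median of all TSVs.

    'Far' means more than `threshold` times the median absolute deviation;
    a floor of one code step avoids a zero-width band when codes are equal."""
    codes = list(codes)
    med = median(codes)
    mad = median([abs(c - med) for c in codes])
    scale = mad if mad > 0 else 1
    return [i for i, c in enumerate(codes) if abs(c - med) > threshold * scale]


if __name__ == "__main__":
    # Hypothetical slacks in picoseconds for six TSVs; the fifth is slow.
    slacks = [400, 410, 395, 405, 250, 400]
    codes = [tdc_code(s, tap_delay_ps=10, n_taps=64) for s in slacks]
    print(codes, delay_outliers(codes))
```

In this toy data the slow TSV still leaves positive slack, so a pass/fail logic test would not flag it, while the measured code separates it from its neighbours.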
Chiplets Interface Protocol (ChIP) for Ultra-Large-Scale Applications
IF 3.8 | Zone 2, Engineering & Technology | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-07-22 | DOI: 10.1109/JETCAS.2025.3591559
Arvin Delavari; Boris Vaisband
As computational workloads continue to grow, heterogeneous integration of chiplet-based systems is becoming critically important for data-intensive applications such as high-performance computing, large language models, and artificial intelligence. Scaling to ultra-large-scale (ULS) systems, however, introduces significant communication challenges due to the limitations of network architectures and packaging technologies. Efficient data transfer across a network of thousands of chiplets remains a critical bottleneck. This work proposes the Chiplet Interface Protocol (ChIP), a robust, low-latency, area- and energy-efficient communication architecture for ULS systems. ChIP supports burst transfers and out-of-order transactions while leveraging the simple universal parallel interface for chips (SuperCHIPS), a simple area- and energy-efficient streaming channel at the physical layer. Evaluated on a wafer-scale platform, ChIP was compared to state-of-the-art (SOTA) chiplet-based interfaces, including LIPINCON, BoW, UCIe, and AIB, in performance, hardware efficiency, and unified signaling figures of merit. The comparison shows that ChIP significantly outperforms the SOTA alternatives ($5.53\times$ better) in bandwidth per shoreline, reaching 2.2 Tbps/mm in pipelined mode and up to 7.3 Tbps/mm in burst transactions. In addition, the transceiver area per link in ChIP is $485~\mu\mathrm{m}^{2}$, 46.1% smaller than the best SOTA alternative, while achieving 0.38–0.53 pJ/bit energy and 1 ns latency in 45 nm CMOS over a 0.5 mm link, with efficiency sustained across longer channels and varied packaging due to minimal handshaking and optimized point-to-point specifications. The performance of ChIP is evaluated across multiple network configurations on a fine-pitch integration platform, and also for a customized hybrid topology, referred to as the network on interconnect fabric (NoIF), that is introduced and analyzed in this work. The architecture of the NoIF forms the foundation for ULS computing platforms, delivering exceptional results as compared to SOTA solutions. The superior hardware efficiency and advanced inter-chiplet communication features of ChIP position this proposed protocol as an ideal candidate for chiplet communication in ULS architectures.
Published in: IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 15, no. 4, pp. 585-598. Journal Article. Open-access PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11088081
Citations: 0
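Two of the figures in the ChIP abstract above lend themselves to a quick back-of-the-envelope check. The helper names below are hypothetical, and the power estimate assumes that the quoted energy per bit applies at the quoted pipelined bandwidth, which the abstract does not state explicitly.

```python
def implied_baseline_area_um2(area_um2: float, reduction: float) -> float:
    """Area of the reference design if `area_um2` is `reduction` (a fraction
    between 0 and 1) smaller than that reference."""
    return area_um2 / (1.0 - reduction)


def link_power_w_per_mm(bandwidth_tbps_per_mm: float, energy_pj_per_bit: float) -> float:
    """Shoreline power density. Tbps x pJ/bit = 1e12 bit/s x 1e-12 J/bit = W,
    so the product is already in watts per millimetre."""
    return bandwidth_tbps_per_mm * energy_pj_per_bit


if __name__ == "__main__":
    # 485 um^2 being 46.1% smaller implies the best alternative's area.
    print(round(implied_baseline_area_um2(485.0, 0.461), 1))
    # Power per mm of shoreline at 2.2 Tbps/mm for both ends of the energy range.
    print(link_power_w_per_mm(2.2, 0.38), link_power_w_per_mm(2.2, 0.53))
```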
Thermal Perspective Design and Analysis of Multi-Stacked Structures
IF 3.8 | Zone 2, Engineering & Technology | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-07-21 | DOI: 10.1109/JETCAS.2025.3590877
Tianjian Liu; Jie Wu; Zhen Chen; Shujuan Liu; Zhongkai Jiang; Haoyang Peng; Zhandi Yang; Xing Hu; Dong Xie; Fang Dong; Yiqun Wang; Sheng Liu
This paper investigates the thermal performance of multi-layer metal interconnects in three-dimensional (3D) stacked structures through finite element analysis (FEA). The 3D integrated circuit (3D IC) consists of five vertically stacked chips. The chips are interconnected by through-silicon vias (TSVs), metal redistribution layers (RDLs), and hybrid bonding. Due to the complexity of the 3D IC structure, this work simplifies the detailed 3D IC model by employing equivalent models for each chip layer and the hybrid bonding structure. The study reveals that the proportion of Cu significantly affects the thermal conductivity of the hybrid bonding structure, with a quasi-linear dependence. Additionally, misalignment between the upper and lower Cu pads decreases the thermal conductivity of the structure. Furthermore, equivalent models for different chip layers, including metal interconnect layers and TSVs, are constructed based on specific cases, and the equivalent thermal conductivities are extracted accordingly. Based on the equivalent results of each layer, the thermal conductivity of the complex 3D IC structure is ultimately determined. This work provides valuable results and guidance for the thermal design and practice of 3D ICs.
Published in: IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 15, no. 3, pp. 438-444. Journal Article.
Citations: 0
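The equivalent-model idea in the abstract above can be illustrated with textbook mixing rules. This is a sketch under simplifying assumptions, not the paper's FEA-extracted models: the parallel rule of mixtures for the bonding layer, the crude treatment of misalignment as lost Cu overlap, and the material constants are all assumed here for illustration.

```python
K_CU = 400.0   # W/(m*K), approximate bulk copper (assumed value)
K_SIO2 = 1.4   # W/(m*K), approximate SiO2 dielectric (assumed value)


def hybrid_bond_k_vertical(cu_fraction, overlap=1.0, k_cu=K_CU, k_diel=K_SIO2):
    """Through-plane conductivity of a Cu/dielectric bonding layer, estimated
    with the parallel rule of mixtures.

    `overlap` is the fraction of Cu pad area where upper and lower pads face
    each other; only that area is treated as a continuous Cu path, which is a
    deliberately crude stand-in for pad misalignment."""
    if not (0.0 <= cu_fraction <= 1.0 and 0.0 <= overlap <= 1.0):
        raise ValueError("cu_fraction and overlap must lie in [0, 1]")
    effective_cu = cu_fraction * overlap
    return effective_cu * k_cu + (1.0 - effective_cu) * k_diel


def stack_k_vertical(layers):
    """Equivalent through-plane conductivity of stacked layers given as
    (thickness_m, conductivity) pairs: total thickness divided by the sum
    of the per-layer thermal resistances (series combination)."""
    total_thickness = sum(t for t, _ in layers)
    total_resistance = sum(t / k for t, k in layers)
    return total_thickness / total_resistance


if __name__ == "__main__":
    for f in (0.1, 0.2, 0.3):
        print(f, hybrid_bond_k_vertical(f), hybrid_bond_k_vertical(f, overlap=0.8))
```

Under this rule the conductivity is exactly linear in the Cu fraction; the quasi-linear dependence reported in the abstract comes from a full 3D analysis that this one-dimensional estimate ignores.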
BUP: A Deadlock Resolution Framework for 2.5D Chiplet Networks by Update Packet Bypassing
IF 3.8 | Zone 2, Engineering & Technology | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-07-21 | DOI: 10.1109/JETCAS.2025.3590870
Tiejun Li; Jianmin Zhang; Yi Yang; Yan Sun
Interposer-based 2.5D integration, a promising packaging technology, is extensively applied in chiplet-based systems. However, even if both the interposer and the chiplets are individually deadlock-free, new deadlocks may still occur across them after integration. To address this, we propose a deadlock resolution framework called Boundary Update Packet (BUP) for 2.5D-chiplet systems. BUP preserves path diversity for inter-chiplet packet routing and allows each chiplet to be designed independently. BUP ensures the freedom of routing design while simultaneously supporting fault-tolerant routing. Experimental results show that, compared with previous deadlock-free designs in 2.5D-chiplet systems, BUP achieves an average performance improvement of 13% under synthetic traffic; under real-application workloads, BUP provides an average runtime speedup of 3.5% with an area overhead of less than 6%.
Published in: IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 15, no. 4, pp. 537-545. Journal Article.
Citations: 0
Heterogeneous Integration in Co-Packaged Optics
IF 3.8 Zone 2 Engineering and Technology Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2025-07-21 DOI: 10.1109/JETCAS.2025.3590744
Yu-Tao Yang;Chih-Ming Hung
Generative artificial intelligence (GAI) and large language models (LLMs) require data centers to deliver higher bandwidth and better energy efficiency. Co-packaged optics (CPO), which combines advanced packaging with integrated photonics, is one of the future directions for achieving this. However, this tight integration complicates data center system design and multi-physics interactions, including electrical, optical, thermal, mechanical, and material aspects. This paper discusses heterogeneous integration (HI) in CPO. Multi-physics packaging is illustrated with two cases. Challenges in HI technologies are reviewed and corresponding mitigation methods are provided, including 1) thermal crosstalk within the electrical domain and between the electrical and optical domains, 2) signal and power integrity (SIPI) of wide-and-slow and narrow-and-fast channel links, and 3) pros and cons of interposer materials. The integrated photonics part is then introduced, covering 1) light sources, 2) optical coupling strategies, 3) fiber attach schemes with advanced packaging, and 4) integrated optical technologies, e.g., novel microlenses, optical TSVs, 3D waveguides, and optical 3DICs. This article aims to identify the key HI challenges in CPO and to point out potential solutions for future CPO system advancement.
Vol. 15, No. 3, pp. 427-437 https://doi.org/10.1109/JETCAS.2025.3590744
Citations: 0
Intermetallic Compounds (IMCs) Growth Investigation, Kinetic Parameter Analysis and Reliability Evaluation of In Solder Metal for 3D Integration Packaging
IF 3.8 Zone 2 Engineering and Technology Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2025-07-21 DOI: 10.1109/JETCAS.2025.3591363
Tassawar Hussain;Jaber Derakhshandeh;Tom Cochet;Ehsan Shafahian;Prathamesh Dhakras;Aksel Göhnermeier;Eric Beyne;Ingrid De Wolf
The increasing demand for higher functional density in microelectronics necessitates the miniaturization of interconnects in 3D integration, which presents challenges in processing and reliability. During fabrication and service life, interconnect microbumps remain in a non-equilibrium state, leading to interfacial reactions and atomic diffusion that drive the growth and phase transformations of intermetallic compounds (IMCs), which alter electrical, thermal, and mechanical properties and affect long-term reliability. With global restrictions on Pb-based solders, indium (In) has emerged as a viable low-melting-point alternative, especially for temperature-sensitive packaging. Understanding IMC kinetics in In-based systems is essential for optimizing reliability. This study investigates the kinetics and phase transformation of IMCs in Ni/In and Cu/In systems under solid-state aging conditions using an in-situ resistance measurement technique. The approach overcomes the limitations of traditional scanning electron microscopy (SEM)-based analysis by enabling continuous monitoring of IMC growth. The Ni/In system forms Ni$_3$In$_7$ through a reaction-controlled mechanism with an activation energy of $108 \pm 30$ kJ/mol. In the Cu/In system, CuIn$_2$ forms at room temperature and transforms to Cu$_{11}$In$_9$ via a peritectoid reaction during isothermal aging above $107.5\,^{\circ}$C. The transformation shifts from a mixed reaction-diffusion-controlled regime at $110\,^{\circ}$C ($n \approx 0.73$) to diffusion control between 120 and $140\,^{\circ}$C ($n \approx 0.45$ to 0.62), and possibly to grain-boundary diffusion at $150\,^{\circ}$C ($n \approx 0.19$). The activation energy for the CuIn$_2$ $\to$ Cu$_{11}$In$_9$ transformation is $196 \pm 82$ kJ/mol, indicating a higher energy barrier. These findings contribute to the development of low-temperature bonding techniques and fine-pitch interconnect optimization for future microelectronics packaging.
Vol. 15, No. 3, pp. 392-403 https://doi.org/10.1109/JETCAS.2025.3591363
Citations: 0
Depth-First: A Deterministic and Scalable NoC Routing Protocol for 3.5D Packaged Architectures
IF 3.8 Zone 2 Engineering and Technology Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2025-07-17 DOI: 10.1109/JETCAS.2025.3590106
Davy Million;César Fuguet;Adrian Evans;Rim El Cheikh;Alireza Monemi;Jonathan Balkind;Frédéric Pétrot
New high-volume commercial products combine 2.5D silicon-interposer-based assemblies with 3D monolithic stacks of chiplets. This combination is called 3.5D packaging and makes it possible to assemble dense compute solutions. Components communicate via a Network-on-Chip (NoC), but current solutions do not support 3.5D NoC topologies. To this end, this work proposes Depth-First, the first deterministic, Virtual Channel (VC)-based NoC routing protocol supporting 3.5D network topologies. The protocol prevents deadlocks using additional VCs only in the upper chiplets, while imposing no VC constraints on the base interposer. Depth-First also features an efficient node naming scheme, enabling highly compact routing tables. Since vertical links must be assigned to routers, we present a Mixed-Integer Linear Programming formulation that greatly reduces execution time compared to a reference implementation from prior work, which was based on an exhaustive search. We formally prove that the protocol is deadlock-free, study its performance using an open-source cycle-accurate simulator, and compare it with other protocols on a comparable topology. A partial implementation of Depth-First in an open-source router results in a small 4.9% area overhead (7 nm process) compared to an implementation without our routing algorithm.
Vol. 15, No. 4, pp. 546-559 https://doi.org/10.1109/JETCAS.2025.3590106
Citations: 0