Pub Date: 2025-07-22; DOI: 10.1109/JETCAS.2025.3591727
Mohamed Naeim;Dwaipayan Biswas;Yun Dai;Odysseas Zografos;Herman Oprins;Geert Van der Plas;C. T. Kao;Pinhong Chen;Dragomir Milojevic
Continuous scaling of integrated circuits and the adoption of 3D integration have significantly increased power density, creating critical challenges for thermal and power integrity. Rising power densities can trigger thermal runaway, degrading device reliability and performance. This paper presents an electrothermal coupling framework, built on commercial Electronic Design Automation (EDA) tools, designed for precise iterative analysis of power, thermal and IR-drop characteristics in 2D and 3D configurations of a many-core RISC-V SoC. The framework automates iterative Power-Temperature (P-T) simulations to evaluate thermal convergence and potential thermal runaway in Memory-on-Logic (MoL) and Logic-on-Memory (LoM) stacked configurations. It identifies thermal hotspots, tracks their progression and evaluates various cooling strategies. Initial results indicate that the first P-T iteration yields up to 10% power savings in 3D, owing to an 11% wirelength reduction compared with 2D. However, by the fifth iteration, the MoL power saving drops to 4%, whereas LoM maintains the 10% saving. LoM exhibits a 6°C lower peak temperature than MoL under equivalent cooling conditions. Compared with 2D, power densities ranging from 110 to 260 W/cm² result in temperature variations of -1°C to +3°C in LoM. A 10°C rise in temperature increases IR-drop by 11%; however, physical design-aware adjustments, such as a tighter Power Delivery Network (PDN) pitch, reduce IR-drop by 54%, mitigating thermal impacts.
Title: Iterative Layout-Aware Power, Thermal, and IR-Drop Co-Optimization: Ensuring Convergency in 3D-ICs
Journal: IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 15, no. 4, pp. 648–658
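The iterative P-T loop in the electrothermal framework above can be illustrated with a deliberately simplified lumped model. This is a minimal sketch, not the authors' EDA flow: it assumes a single die-to-ambient thermal resistance `r_th` (K/W) and an exponential leakage law with a hypothetical coefficient `beta`, and every parameter value is illustrative. Each iteration recomputes power at the previous temperature and temperature from the new power, stopping when successive temperatures agree (convergence) or blow up (runaway).

```python
import math


def pt_iterate(p_dyn, p_leak0, t_amb, r_th, beta, t_ref=25.0,
               tol=1e-3, max_iter=50, t_limit=1000.0):
    """Toy lumped power-temperature (P-T) fixed-point loop.

    Assumed model (hypothetical, not the paper's EDA flow):
      P_leak(T) = p_leak0 * exp(beta * (T - t_ref))   leakage power in W
      T = t_amb + r_th * (p_dyn + P_leak(T))           single thermal resistance in K/W
    Returns (converged, temps), where temps holds the temperature after each iteration.
    """
    temps = []
    t = t_amb  # the first power estimate is made at ambient temperature
    for _ in range(max_iter):
        try:
            p_total = p_dyn + p_leak0 * math.exp(beta * (t - t_ref))
        except OverflowError:
            return False, temps  # leakage term exploded: runaway
        t_new = t_amb + r_th * p_total
        temps.append(t_new)
        if abs(t_new - t) < tol:
            return True, temps  # successive temperatures agree: converged
        if t_new > t_limit:
            return False, temps  # beyond any physical range: runaway
        t = t_new
    return False, temps  # no convergence within max_iter


# Hypothetical parameters: 50 W dynamic, 5 W leakage at 25 C, 0.5 K/W package.
ok, temps = pt_iterate(p_dyn=50.0, p_leak0=5.0, t_amb=25.0, r_th=0.5, beta=0.02)
print(ok, [round(t, 2) for t in temps])
```

In this toy model the loop settles when the loop gain r_th · dP_leak/dT is below 1 at the fixed point; with larger leakage and thermal resistance (for example `p_leak0=20.0, r_th=1.0, beta=0.05`) no fixed point exists and the temperature diverges, which mimics thermal runaway.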
We propose a novel technology called Bumpless Build Cube (BBCube™) 3D for AI and high-performance computing (HPC) applications that require high bandwidth and power efficiency. BBCube 3D is constructed through heterogeneous 3D integration, in which xPU (e.g., CPU, GPU, TPU) chiplets and DRAM dies are stacked using a combination of bumpless wafer-on-wafer (WoW) and chip-on-wafer (CoW) processes. The bumpless stacking process resembles multilevel metallization in the back-end-of-line (BEOL), enabling BBCube to provide reliable, high-density interconnects between dies. Moreover, BBCube features low-capacitance, low-impedance through-silicon vias (TSVs) owing to thin silicon and slim TSV structures. To further enhance performance, a highly parallel DRAM architecture leveraging the bumpless WoW process is introduced. The high-density TSVs enable lower data transmission speeds without compromising bandwidth. Additionally, the adoption of four-phase shielded I/Os (FPS-I/O) allows the power supply voltage to be reduced. BBCube 3D can potentially achieve 30 times the bandwidth of DDR5 and four times that of HBM2E, while reducing bit-access energy to one-twentieth that of DDR5 and one-fifth that of HBM2E. The low-impedance TSVs in BBCube ensure robust power integrity for the xPU stacked on top of the layered DRAM. Furthermore, placing the xPU on top of the Cube enables efficient cooling of high-power xPUs. BBCube can accommodate an xPU with a power density exceeding 50 W/cm², comparable to the latest GPUs, while keeping the DRAM temperature below 95°C.
Title: Bumpless Build Cube (BBCube) 3D: Heterogeneous 3D Integration Using WOW and COW
Authors: Norio Chujo;Hiroyuki Ryoson;Koji Sakui;Shinji Sugatani;Masao Taguchi;Takayuki Ohba
Journal: IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 15, no. 3, pp. 404–414
Pub Date: 2025-07-22; DOI: 10.1109/JETCAS.2025.3591677
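The BBCube abstract argues that many dense TSVs allow lower per-line speeds at the same bandwidth, and that low-capacitance TSVs plus a lower supply voltage cut bit-access energy. The sketch below shows only that first-order arithmetic: aggregate bandwidth as I/O count times per-pin rate, and switching energy per bit as alpha · C · V². It is not BBCube's model; the I/O counts, capacitances, voltages, and activity factor `alpha` are hypothetical, and real I/O energy also includes driver, receiver, and clocking terms that this ignores.

```python
def aggregate_bandwidth_gbps(n_io: int, rate_gbps_per_pin: float) -> float:
    """Aggregate bandwidth of a parallel interface: I/O count x per-pin data rate."""
    return n_io * rate_gbps_per_pin


def energy_per_bit_pj(c_pf: float, vdd: float, alpha: float = 0.5) -> float:
    """First-order CMOS switching energy per transferred bit: alpha * C * V^2.

    alpha is an assumed activity factor; pF * V^2 gives pJ.
    """
    return alpha * c_pf * vdd * vdd


# Hypothetical interfaces with equal bandwidth: many slow pins vs. few fast pins.
wide = aggregate_bandwidth_gbps(4096, 2.0)
narrow = aggregate_bandwidth_gbps(1024, 8.0)
print(wide, narrow)

# Hypothetical channel capacitances (pF) and supply voltages (V).
e_high = energy_per_bit_pj(c_pf=0.5, vdd=1.1)
e_low = energy_per_bit_pj(c_pf=0.2, vdd=0.5)
print(round(e_high / e_low, 1))
```

Because voltage enters squared, halving the supply alone cuts this capacitive term by 4x, which is why a voltage reduction is worth more than an equal relative capacitance reduction in this simplified view.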
With the increasing complexity of chiplet-based architectures and the growing adoption of heterogeneous computing, efficient and high-precision simulation frameworks are essential for evaluating chiplet interconnect performance. This paper presents an implementation of the Universal Chiplet Interconnect Express (UCIe) protocol within the gem5 simulation environment, providing a comprehensive system-level modeling framework for UCIe-based interconnects. The proposed UCIe link model supports full-stack protocol simulation, including transaction-level processing, die-to-die adaptation, and physical layer interactions. A flit-based packing mechanism enables accurate transmission modeling, and an Ack/Nak-based retry mechanism ensures robust data integrity. An optimized event-driven scheduling strategy addresses performance bottlenecks observed in traditional UCIe simulations. Experimental evaluations validate the accuracy and efficiency of the proposed UCIe link model by comparing its latency with theoretical values under different computational workloads. The model closely matches the theoretical predictions, with a latency deviation of less than 0.5% in most cases. Additionally, performance comparisons among UCIe interconnect, PCIe interconnect, and direct memory transfer reveal that UCIe incurs only a minor protocol overhead (less than 0.7%), making it a practical and scalable solution for multi-chiplet interconnects. The proposed UCIe simulation framework provides a high-precision virtual verification platform for chiplet system design, enabling low-cost, high-accuracy performance evaluations, and lays the foundation for future optimizations in large-scale chiplet-based architectures.
Title: Efficient Die-to-Die Communication: UCIe Link Simulation and Optimization in a Chiplet-Based System
Authors: Kunyue Li;Shuaipeng Li;Xiaoyan Li;Zizheng Dong;Sai Gao;Jialei Sun;Naifeng Jing;Qin Wang;Guanghui He;Jianfei Jiang
Journal: IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 15, no. 4, pp. 599–608
Pub Date: 2025-07-22; DOI: 10.1109/JETCAS.2025.3590822
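The flit packing and Ack/Nak retry described in the UCIe link-model abstract can be sketched in standalone Python. This is an illustration under stated assumptions, not gem5 code and not the UCIe flit format: the payload size `FLIT_PAYLOAD_BYTES`, the random error model, and the function names are hypothetical, and it uses a stop-and-wait simplification (one outstanding flit) instead of a full replay buffer with many flits in flight.

```python
import random
from dataclasses import dataclass

FLIT_PAYLOAD_BYTES = 64  # hypothetical payload size, not taken from the UCIe specification


@dataclass
class Flit:
    seq: int        # sequence number used to match Acks/Naks
    payload: bytes  # fixed-size payload, zero-padded if needed


def pack_into_flits(data: bytes, payload_size: int = FLIT_PAYLOAD_BYTES) -> list[Flit]:
    """Split a byte stream into fixed-size flits, zero-padding the last one."""
    return [
        Flit(seq, data[start:start + payload_size].ljust(payload_size, b"\x00"))
        for seq, start in enumerate(range(0, len(data), payload_size))
    ]


def transmit_with_retry(flits, error_rate, rng, max_attempts=16):
    """Stop-and-wait Ack/Nak retry over a lossy link.

    Each flit is sent and checked by the receiver (modelled as a random error
    with probability error_rate), then either Acked or Naked; a Nak makes the
    sender replay the same flit. Returns (delivered_flits, transmissions).
    """
    delivered = []
    transmissions = 0
    for flit in flits:
        for _ in range(max_attempts):
            transmissions += 1
            if rng.random() >= error_rate:  # no error detected: receiver Acks
                delivered.append(flit)
                break
            # error detected: receiver Naks and the loop replays the same flit
        else:
            raise RuntimeError(f"flit {flit.seq} failed {max_attempts} times")
    return delivered, transmissions


data = bytes(range(200))
flits = pack_into_flits(data)
delivered, tx = transmit_with_retry(flits, error_rate=0.2, rng=random.Random(0))
received = b"".join(f.payload for f in delivered)[:len(data)]  # strip padding
print(len(flits), tx, received == data)
```

The ratio of transmissions to flits is the retry overhead; a pipelined replay scheme would hide most of the round-trip wait that stop-and-wait exposes.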
A defective through-silicon via (TSV) may cause a small delay fault that is difficult to detect with conventional logic testing. Testing the TSVs used for chip-to-chip interconnection in 3D stacked ICs is therefore challenging. We have proposed a delay testable boundary scan design with an embedded time-to-digital converter (TDC) that measures the timing slack between the test clock and a signal arriving through a TSV. A prototype 3D stacked IC with this delay testable circuit was fabricated using TSVs of various diameters. Measurement results show that the proposed delay testable boundary scan effectively identifies both logic errors in TSVs with open defects caused by a small diameter and delay outliers in TSVs that show no logic errors.
Title: An Implementation of Delay Testable Boundary Scan and Post-Bond Test Results in a 3D IC
Authors: Hiroyuki Yotsuyanagi;Keigo Takami;Masaki Hashizume
Journal: IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 15, no. 3, pp. 469–477
Pub Date: 2025-07-22; DOI: 10.1109/JETCAS.2025.3591617
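The delay-testable boundary scan abstract relies on a TDC that turns timing slack into a digital code. A common textbook TDC is a delay line sampled by a clock edge, which yields a thermometer code; the behavioral sketch below assumes that structure, which may differ from the authors' circuit. The stage delay, chain length, median-based outlier rule, and `threshold_stages` are all hypothetical choices for illustration.

```python
from statistics import median


def tdc_thermometer(slack_ps: float, stage_delay_ps: float, n_stages: int) -> list[int]:
    """Behavioral delay-line TDC: returns a thermometer code.

    Assumed model: the signal arriving through the TSV ripples down a chain of
    n_stages equal delay elements, and the test-clock edge samples every stage.
    Stages the signal has already passed read 1, so the number of ones
    quantizes the slack between signal arrival and the clock edge.
    """
    passed = int(max(0.0, slack_ps) // stage_delay_ps)  # non-positive slack -> no ones
    passed = min(passed, n_stages)                      # saturate at the chain length
    return [1] * passed + [0] * (n_stages - passed)


def thermometer_to_stages(code: list[int]) -> int:
    """Decode a thermometer code (ones followed by zeros) into a stage count."""
    return sum(code)


def flag_delay_outliers(stages_by_tsv: dict[str, int], threshold_stages: int = 3) -> list[str]:
    """Flag TSVs whose measured slack deviates from the median by more than a threshold."""
    ref = median(stages_by_tsv.values())
    return [name for name, s in stages_by_tsv.items() if abs(s - ref) > threshold_stages]


# Hypothetical slack values (ps) for four TSVs; tsv3 is much slower than the rest.
slacks = {"tsv0": 62.0, "tsv1": 65.0, "tsv2": 71.0, "tsv3": 14.0}
stages = {name: thermometer_to_stages(tdc_thermometer(s, 10.0, 8)) for name, s in slacks.items()}
print(stages, flag_delay_outliers(stages))
```

In this model a TSV can still pass logic capture (positive slack) yet show far less slack than its neighbours, which is the kind of delay outlier without logic error that the abstract describes.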
Pub Date: 2025-07-22; DOI: 10.1109/JETCAS.2025.3591559
Arvin Delavari;Boris Vaisband
As computational workloads continue to grow, heterogeneous integration of chiplet-based systems is becoming critically important for data-intensive applications such as high-performance computing, large language models, and artificial intelligence. However, scaling to ultra-large-scale (ULS) systems introduces significant communication challenges due to the limitations of network architectures and packaging technologies. Efficient data transfer across a network of thousands of chiplets remains a critical bottleneck. This work proposes the Chiplet Interface Protocol (ChIP), a robust, low-latency, area- and energy-efficient communication architecture for ULS systems. ChIP supports burst transfers and out-of-order transactions while leveraging the simple universal parallel interface for chips (SuperCHIPS), a simple, area- and energy-efficient streaming channel at the physical layer. Evaluated on a wafer-scale platform, ChIP was compared with state-of-the-art (SOTA) chiplet interfaces, including LIPINCON, BoW, UCIe, and AIB, in terms of performance, hardware efficiency, and unified signaling figures of merit. The comparison shows that ChIP significantly outperforms the SOTA alternatives (5.53× better) in bandwidth per shoreline, reaching 2.2 Tbps/mm in pipelined mode and up to 7.3 Tbps/mm in burst transactions. In addition, the transceiver area per link in ChIP is 485 μm², 46.1% smaller than the best SOTA alternative, while achieving 0.38–0.53 pJ/bit energy and 1 ns latency in 45 nm CMOS over a 0.5 mm link. Efficiency is sustained across longer channels and varied packaging owing to minimal handshaking and optimized point-to-point specifications. The performance of ChIP is evaluated across multiple network configurations on a fine-pitch integration platform, and also for a customized hybrid topology, the network on interconnect fabric (NoIF), which is introduced and analyzed in this work.
The NoIF architecture forms the foundation for ULS computing platforms, delivering exceptional results compared with SOTA solutions. The superior hardware efficiency and advanced inter-chiplet communication features of ChIP position the proposed protocol as an ideal candidate for chiplet communication in ULS architectures.
Title: Chiplets Interface Protocol (ChIP) for Ultra-Large-Scale Applications
Journal: IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 15, no. 4, pp. 585–598
Open-access PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11088081
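The ChIP abstract reports bandwidth per shoreline (Tbps per mm of die edge) and energy per bit. The sketch below shows the first-order arithmetic behind these figures of merit: lanes per mm times per-lane rate, and bandwidth density times energy per bit as power per mm of edge (1 Tbps × 1 pJ/bit = 1 W). The lane density and per-lane rate are hypothetical values chosen to reach 2.2 Tbps/mm, not ChIP's actual parameters, and combining 2.2 Tbps/mm with the 0.38–0.53 pJ/bit range assumes both figures hold at the same operating point, which the abstract does not state.

```python
def shoreline_bandwidth_tbps_per_mm(lanes_per_mm: float, gbps_per_lane: float) -> float:
    """Bandwidth density along the die edge: data lanes per mm x per-lane rate."""
    return lanes_per_mm * gbps_per_lane / 1000.0  # Gbps -> Tbps


def shoreline_power_w_per_mm(tbps_per_mm: float, pj_per_bit: float) -> float:
    """Link power per mm of shoreline; 1 Tbps x 1 pJ/bit = 1 W."""
    return tbps_per_mm * pj_per_bit


# Hypothetical lane density and rate that happen to reach 2.2 Tbps/mm.
bw = shoreline_bandwidth_tbps_per_mm(lanes_per_mm=200, gbps_per_lane=11.0)
# Energy range quoted in the abstract, applied to the pipelined-mode bandwidth.
low, high = (shoreline_power_w_per_mm(2.2, e) for e in (0.38, 0.53))
print(round(bw, 2), round(low, 3), round(high, 3))
```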
This paper investigates the thermal performance of multi-layer metal interconnects in three-dimensional (3D) stacked structures through finite element analysis (FEA). The 3D integrated circuit (3D IC) studied consists of five vertically stacked chips, interconnected through through-silicon vias (TSVs), metal redistribution layers (RDLs), and hybrid bonding. Because the full 3D IC structure is complex, this work simplifies the detailed model by using equivalent models for each chip layer and for the hybrid bonding structure. The study shows that the Cu fraction strongly affects the thermal conductivity of the hybrid bonding structure, with a quasi-linear dependence. Additionally, misalignment between the upper and lower Cu pads decreases the thermal conductivity of the structure. Furthermore, equivalent models for different chip layers, including metal interconnect layers and TSVs, are constructed for specific cases, and their equivalent thermal conductivities are extracted. From these per-layer results, the thermal conductivity of the complete 3D IC structure is determined. This work provides valuable results and guidance for the thermal design and practice of 3D ICs.
Title: Thermal Perspective Design and Analysis of Multi-Stacked Structures
Authors: Tianjian Liu;Jie Wu;Zhen Chen;Shujuan Liu;Zhongkai Jiang;Haoyang Peng;Zhandi Yang;Xing Hu;Dong Xie;Fang Dong;Yiqun Wang;Sheng Liu
Journal: IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 15, no. 3, pp. 438–444
Pub Date: 2025-07-21; DOI: 10.1109/JETCAS.2025.3590877
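The multi-stacked thermal abstract extracts equivalent conductivities with FEA. As a rough analytical counterpart, the sketch below uses textbook homogenization instead: a parallel rule of mixtures for a Cu/dielectric bonding layer (linear in Cu fraction, echoing the quasi-linear trend reported) and a series formula for stacking layers. This is not the authors' method; it ignores pad misalignment and lateral spreading, and the default conductivities are approximate bulk values for Cu (about 400 W/m·K), SiO2 (about 1.4 W/m·K), and Si (about 150 W/m·K). The stack geometry is hypothetical.

```python
def parallel_k(cu_fraction: float, k_cu: float = 400.0, k_diel: float = 1.4) -> float:
    """Rule of mixtures for Cu pads and dielectric side by side (heat flowing
    vertically through both): linear in the Cu area fraction. Default
    conductivities (W/m-K) are approximate bulk values for Cu and SiO2."""
    if not 0.0 <= cu_fraction <= 1.0:
        raise ValueError("cu_fraction must lie in [0, 1]")
    return cu_fraction * k_cu + (1.0 - cu_fraction) * k_diel


def series_k(layers: list[tuple[float, float]]) -> float:
    """Effective through-thickness conductivity of stacked layers given as
    (thickness, k) pairs: k_eff = sum(t_i) / sum(t_i / k_i)."""
    total_t = sum(t for t, _ in layers)
    total_r = sum(t / k for t, k in layers)  # thermal resistance per unit area
    return total_t / total_r


# Hypothetical stack: 2 um bonding layer at 30% Cu on top of 50 um of Si.
bond_k = parallel_k(0.30)
stack_k = series_k([(2e-6, bond_k), (50e-6, 150.0)])
print(round(bond_k, 1), round(stack_k, 1))
```

The parallel rule is an upper bound for the bonding layer; misaligned pads lengthen the Cu heat path, so an FEA result such as the paper's would be expected to fall below it.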
Pub Date: 2025-07-21; DOI: 10.1109/JETCAS.2025.3590870
Tiejun Li;Jianmin Zhang;Yi Yang;Yan Sun
Interposer-based 2.5D integration, a promising packaging technology, is extensively applied in chiplet-based systems. However, even if both the interposer and the chiplets are deadlock-free, new deadlocks may still arise across them after integration. To address this, we propose a deadlock resolution framework called Boundary Update Packet (BUP) for 2.5D-chiplet systems. BUP preserves path diversity for inter-chiplet packet routing and allows each chiplet to be designed independently. BUP ensures freedom of routing design while also supporting fault-tolerant routing. Experimental results show that, compared with previous deadlock-free designs for 2.5D-chiplet systems, BUP achieves an average performance improvement of 13% under synthetic traffic, while under real-application workloads it provides an average runtime speedup of 3.5% with an area overhead of less than 6%.
Title: BUP: A Deadlock Resolution Framework for 2.5D Chiplet Networks by Update Packet Bypassing
Journal: IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 15, no. 4, pp. 537–545
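The BUP abstract notes that an interposer and chiplets that are each deadlock-free can deadlock once integrated. A standard way to see this is the Dally and Seitz condition: deterministic routing is deadlock-free if its channel dependency graph (CDG) is acyclic. The sketch below is background illustration, not BUP itself: it checks CDGs for cycles and shows two acyclic graphs becoming cyclic once boundary dependencies are added. Channel names and the boundary edges are hypothetical.

```python
def has_cycle(graph: dict[str, list[str]]) -> bool:
    """Detect a cycle in a directed channel dependency graph (CDG) using DFS colouring.

    graph maps each channel to the channels a packet holding it may request next.
    """
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished
    color = {}

    def visit(node) -> bool:
        color[node] = GRAY
        for nxt in graph.get(node, []):
            state = color.get(nxt, WHITE)
            if state == GRAY:  # back edge: a dependency cycle exists
                return True
            if state == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    nodes = set(graph) | {n for succs in graph.values() for n in succs}
    return any(color.get(n, WHITE) == WHITE and visit(n) for n in nodes)


def merge(*graphs: dict[str, list[str]]) -> dict[str, list[str]]:
    """Union of several dependency graphs (e.g. chiplet, interposer, boundary)."""
    merged = {}
    for g in graphs:
        for src, dsts in g.items():
            merged.setdefault(src, []).extend(dsts)
    return merged


# Hypothetical channels: c* inside a chiplet, i* on the interposer.
chiplet = {"c0": ["c1"]}
interposer = {"i0": ["i1"]}
boundary = {"c1": ["i0"], "i1": ["c0"]}  # dependencies created by integration
print(has_cycle(chiplet), has_cycle(interposer), has_cycle(merge(chiplet, interposer, boundary)))
```

Each subnetwork passes the check on its own, but the two boundary dependencies close the loop c0 → c1 → i0 → i1 → c0, which is exactly the integration-induced cycle that frameworks such as BUP must break or avoid.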
Pub Date: 2025-07-21; DOI: 10.1109/JETCAS.2025.3590744
Yu-Tao Yang;Chih-Ming Hung
Generative artificial intelligence (GAI) and large language models (LLMs) require data centers with higher bandwidth and better energy efficiency. Co-packaged optics (CPO), which combines advanced packaging with integrated photonics, is one of the future directions for achieving this. However, this tight integration complicates data center system design and multi-physics interactions, including electrical, optical, thermal, mechanical, and material aspects. This paper discusses heterogeneous integration (HI) in CPO, with multi-physics packaging exemplified through two cases. Challenges in HI technologies are reviewed and corresponding mitigation methods are provided, including 1) thermal crosstalk within the electrical domain and between the electrical and optical domains, 2) signal and power integrity (SI/PI) of wide-and-slow and narrow-and-fast channel links, and 3) the pros and cons of interposer materials. The integrated photonics discussion covers 1) light sources, 2) optical coupling strategies, 3) fiber attach schemes with advanced packaging, and 4) integrated optical technologies, e.g., novel microlenses, optical TSVs, 3D waveguides, and optical 3DICs. This article aims to identify the key HI challenges in CPO and to point out potential solutions for future CPO system advancement.
Title: Heterogeneous Integration in Co-Packaged Optics
Journal: IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 15, no. 3, pp. 427–437
Pub Date: 2025-07-21; DOI: 10.1109/JETCAS.2025.3591363
Tassawar Hussain;Jaber Derakhshandeh;Tom Cochet;Ehsan Shafahian;Prathamesh Dhakras;Aksel Göhnermeier;Eric Beyne;Ingrid De Wolf
The increasing demand for higher functional density in microelectronics necessitates the miniaturization of interconnects in 3D integration, which presents challenges in processing and reliability. During fabrication and service life, interconnect microbumps remain in a non-equilibrium state, leading to interfacial reactions and atomic diffusion that drive intermetallic compound (IMC) growth and phase transformations; these affect electrical, thermal, and mechanical properties as well as long-term reliability. With global restrictions on Pb-based solders, indium (In) has emerged as a viable low-melting-point alternative, especially for temperature-sensitive packaging. Understanding IMC kinetics in In-based systems is essential for optimizing reliability. This study investigates the kinetics and phase transformation of IMCs in Ni/In and Cu/In systems under solid-state aging using an in-situ resistance measurement technique. The approach overcomes the limitations of traditional scanning electron microscopy (SEM)-based analysis by enabling continuous monitoring of IMC growth. The Ni/In system forms Ni₃In₇ through a reaction-controlled mechanism with an activation energy of 108 ± 30 kJ/mol. In the Cu/In system, CuIn₂ forms at room temperature and, during isothermal aging above 107.5°C, transforms into Cu₁₁In₉ via a peritectoid reaction. The transformation shifts from a mixed reaction-diffusion controlled regime at 110°C (n ≈ 0.73) to diffusion control between 120°C and 140°C (n ≈ 0.45–0.62), and possibly to grain-boundary diffusion at 150°C (n ≈ 0.19). The activation energy for the CuIn₂ → Cu₁₁In₉ transformation is 196 ± 82 kJ/mol, indicating a higher energy barrier. These findings contribute to the development of low-temperature bonding techniques and fine-pitch interconnect optimization for future microelectronics packaging.
Title: Intermetallic Compounds (IMCs) Growth Investigation, Kinetic Parameter Analysis and Reliability Evaluation of In Solder Metal for 3D Integration Packaging
IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 15, no. 3, pp. 392-403. Published 2025-07-21. DOI: 10.1109/JETCAS.2025.3591363 (https://doi.org/10.1109/JETCAS.2025.3591363)
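The abstract characterizes each kinetic regime by a growth exponent $n$ and reports activation energies, but does not spell out how they are extracted. A common approach, sketched here as an assumption rather than the authors' exact procedure, fits the IMC thickness $x$ to $x = k\,t^{n}$ on a log-log scale at each temperature, then fits the rate constants to the Arrhenius law $k = k_0 \exp(-E_a/RT)$, whose slope in $\ln k$ versus $1/T$ is $-E_a/R$. Under the usual interpretation, $n \approx 1$ points to reaction control and $n \approx 0.5$ to bulk-diffusion control. The sketch assumes thickness has already been derived from the in-situ resistance data (that conversion is not described in the abstract), holds $n$ fixed across temperatures so the $k$ values are comparable, and uses synthetic noise-free data from hypothetical parameters, not measured values. The function names are illustrative only.

```python
import math

R = 8.314  # gas constant, J/(mol*K)


def linear_fit(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def growth_exponent(times_s, thickness_um):
    """Fit x = k * t**n via ln x = ln k + n ln t; returns (k, n)."""
    ln_k, n = linear_fit([math.log(t) for t in times_s],
                         [math.log(x) for x in thickness_um])
    return math.exp(ln_k), n


def activation_energy_kj(temps_c, rate_constants):
    """Arrhenius fit ln k = ln k0 - (Ea/R)(1/T); returns Ea in kJ/mol."""
    inv_t = [1.0 / (t + 273.15) for t in temps_c]
    _, slope = linear_fit(inv_t, [math.log(k) for k in rate_constants])
    return -slope * R / 1000.0


if __name__ == "__main__":
    # Hypothetical parameters used to synthesize noise-free data (NOT measured values).
    EA_J_PER_MOL, K0, N_TRUE = 100e3, 1e12, 0.5
    temps_c = [110, 120, 130, 140]
    times_s = [600, 1800, 3600, 7200, 14400]
    fitted_k = []
    for t_c in temps_c:
        k_true = K0 * math.exp(-EA_J_PER_MOL / (R * (t_c + 273.15)))
        thickness = [k_true * t ** N_TRUE for t in times_s]
        k_fit, n_fit = growth_exponent(times_s, thickness)
        fitted_k.append(k_fit)
        print(f"{t_c} C: n = {n_fit:.3f}")
    print(f"Ea = {activation_energy_kj(temps_c, fitted_k):.1f} kJ/mol")
```

With real measurements the fitted $n$ differs between temperatures, as the reported 0.73, 0.45–0.62 and 0.19 values show, so comparing $k$ across temperatures then needs more care than this fixed-$n$ sketch.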
Pub Date : 2025-07-17 DOI: 10.1109/JETCAS.2025.3590106
Davy Million;César Fuguet;Adrian Evans;Rim El Cheikh;Alireza Monemi;Jonathan Balkind;Frédéric Pétrot
New high-volume commercial products combine 2.5D silicon-interposer-based assemblies with 3D monolithic stacks of chiplets. This combination, called 3.5D packaging, makes it possible to assemble dense compute solutions. Components communicate via a Network-on-Chip (NoC), but current solutions do not support 3.5D NoC topologies. To this end, this work proposes Depth-First, the first deterministic, virtual-channel (VC)-based NoC routing protocol supporting 3.5D network topologies. The protocol prevents deadlocks by using additional VCs only in the upper chiplets, while imposing no VC constraints on the base interposer. Depth-First also features an efficient node naming scheme that enables highly compact routing tables. Since vertical links must be assigned to routers, we present a Mixed-Integer Linear Programming (MILP) formulation that greatly reduces execution time compared with a reference implementation from prior work based on exhaustive search. We formally prove that the protocol is deadlock-free, study its performance using an open-source cycle-accurate simulator, and compare it with other protocols on a comparable topology. A partial implementation of Depth-First in an open-source router incurs a small 4.9% area overhead (7 nm process) compared with an implementation without our routing algorithm.
Title: Depth-First: A Deterministic and Scalable NoC Routing Protocol for 3.5D Packaged Architectures
IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 15, no. 4, pp. 546-559. Published 2025-07-17. DOI: 10.1109/JETCAS.2025.3590106 (https://doi.org/10.1109/JETCAS.2025.3590106)
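The abstract reports a formal deadlock-freedom proof but not its form. A standard tool for such arguments about deterministic routing is the channel dependency graph (CDG) of Dally and Seitz: its vertices are links, with an edge from each link a packet holds to the next link it requests, and if this graph is acyclic the routing function cannot deadlock. The sketch below, for illustration only, checks the property for ordinary XY routing on a plain 2D mesh and for a hypothetical `parity_route` that mixes XY and YX on a single channel per link. It does not model interposers, vertical links, virtual channels, or the Depth-First protocol itself; all function names are hypothetical.

```python
from itertools import product


def xy_route(src, dst):
    """Dimension-order (XY) routing on a 2D mesh: move along X first, then Y.
    Nodes are (x, y) tuples; returns the list of visited nodes."""
    (x, y), (dx, dy) = src, dst
    path = [(x, y)]
    while x != dx:
        x += 1 if dx > x else -1
        path.append((x, y))
    while y != dy:
        y += 1 if dy > y else -1
        path.append((x, y))
    return path


def yx_route(src, dst):
    """YX routing: the mirror image of xy_route (Y first, then X)."""
    swapped = xy_route((src[1], src[0]), (dst[1], dst[0]))
    return [(x, y) for y, x in swapped]


def parity_route(src, dst):
    """Hypothetical counter-example: XY or YX chosen by source-coordinate parity.
    Still deterministic, but together the two orders permit all eight turns."""
    return xy_route(src, dst) if (src[0] + src[1]) % 2 == 0 else yx_route(src, dst)


def channel_dependency_graph(width, height, route):
    """Map each link (u, v) to the links a packet may request while holding it."""
    nodes = list(product(range(width), range(height)))
    deps = {}
    for src, dst in product(nodes, nodes):
        if src == dst:
            continue
        path = route(src, dst)
        links = list(zip(path, path[1:]))
        for held, requested in zip(links, links[1:]):
            deps.setdefault(held, set()).add(requested)
    return deps


def has_cycle(graph):
    """Depth-first search; True if the directed graph contains any cycle."""
    state = {}  # absent = unvisited, 1 = on the DFS stack, 2 = finished

    def visit(u):
        state[u] = 1
        for v in graph.get(u, ()):
            if state.get(v) == 1:
                return True
            if v not in state and visit(v):
                return True
        state[u] = 2
        return False

    return any(u not in state and visit(u) for u in list(graph))


if __name__ == "__main__":
    print("XY, 4x4 mesh, cyclic CDG:",
          has_cycle(channel_dependency_graph(4, 4, xy_route)))
    print("parity XY/YX, 2x2 mesh, cyclic CDG:",
          has_cycle(channel_dependency_graph(2, 2, parity_route)))
```

XY routing never turns from Y back to X, so its CDG stays acyclic, whereas the parity scheme closes a ring of four links around a single mesh square. Adding virtual channels, as Depth-First does in the upper chiplets, splits a physical link into several logical channels; in general this is one way to break CDG cycles that a single channel per link would create.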