
IEEE Journal on Emerging and Selected Topics in Circuits and Systems: Latest Articles

IEEE Journal on Emerging and Selected Topics in Circuits and Systems Publication Information
IF 3.8 | Zone 2, Engineering & Technology | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-09-15 | DOI: 10.1109/JETCAS.2025.3603802
Volume 15, Issue 3, pp. C2-C2. PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11164995
Citations: 0
System-Technology Co-Optimization Methodology for LLM Accelerators With Advanced Packaging
IF 3.8 | Zone 2, Engineering & Technology | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-08-07 | DOI: 10.1109/JETCAS.2025.3596593
Janak Sharda;Shimeng Yu
Recent progress in large language models (LLMs) suggests that they can feasibly be deployed on personal devices once model size is reduced to between a few and a few dozen GB. Still, computation on intermediate data is intensive and requires frequent reloading of data from high-bandwidth memory (HBM). Today’s HBM bandwidth is limited by the number of channels embedded in a 2.5D integrated system. Advanced packaging techniques such as through-silicon vias (TSVs) and Cu-Cu hybrid bonding (HB) could provide higher-bandwidth interconnects between memory and logic dies in a 3D integrated system, where vertical interconnects shorten the distance between memory and logic and thereby reduce total energy consumption. However, mixing and matching different packaging techniques creates a large design space, and the proximity of the various components can lead to complex thermal management issues. In this work, we describe an evaluation methodology and use it to build a framework that benchmarks system-level power, performance, and area (PPA) metrics of 2.5D/3D integrated systems for LLM accelerators. We then use the framework to analyze the bottlenecks of training and inference across various models and batch sizes. We observe that memory bandwidth and routing energy bottleneck inference performance, whereas available compute bottlenecks training performance. Finally, we perform thermal evaluations to characterize the trade-off between peak operating temperature and throughput across different packaging configurations.
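One way to see why inference and training hit different bottlenecks is a first-order roofline model. A kernel whose arithmetic intensity (FLOPs per byte moved from HBM) lies below the accelerator's ridge point is bandwidth-bound. The Python sketch below is only an illustration, not the authors' framework. The peak-compute and bandwidth figures are hypothetical, and the GEMM traffic model assumes each operand crosses the memory interface once. Routing energy, the other inference bottleneck the abstract names, is not modeled.

```python
from dataclasses import dataclass


@dataclass
class Accelerator:
    """First-order roofline model of an accelerator (hypothetical parameters)."""
    peak_tflops: float   # peak compute throughput, TFLOP/s
    mem_bw_tbps: float   # logic-to-memory bandwidth, TB/s

    def ridge_point(self) -> float:
        """Arithmetic intensity (FLOP/byte) at which the two roofs meet."""
        return self.peak_tflops / self.mem_bw_tbps

    def attainable_tflops(self, intensity: float) -> float:
        """Throughput is capped by peak compute or by bandwidth x intensity."""
        return min(self.peak_tflops, self.mem_bw_tbps * intensity)

    def bound(self, intensity: float) -> str:
        return "memory-bound" if intensity < self.ridge_point() else "compute-bound"


def gemm_intensity(m: int, n: int, k: int, bytes_per_elem: int = 2) -> float:
    """FLOP/byte of C[m,n] = A[m,k] @ B[k,n], assuming each matrix moves once."""
    flops = 2 * m * n * k
    traffic = bytes_per_elem * (m * k + k * n + m * n)
    return flops / traffic


if __name__ == "__main__":
    acc = Accelerator(peak_tflops=1000.0, mem_bw_tbps=4.0)   # hypothetical device
    decode = gemm_intensity(1, 4096, 4096)     # one token per step (batch-1 inference)
    train = gemm_intensity(8192, 4096, 4096)   # 8192 tokens per step (training)
    print(f"ridge point: {acc.ridge_point():.0f} FLOP/byte")
    print(f"decode: {decode:.1f} FLOP/byte -> {acc.bound(decode)}")
    print(f"train:  {train:.1f} FLOP/byte -> {acc.bound(train)}")
```

With these placeholder numbers, batch-1 decoding sits near 1 FLOP/byte, far below the ridge point. A large training GEMM sits well above it. This only loosely mirrors the qualitative pattern the abstract reports.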
Volume 15, Issue 4, pp. 577-584.
Citations: 0
Optimizing Thermal Performance in 2.5D Systems Using Embedded Isolators
IF 3.8 | Zone 2, Engineering & Technology | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-08-05 | DOI: 10.1109/JETCAS.2025.3595909
George Karfakis;Myriam Bouzidi;Yunhyeok Im;Alexander Graening;Suresh K. Sitaraman;Puneet Gupta
This paper investigates thermal management in tightly integrated heterogeneous chiplet systems, focusing on a novel approach using embedded thermal isolators. In many 2.5D systems, such as modern enterprise GPUs, thermally sensitive chiplets like High Bandwidth Memory (HBM) are thermally coupled to high-power compute chiplets, which degrades performance. We propose and evaluate thermal isolators embedded within the heat spreader to thermally decouple chiplets. Our thermal simulations of a water-cooled 2.5D integrated GPU system indicate that conventional approaches such as thermally aware floorplanning are less effective because heat transfer through the heat spreader dominates. In contrast, our proposed thermal isolators can significantly increase thermal isolation between chiplets (by up to 61%), or even reduce the overall average peak chip temperature (by up to 22.5%). We develop a closed-loop workflow that incorporates thermal results to quantify the performance impact of thermally induced throttling, and find that in an example GPU+HBM system, the isolator approach can yield performance gains of up to 37% for memory-bound workloads. These findings open up new avenues for thermal management and thermal-system co-optimization in 2.5D heterogeneous integrated systems, potentially enabling more efficient and higher-performing chiplet-based architectures.
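The coupling described above can be pictured with a toy lumped thermal-resistance network. Each die has a vertical path to the coolant, and the heat spreader adds a lateral path between the dies. In this simplified picture, an embedded isolator raises the lateral resistance. The sketch below is not the authors' simulation flow. The two-node topology and all powers and resistances are hypothetical placeholders, chosen only to show the trend.

```python
def two_node_temps(p_gpu: float, p_hbm: float, r_gpu: float, r_hbm: float,
                   r_couple: float, t_coolant: float) -> tuple:
    """Steady-state (T_gpu, T_hbm) in deg C for a hypothetical two-node network.

    r_gpu, r_hbm: vertical thermal resistance from each die to the coolant (K/W).
    r_couple: lateral resistance between the dies through the spreader (K/W).
    """
    g_g, g_h, g_c = 1.0 / r_gpu, 1.0 / r_hbm, 1.0 / r_couple
    # Nodal equations G @ [T_gpu, T_hbm] = q with a symmetric 2x2 conductance matrix.
    a, b, d = g_g + g_c, -g_c, g_h + g_c
    q_g = p_gpu + g_g * t_coolant
    q_h = p_hbm + g_h * t_coolant
    det = a * d - b * b
    # Cramer's rule for the 2x2 system.
    t_gpu = (q_g * d - b * q_h) / det
    t_hbm = (a * q_h - b * q_g) / det
    return t_gpu, t_hbm


if __name__ == "__main__":
    # Hypothetical values: 700 W GPU, 30 W HBM stack, 30 deg C coolant.
    base = two_node_temps(700.0, 30.0, 0.05, 1.0, r_couple=0.5, t_coolant=30.0)
    iso = two_node_temps(700.0, 30.0, 0.05, 1.0, r_couple=5.0, t_coolant=30.0)
    print(f"without isolator: GPU {base[0]:.1f} C, HBM {base[1]:.1f} C")
    print(f"with isolator:    GPU {iso[0]:.1f} C, HBM {iso[1]:.1f} C")
```

In this toy case the HBM node cools by a couple of degrees while the GPU node warms only slightly. Real magnitudes depend entirely on the package geometry and cooling the paper simulates.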
Volume 15, Issue 3, pp. 458-468.
Citations: 0
SAFET-HI: Secure Authentication-Based Framework for Encrypted Testing in Heterogeneous Integration
IF 3.8 | Zone 2, Engineering & Technology | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-08-01 | DOI: 10.1109/JETCAS.2025.3594675
Galib Ibne Haidar;Jingbo Zhou;Md Sami Ul Islam Sami;Mark M. Tehranipoor;Farimah Farahmandi
System-in-Packages (SiPs) are gaining traction due to their enhanced performance, high yield rates, and accelerated time-to-market. However, integrating chiplets from untrusted sources introduces security risks during post-integration testing. Malicious chiplets within the SiP can intercept, modify, or block sensitive test data intended for specific chiplets. This article presents SAFET-HI, a framework designed to ensure a secure testing environment for SiPs. Within this framework, sensitive test data are accessible only to authenticated chiplets. To counter sniffing and spoofing attacks, SAFET-HI encrypts sensitive test patterns while maintaining minimal timing overhead. During post-integration testing, another major threat arises from outsourcing test patterns to untrusted testing facilities, increasing the risk of overproduction and counterfeiting. To address this, SAFET-HI incorporates a functional locking mechanism that prevents unauthorized production and distribution of defective SiPs. Additionally, scan encryption blocks are implemented to stop untrusted test facilities from generating a golden response database. To further enhance security, a watermark bitstream is embedded within the SiP to prevent remarking attacks by untrusted distributors. Simulation results show that SAFET-HI incurs area and timing overheads of only 1.42-4.27% and 13.7%, respectively, demonstrating its effectiveness in securing the SiP testing process.
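The gating idea is to release sensitive test data only to chiplets that prove they hold an enrolled secret. A standard HMAC challenge-response exchange can illustrate it. This is a generic textbook pattern written with Python's standard `hmac`, `hashlib`, and `secrets` modules, not the SAFET-HI protocol. The class names, the per-chiplet key table, and the 16-byte challenge size are hypothetical.

```python
import hashlib
import hmac
import secrets
from typing import Dict, Optional


class Chiplet:
    """A chiplet holding a secret key provisioned at enrollment (hypothetical)."""

    def __init__(self, chiplet_id: str, key: bytes) -> None:
        self.chiplet_id = chiplet_id
        self._key = key

    def respond(self, challenge: bytes) -> bytes:
        # Bind the response to both the chiplet identity and the fresh challenge.
        msg = self.chiplet_id.encode() + challenge
        return hmac.new(self._key, msg, hashlib.sha256).digest()


class TestController:
    """Releases test patterns only to chiplets that pass challenge-response."""

    def __init__(self, enrolled_keys: Dict[str, bytes]) -> None:
        self._keys = dict(enrolled_keys)

    def authenticate(self, chiplet: Chiplet) -> bool:
        key = self._keys.get(chiplet.chiplet_id)
        if key is None:
            return False  # unknown chiplet: never enrolled
        challenge = secrets.token_bytes(16)  # fresh nonce defeats simple replay
        expected = hmac.new(key, chiplet.chiplet_id.encode() + challenge,
                            hashlib.sha256).digest()
        # Constant-time comparison avoids leaking how many bytes matched.
        return hmac.compare_digest(expected, chiplet.respond(challenge))

    def release_pattern(self, chiplet: Chiplet, pattern: bytes) -> Optional[bytes]:
        return pattern if self.authenticate(chiplet) else None


if __name__ == "__main__":
    k = secrets.token_bytes(32)
    ctrl = TestController({"cpu0": k})
    print(ctrl.release_pattern(Chiplet("cpu0", k), b"\x0f\xf0") is not None)  # enrolled key
    print(ctrl.release_pattern(Chiplet("cpu0", b"wrong"), b"\x0f\xf0"))       # impostor
```

A fresh challenge per request is what prevents a malicious chiplet from replaying a response it sniffed earlier. Confidentiality of the pattern itself would need encryption on top of this, which the sketch omits.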
Volume 15, Issue 3, pp. 478-492.
Citations: 0
Miniaturized and Cost-Effective Programmable 2.5D/3.5D Platforms Enabled by Scalable Embedded Active Bridge Chipset
IF 3.8 | Zone 2, Engineering & Technology | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-07-30 | DOI: 10.1109/JETCAS.2025.3594169
Wei Lu;Jie Zhang;Yi-Hui Wei;Hsu-Ming Hsiao;Sih-Han Li;Chao-Kai Hsu;Chih-Cheng Hsiao;Feng-Hsiang Lo;Shyh-Shyuan Sheu;Chin-Hung Wang;Ching-Iang Li;Yung-Sheng Chang;Ming-Ji Dai;Wei-Chung Lo;Shih-Chieh Chang;Hung-Ming Chen;Kuan-Neng Chen;Po-Tsang Huang
This paper presents the Embedded Multi-die Active Bridge (EMAB) chip, a programmable bridge for cost-effective 2.5D/3.5D packaging technologies. The EMAB chip features a reconfigurable switch array that establishes flexible I/O links between multiple chiplets, forming an EMAB chipset tailored to user needs. It integrates low-dropout regulators (LDOs) for in-package voltage regulation and supports various transmission interfaces, including checkerboard I/Os (50 Mbps–1 Gbps) and MUX I/Os (up to 8 Gbps). Moreover, multiple EMAB chips can be interconnected in a daisy-chain configuration, making the EMAB chipset easy to expand. Additionally, the EMAB chip eliminates TSVs in silicon interposer-based 2.5D packaging technologies and reduces redistribution layer (RDL) complexity through flexible I/O links established within the EMAB chip. Furthermore, the EMAB chip can be pre-manufactured as a precast supporting layer (known good die, KGD), which shortens the product development cycle and enhances integration yield. Overall, the EMAB chip offers a miniaturized, low-cost, fast-time-to-market, and scalable solution for advanced 2.5D/3.5D packaging.
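To make the idea of a programmable bridge concrete, the sketch below models a switch array as a set of point-to-point links between chiplet-facing ports. It rejects any configuration that reuses a port. It is purely illustrative. The port names, the one-link-per-port rule, and the function name are assumptions, not details of the EMAB design.

```python
from typing import Dict, Iterable, Tuple


def configure_switch_array(ports: Iterable[str],
                           links: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Return a symmetric port -> peer map for a hypothetical bridge switch array.

    Assumes each link joins two distinct, existing ports and that a port can
    carry at most one link at a time.
    """
    valid = set(ports)
    peer: Dict[str, str] = {}
    for a, b in links:
        if a == b:
            raise ValueError(f"link connects port {a} to itself")
        # Validate both endpoints before committing the link.
        for p in (a, b):
            if p not in valid:
                raise ValueError(f"unknown port {p}")
            if p in peer:
                raise ValueError(f"port {p} is already linked to {peer[p]}")
        peer[a] = b
        peer[b] = a
    return peer


if __name__ == "__main__":
    ports = ["cpu.io0", "cpu.io1", "npu.io0", "mem.io0"]
    # One configuration: the CPU talks to the NPU and to a memory chiplet.
    cfg = configure_switch_array(ports, [("cpu.io0", "npu.io0"), ("cpu.io1", "mem.io0")])
    print(cfg["npu.io0"])
```

In this model, reprogramming the bridge for a different chipset just means calling the function again with a new link list. The physical interposer wiring stays fixed.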
Volume 15, Issue 3, pp. 379-391.
Citations: 0
Enhancing DFT Security in Chiplet-Based Systems With Encryption and Integrity Checking
IF 3.8 | Zone 2, Engineering & Technology | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-07-28 | DOI: 10.1109/JETCAS.2025.3592984
Juan Suzano;Anthony Philippe;Fady Abouzeid;Giorgio Di Natale;Philippe Roche
Chiplet-based chips are the natural evolution of traditional 2D SoCs, and off-the-shelf chiplets are expected to become an important component of the semiconductor industry. The IEEE Std 1838™-2019 design-for-testability (DFT) standard enables testing of stacked chiplets from multiple vendors. However, the shared DFT network threatens the confidentiality and integrity of test data and other sensitive information. This paper addresses the security concerns associated with DFT infrastructures in chiplet-based systems. We discuss why DFT infrastructures must be secured against unauthorized access and malicious activity, and we propose a hardware countermeasure that combines encryption and encoding to secure communication over the DFT network. Results show that the DFT network can be protected against misbehaving malicious chiplets in the stack, scan-based attacks, and brute-force attacks with minimal area and test-time overhead. The proposed solution incurs less than 1% area overhead on designs of more than 5 million gates and less than 1% test-time overhead for typical DFT implementations.
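Encrypt-then-MAC is a common way to combine confidentiality with integrity checking on a shared channel. The Python sketch below applies that generic pattern to a scan-data frame using only the standard library. It derives a keystream from HMAC-SHA256 in counter mode, then appends an HMAC tag over the nonce and ciphertext. It is a software illustration under our own assumptions, not the paper's hardware countermeasure, cipher, or encoding, and it has not been reviewed for production use. The function names, the 12-byte nonce, and the frame layout are hypothetical choices.

```python
import hashlib
import hmac
import secrets

NONCE_LEN = 12
TAG_LEN = 32  # HMAC-SHA256 output size in bytes


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Counter-mode keystream built from HMAC-SHA256 used as a PRF."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hmac.new(key, nonce + counter.to_bytes(4, "big"), hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])


def protect_scan_frame(enc_key: bytes, mac_key: bytes, scan_data: bytes) -> bytes:
    """Encrypt scan data, then append a tag over nonce || ciphertext."""
    nonce = secrets.token_bytes(NONCE_LEN)
    ks = _keystream(enc_key, nonce, len(scan_data))
    ciphertext = bytes(x ^ y for x, y in zip(scan_data, ks))
    tag = hmac.new(mac_key, nonce + ciphertext, hashlib.sha256).digest()
    return nonce + ciphertext + tag


def open_scan_frame(enc_key: bytes, mac_key: bytes, frame: bytes) -> bytes:
    """Verify the tag first; decrypt only if the frame is intact."""
    if len(frame) < NONCE_LEN + TAG_LEN:
        raise ValueError("frame too short")
    nonce, ciphertext, tag = frame[:NONCE_LEN], frame[NONCE_LEN:-TAG_LEN], frame[-TAG_LEN:]
    expected = hmac.new(mac_key, nonce + ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("integrity check failed: frame was modified")
    ks = _keystream(enc_key, nonce, len(ciphertext))
    return bytes(x ^ y for x, y in zip(ciphertext, ks))
```

Separate keys are used for encryption and authentication. Verifying before decrypting means a chiplet that flips bits on the shared network is detected rather than silently corrupting the test.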
Volume 15, Issue 3, pp. 493-505.
Citations: 0
Fast and Accurate Jitter Amplification Modeling With Variable Pulse Width Response for Statistical BER Analysis in Chiplet Interconnects and Beyond
IF 3.8 | Zone 2, Engineering & Technology | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-07-28 | DOI: 10.1109/JETCAS.2025.3592902
Shenggao Li;Maher Amer
In this paper, we investigate statistical bit error rate (BER) analysis for low-loss, short-reach chiplet interfaces and high-loss, long-reach serial interfaces. We use jitter filtering to account for the residual jitter not tracked by a forwarded-clock system, and we propose a fast and exact statistical BER method that accounts for the Tx jitter amplification effect in high-loss channels. The proposed method has linear computational complexity.
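For readers unfamiliar with statistical BER analysis, the conventional baseline builds the probability distribution of inter-symbol interference (ISI). It convolves the two-point distribution contributed by each ISI cursor, then combines the result with Gaussian noise. Binning voltages on a fixed grid bounds the number of bins, so the cost grows linearly with the number of cursors. The Python sketch below shows only that baseline for NRZ signaling. It does not implement the authors' variable-pulse-width jitter-amplification model, and the cursor values, noise level, and grid step are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, Sequence


def isi_pmf(cursors: Sequence[float], step: float) -> Dict[int, float]:
    """PMF of the ISI sum for equiprobable NRZ symbols (+1/-1), binned on a grid.

    Keys are bin indices (voltage = index * step). Each cursor convolves the
    current PMF with a two-point distribution at +/- its amplitude.
    """
    pmf: Dict[int, float] = {0: 1.0}
    for h in cursors:
        shift = round(h / step)
        nxt: Dict[int, float] = defaultdict(float)
        for idx, p in pmf.items():
            nxt[idx + shift] += 0.5 * p
            nxt[idx - shift] += 0.5 * p
        pmf = dict(nxt)
    return pmf


def statistical_ber(main_cursor: float, isi_cursors: Sequence[float],
                    sigma: float, step: float = 1e-4) -> float:
    """BER at a zero threshold with additive Gaussian noise of std `sigma`.

    By NRZ symmetry, P(error | 1 sent) equals P(error | 0 sent), so only the
    '1' case is evaluated: P(main + isi + noise < 0).
    """
    ber = 0.0
    for idx, p in isi_pmf(isi_cursors, step).items():
        v = main_cursor + idx * step
        ber += p * 0.5 * math.erfc(v / (sigma * math.sqrt(2.0)))
    return ber


if __name__ == "__main__":
    # Hypothetical sampled pulse-response cursors (volts) and noise.
    print(f"{statistical_ber(0.2, [], 0.02):.3e}")
    print(f"{statistical_ber(0.2, [0.05, -0.03, 0.02], 0.02):.3e}")
```

With no ISI the result reduces to the Gaussian tail Q(main/sigma). Adding cursors widens the distribution of the received sample and raises the BER.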
Volume 15, Issue 4, pp. 609-618. PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11097288
Citations: 0
Democratizing Customization for ML at the Edge Through Hetero-Chiplet SiP Architectures
IF 3.8 | Zone 2, Engineering & Technology | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-07-25 | DOI: 10.1109/JETCAS.2025.3592677
Matthew Joseph Adiletta;Gu-Yeon Wei;David Brooks
The demand for efficient machine learning in edge devices is challenging the capabilities of general-purpose computing systems. While domain-specific Systems on Chip (SoCs) are efficient, they are often prohibitively expensive because of long design times and high design costs. To address these limitations, the community has begun to explore System in Package (SiP) designs that assemble reusable accelerators, available as chiplets, at low cost to democratize customization. This poses a new challenge: macro-architecture design space exploration (DSE). Prior works do not address this problem, having investigated only micro-architecture design and optimization of homogeneous SiPs. To address this need and unlock the potential of custom SiPs assembled from heterogeneous chiplets, we introduce an early-stage DSE framework, CASCADE – A. CASCADE employs fast, first-order performance models to capture the tradeoffs of composable compute chiplets, leveraging tool-generated traces to understand dataflow patterns in the context of state-of-the-art machine learning tasks. Using CASCADE, we assess the performance benefits of composable SiPs comprising hetero-chiplets for single-tenant and two-tenant scenarios. Notably, we demonstrate that hetero-chiplet systems can deliver speedups of 3-5x, depending on the application, compared to a baseline GPU chiplet system.
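A first-order macro-architecture DSE can be sketched in a few lines. Each candidate chiplet gets a throughput per operator class, and every layer maps to the fastest chiplet present in a package. A fixed cost is added whenever execution hops between chiplets, and small package compositions are enumerated. The Python below is a hypothetical toy, not CASCADE. The chiplet library, throughputs, hop cost, and workload are invented for illustration, and the greedy per-layer mapping ignores hop costs when choosing a chiplet. Any speedup it prints reflects only these made-up numbers.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

# Hypothetical chiplet library: effective throughput (GOP/s) per operator class.
CHIPLETS: Dict[str, Dict[str, float]] = {
    "gpu":      {"conv": 100.0, "gemm": 100.0, "attention": 100.0},
    "conv_acc": {"conv": 400.0, "gemm": 50.0, "attention": 20.0},
    "gemm_acc": {"conv": 80.0, "gemm": 350.0, "attention": 150.0},
}


def package_latency_ms(package: Sequence[str], workload: Sequence[Tuple[str, float]],
                       hop_ms: float = 1.0) -> float:
    """First-order latency: greedy per-layer mapping plus a fixed inter-chiplet hop cost."""
    total, prev = 0.0, None
    for op, gops in workload:
        best = max(package, key=lambda c: CHIPLETS[c][op])  # fastest chiplet for this layer
        total += gops / CHIPLETS[best][op] * 1e3             # GOP / (GOP/s) -> ms
        if prev is not None and best != prev:
            total += hop_ms                                  # activations move between chiplets
        prev = best
    return total


def explore(workload: Sequence[Tuple[str, float]],
            max_chiplets: int = 2) -> List[Tuple[float, Tuple[str, ...]]]:
    """Enumerate packages of up to `max_chiplets` distinct chiplet types, fastest first."""
    results = []
    for size in range(1, max_chiplets + 1):
        for pkg in combinations(CHIPLETS, size):
            results.append((package_latency_ms(pkg, workload), pkg))
    return sorted(results)


if __name__ == "__main__":
    workload = [("conv", 20.0), ("conv", 10.0), ("gemm", 5.0), ("attention", 5.0)]
    baseline = package_latency_ms(("gpu",), workload)
    best_latency, best_pkg = explore(workload)[0]
    print(best_pkg, f"{baseline / best_latency:.2f}x vs GPU-only")
```

Exhaustive enumeration is fine for a three-entry library. A realistic library would need pruning or a search heuristic, which is where a framework like the one described would earn its keep.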
Volume 15, Issue 4, pp. 634-647. PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11096615
Citations: 0
Extending Energy-Efficient and Scalable DNN Training and Inference With 3-D Photonic Accelerator
IF 3.8 | Zone 2, Engineering & Technology | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-07-23 | DOI: 10.1109/JETCAS.2025.3591812
Juliana Curry;Yuan Li;Ahmed Louri;Avinash Karanth;Razvan Bunescu
As deep neural network (DNN) models continue to grow in complexity, analog computing architectures have emerged as a promising solution to meet increasing computational demands. Among these, silicon photonic computing excels at efficiently executing dot product operations while leveraging inherent parallelism. Photonic phase change memory (photonic-PCM) further enhances photonic computing by enabling scalable, non-volatile storage. In this work, we introduce the 3D Large-Scale Photonic Accelerator (LSPA), a novel photonic computing architecture designed for large-scale DNN models. LSPA employs multi-layered 3D stacking of non-volatile photonic-PCM cells, creating a high-density computational fabric that optimizes energy efficiency, flexibility, and scalability. LSPA’s custom 3D photonic network enables simultaneous data multicast in two dimensions and accumulation in three dimensions, optimizing communication patterns essential for efficient DNN training. A distinctive feature of LSPA is its ability to execute multiple forward and backward passes in parallel within each mini-batch, reducing latency associated with data movement and photonic-PCM programming. This unique capability combined with high-bandwidth photonic interconnects allows LSPA to sustain efficient training across a wide range of DNN workloads. When evaluated against a range of neural network models including VGG-16, ResNet-50, GoogLeNet, Transformer, GNMT, LLaMA 7B, and LLaMA 30B, LSPA reduces execution time by up to 92% and energy consumption by up to 90%. These results highlight LSPA as a transformative advancement in scalable, high-performance photonic computing for deep learning.
IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 15, no. 4, pp. 560–576, 2025.
Citations: 0
A Through Silicon Via (TSV) Architecture of the Bumpless Build Cube (BBCube) for Stacked Memory Devices
IF 3.8 | CAS Zone 2, Engineering & Technology | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-07-22 | DOI: 10.1109/JETCAS.2025.3591627
Shinji Sugatani;Hiroyuki Ryoson;Norio Chujo;Masao Taguchi;Koji Sakui;Takayuki Ohba
This paper describes the architecture of a wafer-on-wafer (WOW), via-last through-silicon via (TSV) named the Bumpless Build Cube TSV (BBCube-TSV). First, three types of TSV integration (μ-bump, hybrid bonding, and BBCube-TSV) are reviewed, covering their detailed structures and their potential for 3D memories. Next, the BBCube-TSV process flow is summarized to identify its key process steps. Three applications are then reviewed to illustrate the potential of BBCube-TSV: enhancing 3D memories, power-delivery wiring for a processor stacked on memory devices, and more sophisticated defect management in stacked memories. These advantages are found to stem from the simplicity of the structure and the proportion of the TSV structure occupied by copper. Finally, the role of the TSV as a vertical interconnect within the multilayer wiring hierarchy is discussed.
IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 15, no. 3, pp. 368–378, 2025. PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11088080
Citations: 0