
IEEE Journal on Emerging and Selected Topics in Circuits and Systems: Latest Articles

3-D Thermal Model With Lateral Thermal Resistance for Fast Thermal Analysis of Complex Stacked Structures
IF 3.8; Zone 2 (Engineering Technology); Q2, ENGINEERING, ELECTRICAL & ELECTRONIC; Pub Date: 2025-07-17; DOI: 10.1109/JETCAS.2025.3590269
Yang Liu;Feiyang Ma;Yuzhang Zang;Jun Wang;Zhifei Xu;Kai-Da Xu
With increasing density and complexity in 3-D integrated circuits, thermal management has become a major design challenge. In this paper, we present a precise 3-D thermal analysis model incorporating lateral thermal resistance, based on physical structure and material thermal properties. Analytical expressions for lateral thermal resistance and capacitance are derived, enabling accurate thermal modeling of complex 3-D stacked structures. We incorporate these analytical expressions into the RC-Tensorial Analysis Network (RC-TAN) framework, resulting in the 3-D RC-TAN method, which enhances computational efficiency while maintaining high accuracy. Simulation and experimental results demonstrate that the 3-D RC-TAN method outperforms traditional 1-D thermal analysis approaches, offering more than a 97% reduction in computation time compared with finite element method (FEM).
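As a rough illustration of why a lateral resistance term matters, the sketch below builds a two-cell steady-state thermal network in Python. It is not the paper's RC-TAN formulation: the slab formula R = L/(kA), the cell dimensions, the silicon conductivity of roughly 150 W/(m·K), and all helper names are assumptions chosen for illustration only.

```python
def slab_resistance(length_m, conductivity_w_mk, area_m2):
    """Conduction resistance (K/W) of a uniform slab: R = L / (k * A)."""
    return length_m / (conductivity_w_mk * area_m2)


def two_cell_steady_state(power_w, r_vertical, r_lateral, t_ambient=25.0):
    """Temperatures of a heated cell and its unheated neighbour.

    Each cell reaches ambient through r_vertical; the two cells are tied
    together by r_lateral. Node equations in conductance form:
        (gv + gl) * T1 - gl * T2 = P + gv * Ta
        -gl * T1 + (gv + gl) * T2 = gv * Ta
    """
    gv, gl = 1.0 / r_vertical, 1.0 / r_lateral
    a, b = gv + gl, -gl
    det = a * a - b * b
    rhs1 = power_w + gv * t_ambient
    rhs2 = gv * t_ambient
    # Closed-form inverse of the symmetric 2x2 system [[a, b], [b, a]].
    t1 = (a * rhs1 - b * rhs2) / det
    t2 = (a * rhs2 - b * rhs1) / det
    return t1, t2


# Illustrative (assumed) cell: 1 mm x 1 mm footprint, 0.1 mm thick silicon.
K_SILICON = 150.0            # W/(m*K), approximate room-temperature value
DX, DY, DZ = 1e-3, 1e-3, 1e-4

r_vert = slab_resistance(DZ, K_SILICON, DX * DY)   # heat flowing down
r_lat = slab_resistance(DX, K_SILICON, DY * DZ)    # heat flowing sideways

t_1d = 25.0 + 1.0 * r_vert                          # vertical path only
t_hot, t_neighbour = two_cell_steady_state(1.0, r_vert, r_lat)
print("1-D estimate:", t_1d)
print("with lateral path:", t_hot, t_neighbour)
```

With the lateral path included, part of the injected power leaves through the neighbouring cell, so the heated cell sits below the vertical-only estimate. A purely 1-D stack model cannot represent that effect.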
Volume 15, Issue 3, pp. 445-457
Citations: 0
Cool-3D: An End-to-End Thermal-Aware Framework for Early-Phase Design Space Exploration of Microfluidic-Cooled 3DICs
IF 3.8; Zone 2 (Engineering Technology); Q2, ENGINEERING, ELECTRICAL & ELECTRONIC; Pub Date: 2025-07-17; DOI: 10.1109/JETCAS.2025.3590065
Runxi Wang;Ziheng Wang;Ting Lin;Jacob Michael Raby;Mircea R. Stan;Xinfei Guo
The rapid advancement of three-dimensional integrated circuits (3DICs) has heightened the need for early-phase design space exploration (DSE) to minimize design iterations and unexpected challenges. Emphasizing the pre-register-transfer level (Pre-RTL) design phase is crucial for reducing trial-and-error costs. However, 3DIC design introduces additional complexities due to thermal constraints and an expanded design space resulting from vertical stacking and various cooling strategies. Despite this need, existing Pre-RTL DSE tools for 3DICs remain scarce, with available solutions often lacking comprehensive design options and full customization support. To bridge this gap, we present Cool-3D, an end-to-end, thermal-aware framework for 3DIC design that integrates mainstream architectural-level simulators, including gem5, McPAT, and HotSpot 7.0, with advanced cooling models. Cool-3D enables broad and fine-grained design space exploration, built-in microfluidic cooling support for thermal analysis, and an extension interface for non-parameterizable customization, allowing designers to model and optimize 3DIC architectures with greater flexibility and accuracy. To validate the Cool-3D framework, we conduct four case studies demonstrating its ability to model various hardware design options and accurately capture thermal behaviors. Cool-3D serves as a foundational framework that not only facilitates comprehensive 3DIC design space exploration but also enables future innovations in 3DIC architecture, cooling strategies, and optimization techniques. The entire framework, along with the experimental data, is in the process of being released on GitHub at https://github.com/iCAS-SJTU/Cool-3D
Volume 15, Issue 4, pp. 659-673
Citations: 0
IEEE Journal on Emerging and Selected Topics in Circuits and Systems
IF 3.7; Zone 2 (Engineering Technology); Q2, ENGINEERING, ELECTRICAL & ELECTRONIC; Pub Date: 2025-06-25; DOI: 10.1109/JETCAS.2025.3573432
Volume 15, Issue 2, p. C3; PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11050017
Citations: 0
IEEE Journal on Emerging and Selected Topics in Circuits and Systems Publication Information
IF 3.7; Zone 2 (Engineering Technology); Q2, ENGINEERING, ELECTRICAL & ELECTRONIC; Pub Date: 2025-06-25; DOI: 10.1109/JETCAS.2025.3573428
Volume 15, Issue 2, p. C2; PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11050009
Citations: 0
IEEE Journal on Emerging and Selected Topics in Circuits and Systems Information for Authors
IF 3.7; Zone 2 (Engineering Technology); Q2, ENGINEERING, ELECTRICAL & ELECTRONIC; Pub Date: 2025-06-25; DOI: 10.1109/JETCAS.2025.3573430
Volume 15, Issue 2, p. 361; PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11050010
Citations: 0
Guest Editorial Generative Artificial Intelligence Compute: Algorithms, Implementations, and Applications to CAS
IF 3.7; Zone 2 (Engineering Technology); Q2, ENGINEERING, ELECTRICAL & ELECTRONIC; Pub Date: 2025-06-25; DOI: 10.1109/JETCAS.2025.3572258
Chuan Zhang;Naigang Wang;Jongsun Park;Li Zhang
Volume 15, Issue 2, pp. 144-148; PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11050018
Citations: 0
Generative AI Through CAS Lens: An Integrated Overview of Algorithmic Optimizations, Architectural Advances, and Automated Designs
IF 3.7; Zone 2 (Engineering Technology); Q2, ENGINEERING, ELECTRICAL & ELECTRONIC; Pub Date: 2025-06-06; DOI: 10.1109/JETCAS.2025.3575272
Chuan Zhang;You You;Naigang Wang;Jongsun Park;Li Zhang
Generative artificial intelligence (GenAI) has emerged as a pivotal focus in global innovation agendas, revealing transformative potential that extends beyond technological applications to reshape diverse societal domains. Given the fundamental dependency of GenAI deployment on circuits and systems (CAS), a co-evolutionary approach integrating both technological paradigms becomes imperative. This synergistic framework confronts three interrelated challenges: 1) developing deployment-ready GenAI algorithms, 2) engineering implementation-efficient CAS architectures, and 3) leveraging GenAI for autonomous CAS designs, each representing a critical innovation vector. Given the rapid advancement of GenAI-CAS technologies, a comprehensive synthesis has become an urgent priority across academia and industry. Consequently, this timely review systematically analyzes current advancements, provides integrative perspectives, and identifies emerging research trajectories. This review endeavors to serve both AI and CAS communities, thereby catalyzing an innovation feedback loop: GenAI-optimized CAS architectures in turn accelerate GenAI evolution through algorithm-hardware co-empowerment.
Volume 15, Issue 2, pp. 149-185; PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11024158
Citations: 0
RDDM: A Rate-Distortion Guided Diffusion Model for Learned Image Compression Enhancement
IF 3.7; Zone 2 (Engineering Technology); Q2, ENGINEERING, ELECTRICAL & ELECTRONIC; Pub Date: 2025-04-22; DOI: 10.1109/JETCAS.2025.3563228
Sanxin Jiang;Jiro Katto;Heming Sun
Currently, denoising diffusion probabilistic models (DDPM) have achieved significant success in various image generation tasks, but their application in image compression, especially in the context of learned image compression (LIC), is quite limited. In this study, we introduce a rate-distortion (RD) guided diffusion model, referred to as RDDM, to enhance the performance of LIC. In RDDM, LIC is treated as a lossy codec function constrained by RD, dividing the input image into two parts through encoding and decoding operations: the reconstructed image and the residual image. The construction of RDDM is primarily based on two points. First, RDDM treats diffusion models as repositories of image structures and textures, built using extensive real-world datasets. Under the guidance of RD constraints, it extracts and utilizes the necessary structural and textural priors from these repositories to restore the input image. Second, RDDM employs a Bayesian network to progressively infer the input image based on the reconstructed image and its codec function. Additionally, our research reveals that RDDM’s performance declines when its codec function does not match the reconstructed image. However, using the highest bitrate codec function minimizes this performance drop. The resulting model is referred to as $\text{RDDM}^{\star}$. The experimental results indicate that both RDDM and $\text{RDDM}^{\star}$ can be applied to various architectures of LICs, such as CNN, Transformer, and their hybrid. They can significantly improve the fidelity of these codecs while maintaining or even enhancing perceptual quality to some extent.
Volume 15, Issue 2, pp. 186-199
Citations: 0
A Flexible Template for Edge Generative AI With High-Accuracy Accelerated Softmax and GELU
IF 3.7; Zone 2 (Engineering Technology); Q2, ENGINEERING, ELECTRICAL & ELECTRONIC; Pub Date: 2025-04-21; DOI: 10.1109/JETCAS.2025.3562734
Andrea Belano;Yvan Tortorella;Angelo Garofalo;Luca Benini;Davide Rossi;Francesco Conti
Transformer-based generative Artificial Intelligence (GenAI) models achieve remarkable results in a wide range of fields, including natural language processing, computer vision, and audio processing. However, this comes at the cost of increased complexity and the need for sophisticated non-linearities such as softmax and GELU. Even if Transformers are computationally dominated by matrix multiplications (MatMul), these non-linearities can become a performance bottleneck, especially if dedicated hardware is used to accelerate MatMul operators. In this work, we introduce a GenAI BFloat16 Transformer acceleration template based on a heterogeneous tightly-coupled cluster containing 256 KiB of shared SRAM, 8 general-purpose RISC-V cores, a $24\times 8$ systolic array MatMul accelerator, and a novel accelerator for Transformer softmax, GELU and SiLU non-linearities: SoftEx. SoftEx introduces an approximate exponentiation algorithm balancing efficiency ($121\times$ speedup over glibc’s implementation) with accuracy (mean relative error of 0.14%). In 12 nm technology, SoftEx occupies 0.039 mm², only 3.22% of the cluster, which achieves an operating frequency of 1.12 GHz. Compared to optimized software running on the RISC-V cores, SoftEx achieves significant improvements, accelerating softmax and GELU computations by up to $10.8\times$ and $5.11\times$, respectively, while reducing their energy consumption by up to $10.8\times$ and $5.29\times$. These enhancements translate into a $1.58\times$ increase in throughput (310 GOPS at 0.8 V) and a $1.42\times$ improvement in energy efficiency (1.34 TOPS/W at 0.55 V) on end-to-end ViT inference workloads.
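To make the role of exponentiation concrete, the following Python sketch gives reference softmax and GELU definitions together with a generic range-reduced approximation of exp. The approximation (a power-of-two split followed by a fourth-order Taylor polynomial) is a textbook technique swapped in for illustration. It is not the SoftEx algorithm, whose details the abstract does not give, and the function names are hypothetical.

```python
import math

LOG2E = 1.4426950408889634   # log2(e)
LN2 = 0.6931471805599453     # ln(2)


def approx_exp(x):
    """Approximate e^x as 2^n * e^t with n = floor(x * log2(e)), t in [0, ln 2)."""
    n = math.floor(x * LOG2E)
    t = x - n * LN2
    # Fourth-order Taylor polynomial for e^t, evaluated in Horner form.
    p = 1.0 + t * (1.0 + t * (0.5 + t * (1.0 / 6.0 + t / 24.0)))
    # ldexp multiplies by 2^n exactly (a pure exponent adjustment).
    return math.ldexp(p, n)


def softmax(values, exp_fn=math.exp):
    """Numerically stable softmax: subtract the maximum before exponentiating."""
    m = max(values)
    exps = [exp_fn(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def gelu(x):
    """Exact (erf-based) GELU: 0.5 * x * (1 + erf(x / sqrt(2)))."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


logits = [1.0, 2.0, 3.0]
print("reference softmax:  ", softmax(logits))
print("approximate softmax:", softmax(logits, exp_fn=approx_exp))
print("gelu(1.0):", gelu(1.0))
```

Because softmax subtracts the maximum first, every argument passed to the exponential is non-positive, which keeps the power-of-two scaling from overflowing. Hardware designs typically exploit the same property, although the specific polynomial and bit-level tricks vary.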
Volume 15, Issue 2, pp. 200-216
Citations: 0
Adaptive Two-Range Quantization and Hardware Co-Design for Large Language Model Acceleration
IF 3.7; Zone 2 (Engineering Technology); Q2, ENGINEERING, ELECTRICAL & ELECTRONIC; Pub Date: 2025-04-21; DOI: 10.1109/JETCAS.2025.3562937
Siqi Cai;Gang Wang;Wenjie Li;Dongxu Lyu;Guanghui He
Large language models (LLMs) face high computational and memory demands. While prior studies have leveraged quantization to reduce memory requirements, critical challenges persist: unaligned memory accesses, significant quantization errors when handling outliers that span larger quantization ranges, and the increased hardware overhead associated with processing high-bit-width outliers. To address these issues, we propose a quantization algorithm and hardware architecture co-design for efficient LLM acceleration. Algorithmically, a grouped adaptive two-range quantization (ATRQ) with an in-group embedded identifier is proposed to encode outliers and normal values in distinct ranges, achieving hardware-friendly aligned memory access and reducing quantization errors. From a hardware perspective, we develop a low-overhead ATRQ decoder and an outlier-bit-split processing element (PE) to reduce the hardware overhead associated with high-bit-width outliers, effectively leveraging their inherent sparsity. To support mixed-precision computation and accommodate diverse dataflows during the prefilling and decoding phases, we design a reconfigurable local accumulator that mitigates the overhead associated with additional adders. Experimental results show that the ATRQ-based accelerator outperforms existing solutions, achieving up to $2.48\times$ speedup and $2.01\times$ energy reduction in the LLM prefilling phase, and $1.87\times$ speedup and $2.03\times$ energy reduction in the decoding phase, with superior model performance under post-training quantization.
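The idea of encoding outliers and normal values in distinct ranges can be sketched in Python as follows. This is a simplified illustration, not the paper's ATRQ: the per-value boolean flag, the fixed threshold, the symmetric rounding, and every function name are assumptions, and the real scheme embeds its identifier inside the group in a way the abstract does not describe.

```python
def quantize_uniform(values, bits):
    """Symmetric uniform quantization with one range covering max |v|."""
    qmax = 2 ** (bits - 1) - 1
    scale = (max(abs(v) for v in values) / qmax) or 1.0
    return [round(v / scale) * scale for v in values]


def quantize_two_range(values, bits, threshold):
    """Hypothetical two-range scheme: a flag per value selects the scale."""
    qmax = 2 ** (bits - 1) - 1
    normal = [abs(v) for v in values if abs(v) <= threshold]
    outlier = [abs(v) for v in values if abs(v) > threshold]
    normal_scale = (max(normal) / qmax if normal else 0.0) or 1.0
    outlier_scale = (max(outlier) / qmax if outlier else 0.0) or 1.0

    encoded = []
    for v in values:
        is_outlier = abs(v) > threshold
        scale = outlier_scale if is_outlier else normal_scale
        encoded.append((is_outlier, round(v / scale)))

    decoded = [
        code * (outlier_scale if flag else normal_scale)
        for flag, code in encoded
    ]
    return encoded, decoded


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Assumed toy group: mostly small weights plus one large outlier.
group = [0.02, -0.05, 0.11, -0.08, 0.03, 6.4, -0.01, 0.07]
single = quantize_uniform(group, bits=4)
codes, double = quantize_two_range(group, bits=4, threshold=1.0)
print("single-range MSE:", mse(group, single))
print("two-range MSE:   ", mse(group, double))
```

In the single-range case the outlier stretches the step size so far that the small values all collapse to zero. Giving the outlier its own range keeps the fine step for the remaining values while every code still fits the same bit width.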
Volume 15, Issue 2, pp. 272-284
Citations: 0