IEEE Journal on Emerging and Selected Topics in Circuits and Systems: Latest Articles

An Overview of Neural Rendering Accelerators: Challenges, Trends, and Future Directions
IF 3.7; Zone 2 (Engineering & Technology); Q2 (Engineering, Electrical & Electronic); Pub Date: 2025-04-17; DOI: 10.1109/JETCAS.2025.3561777
Junha Ryu; Hoi-Jun Yoo
Rapid advancements in neural rendering have revolutionized the fields of augmented reality (AR) and virtual reality (VR) by enabling photorealistic 3D modeling and rendering. However, deploying neural rendering on edge devices presents significant challenges due to computational complexity, memory inefficiencies, and energy constraints. This paper provides a comprehensive overview of neural rendering accelerators, identifying the major hardware inefficiencies across the sampling, positional encoding, and multi-layer perceptron (MLP) stages. We explore hardware-software co-optimization techniques that address these challenges and provide a summary for in-depth analysis. Additionally, emerging trends like 3D Gaussian Splatting (3DGS) and hybrid rendering approaches are briefly introduced, highlighting their potential to improve rendering quality and efficiency. By presenting a unified analysis of challenges, solutions, and future directions, this work aims to guide the development of next-generation neural rendering accelerators, especially for resource-constrained environments.
IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 15, no. 2, pp. 299-311. PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10967345
Citations: 0
LightRot: A Light-Weighted Rotation Scheme and Architecture for Accurate Low-Bit Large Language Model Inference
IF 3.7; Zone 2 (Engineering & Technology); Q2 (Engineering, Electrical & Electronic); Pub Date: 2025-04-08; DOI: 10.1109/JETCAS.2025.3558300
Sangjin Kim; Yuseon Choi; Jungjun Oh; Byeongcheol Kim; Hoi-Jun Yoo
As large language models (LLMs) continue to demonstrate exceptional capabilities across various domains, the challenge of achieving energy-efficient and accurate inference becomes increasingly critical. This work presents LightRot, a lightweight rotation scheme and dedicated hardware accelerator designed for low-bit LLM inference. The proposed architecture integrates Grouped Local Rotation (GLR) and Outlier Direction Aligning (ODA) algorithms with a hierarchical Fast Hadamard Transform (FHT)-based rotation unit to address key challenges in low-bit quantization, including the energy overhead of rotation operations. The proposed accelerator, implemented in a 28 nm CMOS process, achieves a peak energy efficiency of 27.4 TOPS/W for 4-bit inference, surpassing prior state-of-the-art designs. Unlike conventional approaches that rely on higher-precision inference or evaluate on basic language modeling workloads such as GPT-2, LightRot is optimized for advanced models such as LLaMA2-13B and LLaMA3-8B. Its performance is further validated on MT-Bench, demonstrating robust applicability to real-world conversational scenarios and redefining benchmarks for chat-based AI systems. By synergizing algorithmic innovations and hardware efficiency, this work sets a new paradigm for scalable, low-bit LLM inference, paving the way for sustainable AI advancements.
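For readers unfamiliar with the rotation primitive named in this abstract, the following is a minimal sketch of a generic fast Hadamard transform and the orthonormal rotation built on it. It is not the paper's GLR or ODA algorithm and says nothing about the hierarchical hardware unit; the function names `fht` and `rotate` are hypothetical and chosen only for illustration. The example shows why such rotations are attractive before low-bit quantization: a single large outlier is spread evenly across all channels.

```python
import math


def fht(values):
    """Unnormalized fast Walsh-Hadamard transform using only adds and subtracts.

    Hypothetical helper for illustration; runs in O(n log n) butterflies.
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    out = list(values)
    h = 1
    while h < n:
        # Each pass combines elements that are h apart into (sum, difference).
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    return out


def rotate(values):
    """Orthonormal Hadamard rotation: the transform scaled by 1/sqrt(n)."""
    scale = 1.0 / math.sqrt(len(values))
    return [v * scale for v in fht(values)]


# One outlier channel is spread evenly over all four channels.
print(rotate([8.0, 0.0, 0.0, 0.0]))  # [4.0, 4.0, 4.0, 4.0]
```

Because the normalized Hadamard matrix is symmetric and orthogonal, applying `rotate` twice recovers the input, which is what allows a rotation to be folded into adjacent weights without changing the network output in exact arithmetic.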
IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 15, no. 2, pp. 231-243.
Citations: 0
Efficient Hardware Architecture Design for Rotary Position Embedding of Large Language Models
IF 3.7; Zone 2 (Engineering & Technology); Q2 (Engineering, Electrical & Electronic); Pub Date: 2025-03-31; DOI: 10.1109/JETCAS.2025.3556443
Wenjie Li; Gang Wang; Dongxu Lyu; Ningyi Xu; Guanghui He
Due to the substantial storage and computation demands imposed by large language models (LLMs), there has been a surge of research interest in their hardware acceleration. As a technique involving non-linear operations, rotary position embedding (RoPE) has been adopted by some recently released LLMs. However, there is currently no reported research on its hardware design. This paper, for the first time, presents an efficient hardware architecture design for RoPE of LLMs. We first explore the similarities between RoPE and the coordinate rotation digital computer (CORDIC) algorithm, while also considering the commonly used quantization scheme for LLMs. Additionally, we propose a hardware-friendly solution to address the issue of excessively large input angle ranges. Then we present a CORDIC-based approximation for RoPE and develop a hardware architecture for it. The experimental results demonstrate that our design can save up to 45.7% of the area cost and 31.0% of the power consumption compared with the fixed-point counterpart, while maintaining almost the same model performance. Compared to the straightforward implementation using floating-point arithmetic, our design can reduce area cost by up to 91.4% and power consumption by up to 88.9%, with negligible performance loss.
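The similarity between RoPE and CORDIC mentioned above can be illustrated with a small sketch: RoPE rotates each pair of embedding components by a position-dependent angle, and rotation-mode CORDIC performs such a rotation with shift-and-add style micro-rotations. The code below is a textbook floating-point CORDIC, not the paper's quantized design; the names `rope_angle` and `cordic_rotate`, the 16-iteration setting, and the range reduction (wrap to [-pi, pi), then fold by a half turn) are assumptions made for illustration and are not the paper's solution to the large-angle problem.

```python
import math

ITERATIONS = 16  # assumed iteration count, chosen only for this sketch
# Micro-rotation angles atan(2^-k) and the accumulated CORDIC gain.
ANGLES = [math.atan(2.0 ** -k) for k in range(ITERATIONS)]
GAIN = 1.0
for k in range(ITERATIONS):
    GAIN *= math.sqrt(1.0 + 2.0 ** (-2 * k))


def rope_angle(position, pair_index, dim, base=10000.0):
    """Standard RoPE angle for one component pair: position * base^(-2i/d)."""
    return position * base ** (-2.0 * pair_index / dim)


def cordic_rotate(x, y, theta):
    """Rotate (x, y) counter-clockwise by theta using rotation-mode CORDIC."""
    # Wrap the angle into [-pi, pi); a generic reduction, assumed here.
    theta = (theta + math.pi) % (2.0 * math.pi) - math.pi
    # Fold into [-pi/2, pi/2]: a half-turn rotation just negates the vector.
    if theta > math.pi / 2:
        x, y, theta = -x, -y, theta - math.pi
    elif theta < -math.pi / 2:
        x, y, theta = -x, -y, theta + math.pi
    for k in range(ITERATIONS):
        d = 1.0 if theta >= 0 else -1.0
        # In hardware the multiplications by 2^-k are right shifts.
        x, y = x - d * y * 2.0 ** -k, y + d * x * 2.0 ** -k
        theta -= d * ANGLES[k]
    # Compensate the constant gain accumulated by the micro-rotations.
    return x / GAIN, y / GAIN


# Rotate one (x, y) pair of a 64-dimensional head at token position 100.
print(cordic_rotate(0.3, -0.7, rope_angle(100, 0, 64)))
```

With 16 iterations the residual angle error is bounded by atan(2^-15), roughly 3e-5 rad, so the result agrees with `math.cos`/`math.sin` based rotation to about four decimal places for unit-scale inputs.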
IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 15, no. 2, pp. 244-257.
Citations: 0
F3: An FPGA-Based Transformer Fine-Tuning Accelerator With Flexible Floating Point Format
IF 3.7; Zone 2 (Engineering & Technology); Q2 (Engineering, Electrical & Electronic); Pub Date: 2025-03-31; DOI: 10.1109/JETCAS.2025.3555970
Zerong He; Xi Jin; Zhongguang Xu
Transformers have demonstrated remarkable success across various deep learning tasks. However, their inference and fine-tuning require substantial computation and memory resources, posing challenges for existing hardware platforms, particularly resource-constrained edge devices. To address these limitations, we propose F³, an FPGA-based accelerator for transformer fine-tuning. To reduce computation and memory overhead, this paper proposes a flexible floating point (FFP) format which consumes fewer resources than traditional floating-point formats of the same bitwidth. We adapt low-rank adaptation to the FFP format and propose a fine-tuning strategy named LR-FFP which reduces the number of trainable parameters without compromising fine-tuning accuracy. At the hardware level, we design specialized processing elements (PEs) for the FFP format. The PE maximizes the utilization of DSP resources, enabling a single DSP to perform two multiply-accumulate operations per cycle. The PEs are organized into a systolic array (SA) to efficiently handle general matrix multiplication during fine-tuning. Through theoretical analysis and experimental evaluation, we determine the optimal dataflow and SA parameters to balance performance and resource consumption. We implement the architecture on the Xilinx VCU128 FPGA platform, and F³ achieves a performance of 8.2 TFLOPS at 250 MHz. Compared with CPU and GPU implementations, F³ achieves speedups of $15.22\times$ and $3.44\times$, respectively, and energy efficiency improvements of $70.52\times$ and $9.44\times$.
IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 15, no. 2, pp. 258-271.
Citations: 0
Die-Level Transformation From 2D Shuttle Chips to 3D-IC With TSV for Advanced Rapid Prototyping Methodology With Meta Bonding
IF 3.8; Zone 2 (Engineering & Technology); Q2 (Engineering, Electrical & Electronic); Pub Date: 2025-03-20; DOI: 10.1109/JETCAS.2025.3572003
Takafumi Fukushima; Tetsu Tanaka; Mitsumasa Koyanagi
3D-IC technology, which may be more appropriately called TSV (through-Si via) formation technology, has been maturing year by year and is increasingly utilized in advanced semiconductor devices, such as 3D CIS (CMOS image sensor), HBM (high-bandwidth memory), and SRAM-on-CPU (named 3D V-Cache) devices. However, the initial development costs remain prohibitively high, largely due to the substantial investment required for TSV formation at the wafer level. Meanwhile, conventional systems on chips (SoCs) are transitioning from Fin-FET to GAA (gate-all-around) using the latest beyond-3-nm technology nodes, incorporating extreme ultraviolet (EUV) and other cutting-edge techniques. At the same time, the academic community is establishing an environment conducive to the utilization of nodes ranging from legacy 180 nm to 7 nm, making it feasible for designers to obtain 2D IC chips with their novel architectures at a reduced cost. Despite these advancements, foundry shuttle services employing TSV are still almost impossible to utilize, and performing proof of principle and functional verification using 3D-ICs remains extremely challenging. This article introduces recent advancements in technology that can transform 2D-ICs into 3D-ICs using shuttle chips for Multi-Project Wafers (MPWs), from small to large scale. This article mainly focuses on discussing the facilitation of die-level short-TAT (turnaround time) 3D-IC fabrication with key elemental technologies of multi-chip thinning and TSV/microbump formation. In addition, the effectiveness of Meta Bonding, such as fine-pitch microbump and direct/hybrid bonding, is described for future high-performance 3D-IC prototyping.
IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 15, no. 3, pp. 415-426. PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11007580
Citations: 0
GenPolar: Generative AI-Aided Complexity Reduction for Polar SCL Decoding
IF 3.7; Zone 2 (Engineering & Technology); Q2 (Engineering, Electrical & Electronic); Pub Date: 2025-03-19; DOI: 10.1109/JETCAS.2025.3561330
Yutai Sun; Jingyi Chen; Yuqing Ren; Houren Ji; Yongming Huang; Xiaohu You; Chuan Zhang
The CRC-aided successive cancellation list (CA-SCL) decoding algorithm for polar codes has gained widespread adoption thanks to its outstanding performance. However, with the evolution of 6G technologies, the high complexity of CA-SCL decoding poses a challenge in meeting growing performance requirements. Consequently, it is crucial to devise strategies that reduce this complexity without compromising error rates. Current efforts to mitigate the complexity, such as Fast-SCL decoding, mainly depend on harnessing special nodes associated with the code construction sequences. However, these strategies suffer from redundant complexity due to ill-suited construction sequences and unnecessary sorting operations within special nodes. Addressing this issue, this paper proposes a hardware-friendly and GenAI-aided complexity reduction approach for Fast-SCL decoding, named GenPolar. This approach involves two optimization techniques: 1) Transformer encoder models for generating polar construction sequences, and 2) a sorting-entropy-based method for sorting reduction. Together, these techniques reduce complexity with negligible performance loss. For polar codes of length 1024 with code rates of 0.25, 0.50, and 0.75, GenPolar achieves latency reductions of 20.6%, 29.8%, and 40.6%, respectively. Even when benchmarked against the reduced-complexity version of Fast-SCL decoding, the relative gains are 14.0%, 17.8%, and 22.3%, respectively. It should be noted that the immediate application is not limited to Fast-SCL decoding but also extends to other node-based SCL decoding algorithms like SSCL-SPC and SR-SCL.
IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 15, no. 2, pp. 312-324. PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11007206
Citations: 0
Editorial on Circuits and Systems for Green Video Communications
IF 3.7; Zone 2 (Engineering & Technology); Q2 (Engineering, Electrical & Electronic); Pub Date: 2025-03-12; DOI: 10.1109/JETCAS.2025.3541767
Christian Herglotz; Daniel Palomino; Olivier Le Meur; C.-C. Jay Kuo
IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 15, no. 1, pp. 1-3. PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10924431
Citations: 0
IEEE Journal on Emerging and Selected Topics in Circuits and Systems Information for Authors
IF 3.7; Zone 2 (Engineering & Technology); Q2 (Engineering, Electrical & Electronic); Pub Date: 2025-03-12; DOI: 10.1109/JETCAS.2025.3538141
IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 15, no. 1, p. 143. PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10924454
Citations: 0
IEEE Journal on Emerging and Selected Topics in Circuits and Systems Publication Information
IF 3.7; Zone 2 (Engineering & Technology); Q2 (Engineering, Electrical & Electronic); Pub Date: 2025-03-12; DOI: 10.1109/JETCAS.2025.3538139
IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 15, no. 1, p. C2. PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10924430
Citations: 0
IEEE Journal on Emerging and Selected Topics in Circuits and Systems
IF 3.7; Zone 2 (Engineering & Technology); Q2 (Engineering, Electrical & Electronic); Pub Date: 2025-03-12; DOI: 10.1109/JETCAS.2025.3538143
IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 15, no. 1, p. C3. PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10924450
Citations: 0