Latest articles in IEEE Transactions on Circuits and Systems I: Regular Papers

IEEE Transactions on Circuits and Systems--I: Regular Papers Information for Authors
IF 5.2 Zone 1 (Engineering & Technology) Q1 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2024-08-28 DOI: 10.1109/TCSI.2024.3441436
Citations: 0
Bidirectional High Step-Up/Down DC/DC Converter With a Coupled Inductor and Switched Capacitor
IF 5.1 Zone 1 (Engineering & Technology) Q1 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2024-08-27 DOI: 10.1109/tcsi.2024.3436694
Sang-Wha Seo, Joon-Hyoung Ryu, June-Seok Lee
Citations: 0
Efficient Integer-Only-Inference of Gradient Boosting Decision Trees on Low-Power Devices
IF 5.1 Zone 1 (Engineering & Technology) Q1 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2024-08-27 DOI: 10.1109/tcsi.2024.3446582
Majed Alsharari, Son T. Mai, Roger Woods, Carlos Reaño
Citations: 0
Low-Power High Precision Floating-Point Divider With Bidimensional Linear Approximation
IF 5.1 Zone 1 (Engineering & Technology) Q1 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2024-08-27 DOI: 10.1109/tcsi.2024.3447830
Gennaro Di Meo, Antonio Giuseppe Maria Strollo, Davide De Caro, Luca Tegazzini, Ettore Napoli
Citations: 0
An Efficient FPGA-Based Dilated and Transposed Convolutional Neural Network Accelerator
IF 5.2 Zone 1 (Engineering & Technology) Q1 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2024-08-27 DOI: 10.1109/TCSI.2024.3428636
Tsung-Hsi Wu, Chang Shu, Tsung-Te Liu
This work presents a field-programmable gate array (FPGA)-based deep neural network (DNN) accelerator that maintains consistently high efficiency when executing various neural network architectures, including convolutional neural networks (CNNs) and the transposed and dilated convolution (TD-convolution) operations used in modern computer vision (CV) tasks. To address the utilization degradation that occurs with a large processing unit (PE) array, a 3-D mapping strategy that adapts to different layer configurations is proposed to optimize the parallelism dimensions of the PE array, which significantly increases hardware utilization and thus accelerator efficiency. Moreover, to minimize the implementation and performance overhead of TD-convolution operations, a unified processing flow is proposed that integrates traditional convolution and TD-convolution. This allows the accelerator to bypass redundant zero operations, further boosting overall efficiency. The 4096-PE accelerator implemented on an Intel Stratix 10 FPGA achieves a throughput of 2.597–2.870 TOPS with an efficiency of 0.63–0.70 GOPS/DSP across various DNNs. This represents $1.72\times$ and $1.73\times$ improvements in throughput and efficiency, respectively, compared to state-of-the-art designs.
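The "redundant zero operations" mentioned in the abstract can be illustrated with a small 1-D sketch. This is not the paper's unified processing flow, whose details the abstract does not give; it only assumes the textbook view that a strided transposed convolution equals an ordinary convolution over a zero-inserted input. All function names here are hypothetical.

```python
def zero_insert(x, stride):
    """Insert (stride - 1) zeros between neighbouring samples."""
    out = []
    for i, v in enumerate(x):
        out.append(v)
        if i < len(x) - 1:
            out.extend([0] * (stride - 1))
    return out


def conv_full(x, w, skip_zeros=False):
    """Full 1-D convolution; returns (output, number of MACs performed)."""
    y = [0] * (len(x) + len(w) - 1)
    macs = 0
    for i, xv in enumerate(x):
        if skip_zeros and xv == 0:
            continue  # a zero operand cannot change the output
        for j, wv in enumerate(w):
            y[i + j] += xv * wv
            macs += 1
    return y, macs


def transposed_conv_direct(x, w, stride):
    """Reference: scatter each input sample times the kernel at stride spacing."""
    y = [0] * ((len(x) - 1) * stride + len(w))
    for i, xv in enumerate(x):
        for j, wv in enumerate(w):
            y[i * stride + j] += xv * wv
    return y


x = [1, 2, 3, 4]      # illustrative input
w = [1, -1, 2]        # illustrative kernel
stride = 2

upsampled = zero_insert(x, stride)
dense, dense_macs = conv_full(upsampled, w)
sparse, sparse_macs = conv_full(upsampled, w, skip_zeros=True)
reference = transposed_conv_direct(x, w, stride)

print(dense == sparse == reference, dense_macs, sparse_macs)
```

In this toy case the zero-skipping variant performs 12 multiply-accumulates instead of 21 and produces the same output, which is the kind of saving a hardware flow that bypasses inserted zeros can exploit.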
Citations: 0
Modular Expansion Method for Wireless Power Transfer Systems With Arbitrary Topologies
IF 5.2 Zone 1 (Engineering & Technology) Q1 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2024-08-27 DOI: 10.1109/TCSI.2024.3438563
Chao Cui, Chunbo Zhu, Xin Gao, Shumei Cui, Qianfan Zhang, C. C. Chan
Modular parallel inverter technology can enhance the power level and redundancy of wireless power transfer (WPT) systems, contributing to standardized production. It serves as an effective method for realizing high-power systems. However, inappropriate selection of compensation components can negatively affect the system’s efficiency, power factor, and the flexibility of modularization. This paper analyzes two key properties of modular-parallel-inverter WPT (MPI-WPT) systems: module number flexibility and modular deviation suppression. Firstly, to assess the modular deviation suppression of the system, its definition and calculation method are provided. Secondly, the expansion condition to achieve modular flexibility is examined, highlighting that the modular system needs to be a fully resonant system. Subsequently, an expansion design methodology from a single WPT system to an MPI-WPT system is proposed, ensuring the preservation of the original properties of the system during its extension. Finally, the parallel characteristics of MPI-WPT systems were experimentally verified.
Citations: 0
Analysis and Design of a 21.2-to-25.5-GHz Triple-Coil Transformer-Coupled QVCO
IF 5.2 Zone 1 (Engineering & Technology) Q1 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2024-08-22 DOI: 10.1109/TCSI.2024.3445179
Ya Zhao, Chao Fan, Jun Yin, Pui-In Mak, Li Geng
This paper reports a triple-coil transformer-coupled quadrature voltage-controlled oscillator (TC-QVCO), which inherently provides the quadrature signal without using noisy active-coupling transistors. The determinate correlation of the tank voltages is verified by using the initial state to facilitate the oscillation-state analysis. Thus, the TC-QVCO operates without oscillation-mode ambiguity. Additionally, thanks to the triple-coil transformer coupling, a large source coil $L_{S}$ aids in achieving in-phase coupling for phase noise (PN) improvement, and the intensified coupling factor $k_{gd}$ helps reduce the PN and the quadrature phase error simultaneously. Therefore, the TC-QVCO alleviates the tradeoff between PN and quadrature phase accuracy by using a large $L_{S}$ and $k_{gd}$. The proposed QVCO, prototyped in 65-nm CMOS, exhibits a superior FoM$_{\text{@10MHz}}$ (180.1 to 182.2 dBc/Hz) over an 18.2% frequency tuning range (21.2 to 25.5 GHz), with an estimated quadrature phase error <0.8°.
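For readers unfamiliar with the figure of merit quoted above, the conventional oscillator FoM is commonly written as follows. The abstract does not state which definition the authors use, so this is an assumption; the symbols $\mathcal{L}(\Delta f)$ (phase noise in dBc/Hz at offset $\Delta f$), $f_0$ (oscillation frequency), and $P_{\mathrm{DC}}$ (DC power) are introduced here for illustration only.

```latex
\[
\mathrm{FoM}_{@\Delta f}
  = -\mathcal{L}(\Delta f)
  + 20\log_{10}\!\left(\frac{f_0}{\Delta f}\right)
  - 10\log_{10}\!\left(\frac{P_{\mathrm{DC}}}{1\,\mathrm{mW}}\right)
\]
```

Under this definition a larger value is better: lower phase noise, a higher carrier frequency relative to the offset, and lower power each raise the FoM.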
Citations: 0
A 0.5-V 0.02% THD Bulk-Driven OTA for Continuous-Time Applications in 180 nm CMOS
IF 5.2 Zone 1 (Engineering & Technology) Q1 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2024-08-22 DOI: 10.1109/TCSI.2024.3443452
Yangxin Xiang, Huajun Yao, Minghao Jiang, Junkun Chen, Yongzhen Chen, Jiangfeng Wu
This paper introduces a 0.5-V, two-stage, pseudo-differential, bulk-driven operational transconductance amplifier (OTA) with high gain and linearity for low-power continuous-time applications. The input stage's common-mode feedback uses a linear resistor detector with a conductance-reduction cross-coupled pair to mitigate the loading effect of the common-mode detector resistor in ultra-low-power operation. The temperature dependence of the conductance-reduction circuit is compensated, and its impact on linearity is explored. The output stage employs a class-AB topology and uses a phase-lead compensator to stabilize the OTA. Implemented in 180 nm CMOS technology, this OTA achieves a DC gain of 68 dB and a slew rate of 26.1 V/$\mu$s under a capacitive load of 10 pF. The total power consumption is $34.2~\mu$W. In a unity-gain feedback configuration, the measured total harmonic distortion is only 0.02% at an output amplitude of 500 mVpp. Test results validate the proposed circuit and show that it is competitive with state-of-the-art designs.
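A quick back-of-the-envelope check shows why a class-AB output stage matters at this power level. The sketch below uses only the figures quoted in the abstract and the relation I = C * dV/dt. It assumes that the 10 pF load is the capacitance being slewed and that the quoted power is drawn from the 0.5 V supply; neither detail is spelled out in the abstract, and the function names are hypothetical.

```python
# Figures quoted in the abstract.
SLEW_RATE_V_PER_S = 26.1e6    # 26.1 V/us
LOAD_CAPACITANCE_F = 10e-12   # 10 pF
SUPPLY_VOLTAGE_V = 0.5        # 0.5 V supply
TOTAL_POWER_W = 34.2e-6       # 34.2 uW


def slewing_current(slew_rate, load_capacitance):
    """Current needed to slew a capacitor: I = C * dV/dt."""
    return load_capacitance * slew_rate


def average_supply_current(power, supply_voltage):
    """Average current implied by a power figure: I = P / V."""
    return power / supply_voltage


i_slew = slewing_current(SLEW_RATE_V_PER_S, LOAD_CAPACITANCE_F)
i_supply = average_supply_current(TOTAL_POWER_W, SUPPLY_VOLTAGE_V)
ratio = i_slew / i_supply

print(f"slewing current: {i_slew * 1e6:.1f} uA")
print(f"supply current : {i_supply * 1e6:.1f} uA")
print(f"ratio          : {ratio:.2f}")
```

Under these assumptions the load needs roughly 261 uA while slewing, against about 68 uA of average supply current, so the output stage must deliver several times its average current on demand, which is what a class-AB stage is designed to do.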
Citations: 0
Implementation of Group-Approximate Expectation Propagation Algorithm for Uplink MIMO-SCMA Detection Using 16-Point Codebook
IF 5.2 Zone 1 (Engineering & Technology) Q1 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2024-08-20 DOI: 10.1109/TCSI.2024.3439616
Mei-Hsuan Chang, Pei-Yun Tsai
The complexity of the conventional message passing algorithm (MPA) for detecting sparse code multiple access (SCMA) signals grows exponentially as the size of the codebook increases, posing a challenge for hardware implementation of large codebooks. The expectation propagation algorithm (EPA) has shown its superiority owing to its linear complexity with respect to the codebook size. In this paper, we propose a log-domain group-approximate EPA (GA-EPA) for further complexity reduction. The mother constellation points are partitioned into several groups, which simplifies the calculation of the posterior probability. Compared to conventional EPA, log-domain GA-EPA eliminates approximately 76.4% of the multiplications and 53.8% of the divisions in MIMO-SCMA signal detection. A GA-EPA detector is then designed in 40-nm CMOS technology. We use a customized floating-point format to shorten word lengths and exploit a property of the exponential function, achieving a 17% reduction in total area and a table reduction of more than 99%. According to the synthesis results, our design for MIMO-SCMA detection with a 16-point codebook and 4 receiving antennas achieves a throughput of 364 Mbps at an operating frequency of 167 MHz. Compared to prior MPA-related implementations, our work achieves higher normalized hardware efficiency and demonstrates a promising solution for large codebook cardinality.
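The gap between the two complexity regimes can be made concrete with a rough count. The sketch below is an illustration, not the paper's analysis: it assumes the usual textbook scaling in which an exhaustive MPA resource node enumerates M**d_f codeword combinations while EPA handles roughly M * d_f terms, where M is the codebook size and d_f is the number of users colliding on one resource. The value d_f = 3 is an assumption and does not appear in the abstract.

```python
def mpa_combinations(codebook_size, users_per_resource):
    """Codeword combinations one resource node enumerates in exhaustive MPA."""
    return codebook_size ** users_per_resource


def epa_terms(codebook_size, users_per_resource):
    """Per-resource terms when each user is treated separately (linear in M)."""
    return codebook_size * users_per_resource


D_F = 3  # assumption: three users share each resource (not stated in the abstract)

for m in (4, 16, 64):
    print(m, mpa_combinations(m, D_F), epa_terms(m, D_F))
```

For the 16-point codebook in the title, these assumptions give 4096 combinations per resource node for MPA versus 48 terms for EPA, which is why larger codebooks push hardware designs toward EPA-style detectors.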
Citations: 0
Hardware-Efficient SoftMax Architecture With Bit-Wise Exponentiation and Reciprocal Calculation
IF 5.2 Zone 1 (Engineering & Technology) Q1 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2024-08-20 DOI: 10.1109/TCSI.2024.3443270
Jeongmin Kim, Sungho Kim, Kangjoon Choi, In-Cheol Park
The SoftMax function is one of the activation functions used in deep neural networks (DNNs) to normalize input values to the range (0,1). With the advent of DNN models such as the Transformer, operations using SoftMax have gained significant attention, and their efficient hardware implementation has become a prominent issue. Implementing SoftMax often involves exponential and division operations, which can be a significant bottleneck in terms of hardware cost and performance. Various efforts have been made to address this challenge, and this paper introduces a novel approach to implementing SoftMax efficiently. In most previous works, the maximum input value is subtracted from all input values to ensure numerical stability. In the proposed approach, the maximum value is replaced with a different value to reduce hardware complexity while still ensuring numerical stability. Additionally, in the exponential operations, simple look-up tables (LUTs) with only one entry each are used for bit-wise calculations, and the reciprocal of the total exponential sum is computed to replace division with multiplication. Applying the proposed methods reduces the computational complexity significantly compared to the previous log-sum-exp approach. As a result, the proposed 8-bit SoftMax accelerator achieves a high operating frequency of 3.12 GHz and a high throughput of 25G inputs/s. It also improves area efficiency and power consumption by at least 2 times. Furthermore, its accuracy is similar to or even better than that of previous works.
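Two of the ideas in the abstract can be sketched in software. The code below is an interpretation, not the paper's architecture: it assumes that "bit-wise" exponentiation means building exp(-n) for an integer n as a product of one stored constant per set bit, and it uses the conventional maximum subtraction because the abstract does not say which value replaces the maximum. The table and function names are hypothetical.

```python
import math

# One constant per bit position, exp(-2**k): a stand-in for single-entry LUTs.
BIT_CONSTANTS = [math.exp(-(2 ** k)) for k in range(8)]


def exp_neg_bitwise(n):
    """exp(-n) for an integer 0 <= n < 256 as a product of per-bit constants."""
    result = 1.0
    for k, constant in enumerate(BIT_CONSTANTS):
        if (n >> k) & 1:
            result *= constant
    return result


def softmax_reciprocal(xs):
    """Integer-input softmax: max subtraction, bit-wise exp, one reciprocal."""
    m = max(xs)
    exps = [exp_neg_bitwise(m - x) for x in xs]
    inv_sum = 1.0 / sum(exps)            # the only division
    return [e * inv_sum for e in exps]   # multiplications replace divisions


def softmax_reference(xs):
    """Conventional numerically stable softmax, for comparison."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


scores = [3, 1, 0, 7]  # illustrative integer inputs
probs = softmax_reciprocal(scores)
print(probs)
```

The identity behind exp_neg_bitwise is exp(-(a + b)) = exp(-a) * exp(-b), applied to the binary expansion of n, so each bit needs only one stored constant and a conditional multiply.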
Citations: 0