IEEE Transactions on Circuits and Systems II: Express Briefs - Latest Articles

An Efficient Layer Normalization Training Module With Dynamic Quantization for Transformers
IF 4.9 | Zone 2, Engineering & Technology | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-07-22 | DOI: 10.1109/TCSII.2025.3591633
Haikuo Shao; Aotao Wang; Zhongfeng Wang
The layer normalization (LN) function is widely adopted in Transformer-based neural networks. Efficient training of Transformers on personal devices is attracting attention because of data privacy and latency concerns. However, the LN function produces extreme outliers that are hard to quantize and relies on hardware-unfriendly square-root and division operations, posing resource challenges for training deployment on the edge. This brief proposes an efficient LN training architecture with algorithm and hardware co-optimization. Specifically, we present a dynamic quantization algorithm based on integer arithmetic that smooths outliers to preserve sufficient training accuracy. Then, we develop a reconfigurable hardware architecture to efficiently support the various operations during LN training, with a vector-wise pipelined dataflow to further improve hardware efficiency. Experimental results show that our architecture achieves throughput of up to 0.25 and 1.0 giga inputs per second (GinS) on FPGA and ASIC platforms, respectively, outperforming prior works.
IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 72, no. 9, pp. 1288-1292.
Citations: 0
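As a rough illustration of the idea in the LN brief above, the sketch below quantizes each input vector with a scale recomputed per vector (one common meaning of "dynamic" quantization) and then forms the normalization statistics from integer sums only. It is a minimal NumPy sketch under stated assumptions, not the authors' algorithm or hardware dataflow: the function names, the 8-bit width, and the symmetric max-abs scaling are hypothetical choices, and the final square root and division are left in floating point, whereas the brief targets hardware-friendly replacements for them.

```python
import numpy as np

def dynamic_quantize(x, bits=8):
    """Per-vector symmetric quantization; the scale is recomputed for every row.

    Hypothetical helper for illustration only.
    """
    qmax = 2 ** (bits - 1) - 1
    max_abs = np.max(np.abs(x), axis=-1, keepdims=True)
    # Avoid a zero scale for all-zero rows.
    scale = np.where(max_abs > 0, max_abs / qmax, 1.0)
    q = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int64)
    return q, scale

def layer_norm_quantized(x, eps=1e-5, bits=8):
    """Layer normalization (no affine parameters) computed from integer sums."""
    q, scale = dynamic_quantize(x, bits)
    n = q.shape[-1]
    s1 = q.sum(axis=-1, keepdims=True)        # sum of q, integer
    s2 = (q * q).sum(axis=-1, keepdims=True)  # sum of q^2, integer
    num = n * q - s1                          # n * (q - mean), integer
    var_n2 = n * s2 - s1 * s1                 # n^2 * variance, integer
    # The quantization scale cancels between numerator and denominator;
    # it only survives in the rescaled eps term.
    return num / np.sqrt(var_n2 + eps * (n / scale) ** 2)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(4, 64))
    ref = (x - x.mean(-1, keepdims=True)) / np.sqrt(x.var(-1, keepdims=True) + 1e-5)
    print("max abs deviation from float LN:", np.max(np.abs(layer_norm_quantized(x) - ref)))
```

Because the numerator and the variance term are both built from the same integer codes, the per-vector scale never has to be multiplied back in, which is one reason per-vector scaling pairs naturally with normalization.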
Learning-Based Scaling Scheme for Markov Jump Systems and Its Application in Operational Amplifier Circuit
IF 4.9 | Zone 2, Engineering & Technology | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-07-22 | DOI: 10.1109/TCSII.2025.3590998
Qing Yang; Jing Wang; Hao Shen; Ju H. Park
This brief addresses the optimization problem for Markov jump systems (MJSs) with unknown dynamics via a novel scaling-based reinforcement learning scheme. First, by employing subsystem transformation, the optimal controller design problem for MJSs is reformulated into solving a set of parallel and decoupled algebraic Riccati equations (DAREs). Traditional learning schemes for solving these equations either require initially admissible control policies or suffer from slow convergence. To overcome these limitations, a novel scaling-based reinforcement learning algorithm is proposed. Several notable advantages are exhibited by the proposed algorithm: it eliminates the need for system dynamics during the learning process, achieves faster convergence, and relaxes the requirement for an initially admissible control policy. The effectiveness of the proposed scheme is rigorously proven through a mathematical induction method. Finally, the feasibility of the proposed scheme is verified using an operational amplifier circuit example, and its superiority is demonstrated through a series of comparative simulations.
IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 72, no. 9, pp. 1238-1242.
Citations: 0
A Compact Dual-Channel WPT System Based on Decoupled Integrated Coils for Power Enhancement
IF 4.9 | Zone 2, Engineering & Technology | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-07-22 | DOI: 10.1109/TCSII.2025.3591215
Jiawei Xie; Yandong Chen; Yuhang Zhou; Cong Luo; Jian Guo
A multi-coil configuration is an effective approach for high-power wireless power transfer (WPT) systems. In such systems, mitigating complex cross-coupling in the magnetic couplers remains critical to achieving high efficiency and stable power delivery. Thus, this brief proposes a compact dual-channel WPT system with decoupled coils to enhance the overall power capacity. The transmitter and receiver have the same structure, with each charging pad constructed from solenoid coils wound around Q-coils and ferrite cores. The solenoid coils and Q-coils are naturally decoupled from each other, which eliminates additional coupling interference so that only the main mutual inductances $M_{1}$ and $M_{2}$ are retained. Furthermore, the principle of power enhancement and constant current (CC) output is thoroughly analyzed, and a more generalized output model is derived. Finally, a 305 W experimental prototype was constructed, with results in agreement with theoretical analyses. Compared with the single-channel system, the output current (2.82 A) of the proposed system is amplified by a factor of $(1 + M_{1}/M_{2})$, with the peak efficiency reaching 90.5%, an improvement of about 6%.
IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 72, no. 9, pp. 1243-1247.
Citations: 0
A Novel Voltage-Sensorless Grid Voltage Full Feedforward Estimator-Based Current Control Strategy for a Grid-Connected Inverter
IF 4.9 | Zone 2, Engineering & Technology | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-07-21 | DOI: 10.1109/TCSII.2025.3590699
Qifan Wang; Qiangsong Zhao; Yuanqing Xia
This brief presents a novel voltage-sensorless grid voltage full feedforward estimator (GVFFE)-based current control strategy for a grid-connected inverter with an LCL filter. The grid voltage full feedforward (GVFF) signal can be directly estimated by the GVFFE using a closed-loop structure based on a repetitive controller. Furthermore, the grid voltage can be reconstructed from the estimated GVFF signal without relying on a voltage sensor. Compared with traditional GVFF methods, the GVFFE eliminates the noise amplification caused by derivative operations and compensates for computational delay. As a result, the disturbance rejection performance for grid voltage is significantly improved. The stability and harmonic suppression capabilities of the proposed strategy are comprehensively analyzed. Experimental results validate the effectiveness of the proposed control strategy, demonstrating its potential for practical applications in grid-connected inverter systems.
IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 72, no. 9, pp. 1233-1237.
Citations: 0
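The GVFFE abstract above says the estimator is built around a repetitive controller. The sketch below shows only the basic principle behind repetitive control: a disturbance with a known period is learned cycle by cycle from the error one period earlier. It is a toy under loudly stated assumptions (a static unity plant y = u + d, zero reference, a known integer period, and hypothetical names and gains); it does not model the LCL-filtered inverter or the GVFFE structure from the brief.

```python
import math

def simulate_repetitive_control(period=20, cycles=25, gain=0.5):
    """Return the peak |output| in each cycle for a toy plant y = u + d.

    Update law (assumed for illustration): u[k] = u[k-N] + gain * e[k-N],
    with e = 0 - y and N the disturbance period in samples.
    """
    steps = period * cycles
    u = [0.0] * steps
    y = [0.0] * steps
    for k in range(steps):
        # Periodic disturbance: fundamental plus a third harmonic.
        d = math.sin(2 * math.pi * k / period) + 0.3 * math.sin(6 * math.pi * k / period)
        if k >= period:
            e_prev = -y[k - period]              # error one period ago
            u[k] = u[k - period] + gain * e_prev
        y[k] = u[k] + d
    return [
        max(abs(v) for v in y[c * period:(c + 1) * period])
        for c in range(cycles)
    ]

if __name__ == "__main__":
    peaks = simulate_repetitive_control()
    print("first cycle peak:", peaks[0])
    print("last cycle peak: ", peaks[-1])
```

For this toy plant the residual obeys y[k] = (1 - gain) * y[k - N], so the peak shrinks by a fixed factor every cycle regardless of the harmonic content, which is the property that makes repetitive control attractive for periodic grid disturbances.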
Data-Driven Near-Optimal Reduced Tracking Control of SPSs With Application to PMSM
IF 4.9 | Zone 2, Engineering & Technology | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-07-21 | DOI: 10.1109/TCSII.2025.3590689
Yao Xu; Chunyu Yang; Gonghe Li; Ju H. Park
This brief focuses on the data-driven near-optimal reduced tracking control problem of linear time-invariant (LTI) singularly perturbed systems (SPSs) from noisy data. Based on singular perturbation theory (SPT), the reduced subsystem of the SPSs is obtained; an augmented error system is then constructed and an optimal tracking control (OTC) problem is formulated. Then, the integral version of the continuous-time augmented error system is constructed to avoid error-prone derivative calculation. Next, the closed-loop augmented error system is parameterized by the system I/O data, and a data-based semi-definite program (SDP) is proposed for the OTC problem. In addition, because the I/O data of the virtual reduced system are not actually measurable, the virtual reduced system is reconstructed from the I/O data of the original system, and the system performance is analyzed. Finally, a speed tracking control experiment on a permanent magnet synchronous motor (PMSM) verifies the effectiveness of the proposed data-driven control scheme.
IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 72, no. 9, pp. 1228-1232.
Citations: 0
A 3.9-8.2-GHz Wideband Frequency Synthesizer With an Inductive Multiplexing Output Network for SATCOM Applications
IF 4.9 | Zone 2, Engineering & Technology | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-07-18 | DOI: 10.1109/TCSII.2025.3590593
Xiaofei Liao; Dixian Zhao; Chenyu Xu; Hao Gong; Wendi Chen; Xiaohu You
This brief presents a wideband frequency synthesizer with 3.9 to 8.2 GHz continuous frequency coverage for satellite communication applications. The core fractional-N phase-locked loop utilizes four LC-VCOs to achieve a 4.3 GHz tuning range with a 50-MHz reference frequency. The frequency mapping of the four VCOs, along with module-level parameter optimization, is performed to maintain a stable figure of merit and minimize loop jitter across the entire tuning range. A high-isolation low-loss inductive multiplexing output technique is proposed, which uses only one active buffer to drive both the internal loop and the external load, significantly reducing power consumption. Moreover, an on-chip active loop filter is implemented, reducing the capacitance area by 80% and enhancing chip integration. Fabricated in a 65-nm CMOS technology, the frequency synthesizer occupies a chip area of 2.28 mm² while consuming 25–33.5 mW. The phase noise reaches –123.72 dBc/Hz and –116.31 dBc/Hz at 1-MHz offset under 3.9- and 8.2-GHz carriers, respectively. Measured reference and fractional spurs remain below –65 and –55 dBc, respectively.
IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 72, no. 9, pp. 1163-1167.
Citations: 0
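To make the fractional-N numbers in the synthesizer abstract above concrete, the sketch below computes the division ratio f_out / f_ref for the 50-MHz reference at the two band edges and at one in-between frequency. It assumes the feedback divider divides the output frequency directly down to the reference, with no prescaler or output divider; the abstract does not state the divider arrangement, and the 5.012 GHz point and the function name are hypothetical.

```python
from fractions import Fraction

def division_ratio(f_out_hz: int, f_ref_hz: int):
    """Split f_out / f_ref into an integer part and an exact fractional part."""
    ratio = Fraction(f_out_hz, f_ref_hz)
    n_int = ratio.numerator // ratio.denominator
    return n_int, ratio - n_int

if __name__ == "__main__":
    F_REF_HZ = 50_000_000  # 50-MHz reference from the abstract
    for f_out_hz in (3_900_000_000, 5_012_000_000, 8_200_000_000):
        n_int, n_frac = division_ratio(f_out_hz, F_REF_HZ)
        print(f"{f_out_hz / 1e9:.3f} GHz -> N = {n_int} + {n_frac}")
```

Under that assumption the band edges need integer ratios of 78 and 164, while frequencies off the 50-MHz grid need a nonzero fractional part, which is what a fractional-N loop provides.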
A Hybrid CAM-SRAM Processing-in-Memory Architecture With Feature Level Sparsity for Attention Mechanisms
IF 4.9 | Zone 2, Engineering & Technology | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-07-18 | DOI: 10.1109/TCSII.2025.3590432
Haiqiu Huang; Mingyu Wang; Xiaojie Li; Baiqing Zhong; Zeqi Yang; Tao Lu; Yicong Zhang; Zhiyi Yu
The attention mechanism has become increasingly popular due to its ability to capture complex dependencies, enabling models like transformers to achieve remarkable performance in large language models (LLMs), computer vision, and other domains. However, the mechanism faces challenges such as low arithmetic intensity, which leads to frequent data movement, and long sequence lengths, which introduce a large amount of redundant information. To mitigate both data movement and computational overhead in attention mechanisms, we propose a hybrid CAM-SRAM processing-in-memory architecture. By leveraging the parallel search and sort capabilities of content-addressable memory (CAM) arrays, we achieve dynamic fine-grained sparsification on features with varying variance, reducing the number of multiply-accumulate (MAC) operations in the matrix multiplication (MatMul). Furthermore, an approximate Booth encoding is employed in our MAC unit to reduce the number of partial products and maintain the consistency of their signs. This eliminates the need for negation operations, simplifying the logic design. Experimental results show that, in different configurations, our feature-level sparsification scheme achieves over 80% sparsity with an acceptable accuracy drop. With sparsity up to 80%, our design achieves a performance of 0.252-1.26 TOPS and a power efficiency of 4.71-21.72 TOPS/W, operating at 1000 MHz on the TSMC 40 nm process.
IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 72, no. 9, pp. 1283-1287.
Citations: 0
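The CAM-SRAM abstract above mentions an approximate Booth encoding that cuts the number of partial products. As background, the sketch below shows the standard, exact radix-4 Booth recoding of a two's-complement multiplier, which halves the partial-product count but still needs negated multiples. It is not the approximate variant from the brief, whose details the abstract does not give; the function names are hypothetical.

```python
def booth_radix4_digits(value: int, bits: int):
    """Radix-4 Booth digits (least significant first), each in {-2, -1, 0, 1, 2}.

    Digit j is b[2j-1] + b[2j] - 2*b[2j+1], with b[-1] = 0, over the
    two's-complement bit pattern of `value`.
    """
    if bits <= 0 or bits % 2:
        raise ValueError("bits must be a positive even number")
    if not -(1 << (bits - 1)) <= value < (1 << (bits - 1)):
        raise ValueError("value does not fit in the given width")
    pattern = value & ((1 << bits) - 1)  # two's-complement bit pattern
    padded = pattern << 1                # append the implicit b[-1] = 0
    digits = []
    for i in range(0, bits, 2):
        low = (padded >> i) & 1          # b[i-1]
        mid = (padded >> (i + 1)) & 1    # b[i]
        high = (padded >> (i + 2)) & 1   # b[i+1]
        digits.append(low + mid - 2 * high)
    return digits

def booth_multiply(a: int, b: int, bits: int = 8) -> int:
    """Multiply by summing one shifted partial product per Booth digit of b."""
    return sum((d * a) << (2 * j) for j, d in enumerate(booth_radix4_digits(b, bits)))

if __name__ == "__main__":
    print(booth_radix4_digits(7, 4))   # digits of 7 in a 4-bit word
    print(booth_multiply(-13, 27))
```

An 8-bit multiplier yields four digits here instead of eight bit-level partial products; the digits -1 and -2 are the negations that the brief's approximate scheme is said to avoid.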
A Fractional Momentum Enhanced Fractional Filter for the Memristor-Based Volume Controller
IF 4.9 | Zone 2, Engineering & Technology | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-07-17 | DOI: 10.1109/TCSII.2025.3589991
Xuetao Xie; Yi-Fei Pu; Jian Wang
This brief proposes a memristor-based volume controller, thus providing a practical application scenario of system identification. In order to identify the parameter in this system, we propose a fractional momentum enhanced fractional least mean square (FM-EFLMS) algorithm by combining the enhanced fractional derivative and the fractional momentum term. We analyze the stability condition of the FM-EFLMS algorithm. The resource consumption of the FM-EFLMS algorithm is also analyzed. Simulation experiments demonstrate the potential advantage of the memristor-based volume controller. Moreover, the experimental results show that the convergence performance of the FM-EFLMS algorithm exhibits obvious advantages compared to the competing filter algorithms.
IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 72, no. 9, pp. 1333-1337.
Citations: 0
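For readers unfamiliar with the family of filters the FM-EFLMS abstract above builds on, the sketch below identifies an unknown FIR system with an ordinary least mean square (LMS) update plus a conventional momentum term. This is an integer-order baseline only: the enhanced fractional derivative and the fractional momentum term of FM-EFLMS are not specified in the abstract and are not reproduced here. The function name, step size, momentum factor, and the three-tap test system are all hypothetical.

```python
import numpy as np

def momentum_lms(x, d, taps, mu=0.05, beta=0.5):
    """Identify FIR weights w such that d[k] ~ w . [x[k], x[k-1], ...].

    Update (assumed baseline): w <- w + mu * e * u + beta * (w - w_prev).
    """
    w = np.zeros(taps)
    w_prev = np.zeros(taps)
    for k in range(taps - 1, len(x)):
        u = x[k - taps + 1:k + 1][::-1]  # regressor, newest sample first
        e = d[k] - w @ u                 # a-priori estimation error
        w_next = w + mu * e * u + beta * (w - w_prev)
        w_prev, w = w, w_next
    return w

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    true_w = np.array([0.8, -0.4, 0.2])      # hypothetical unknown system
    x = rng.normal(size=3000)                # white excitation
    d = np.convolve(x, true_w)[:len(x)]      # noise-free system output
    print(momentum_lms(x, d, taps=3))
```

With a noise-free output the estimate settles on the true weights; comparing how fast different update rules get there is the kind of convergence study the abstract refers to.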
Contrastive Learning-Based Dual Autoencoder for Anomaly Detection in Loader Gearboxes
IF 4.9 | Zone 2, Engineering & Technology | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-07-17 | DOI: 10.1109/TCSII.2025.3590139
Ruonan Lu; Da Zheng; Chengyuan Zhu; Weiwei Cao; Qinmin Yang
Anomaly detection (AD) of gearboxes is essential for ensuring the operational safety and reliability of the loader. However, identifying anomalies in non-stationary signals remains challenging as anomalies often emerge within the normal fluctuation, especially when normal and abnormal samples exhibit high similarity. This brief proposes a contrastive learning-based dual autoencoder (AE) AD method for loader gearboxes. Specifically, the continuous wavelet transform is employed to capture dynamic characteristics of non-stationary signals. A compound scaling network is then designed into the unified encoder to extract complex features while maintaining a lightweight architecture. Subsequently, a sparse representation channel is integrated into the second AE framework, complementing the basis for contrastive mechanisms and promoting the learning of consistency across normal samples with the reconstruction channel. By minimizing the contrastive loss between two samples from different channels, the model learns the inherent consistency of normal samples. Finally, the contrastive loss of the second AE and the reconstruction error of the first AE serve as indicators for detecting abnormalities. Experimental results on real-world loader gearbox data demonstrate that the proposed method achieves a high fault detection rate, a low false alarm rate, and robust reliability, validating its effectiveness.
齿轮箱异常检测是保证装载机安全可靠运行的重要手段。然而,识别非平稳信号中的异常仍然具有挑战性,因为异常通常出现在正常波动中,特别是当正常和异常样本表现出高度相似时。提出了一种基于对比学习的装载机齿轮箱双自编码器(AE) AD方法。具体而言,采用连续小波变换捕捉非平稳信号的动态特性。然后在统一编码器中设计复合缩放网络,在保持轻量级架构的同时提取复杂特征。随后,将稀疏表示通道集成到第二个AE框架中,补充了对比机制的基础,并通过重构通道促进了正常样本间一致性的学习。通过最小化来自不同通道的两个样本之间的对比损失,该模型学习了正态样本的固有一致性。最后,第二声发射的对比损失和第一声发射的重建误差作为检测异常的指标。在装载机变速箱实际数据上的实验结果表明,该方法具有较高的故障检出率、较低的虚警率和较强的可靠性,验证了该方法的有效性。
IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 72, no. 9, pp. 1223-1227. https://doi.org/10.1109/TCSII.2025.3590139
Citations: 0
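The abstract of the dual-autoencoder brief says that the reconstruction error of the first AE and the contrastive loss of the second AE both serve as anomaly indicators, but it does not say how the two are computed or combined. The sketch below is one hypothetical way to do it: mean squared error for reconstruction, cosine distance between the two channel embeddings as a stand-in for the contrastive loss, and a weighted sum with a threshold. The function names, the `weight` parameter, and the threshold are all assumptions for illustration, not the authors' method.

```python
import math


def reconstruction_error(x, x_hat):
    """Mean squared error between an input and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def cosine_distance(u, v):
    """1 - cosine similarity; near 0 when the two embeddings agree.

    Assumed stand-in for the contrastive loss between the reconstruction
    channel and the sparse representation channel.
    """
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0.0:
        raise ValueError("embeddings must be non-zero")
    return 1.0 - dot / norm


def anomaly_score(x, x_hat, z_recon, z_sparse, weight=0.5):
    """Weighted sum of the two indicators (weight is a hypothetical knob)."""
    return ((1.0 - weight) * reconstruction_error(x, x_hat)
            + weight * cosine_distance(z_recon, z_sparse))


def is_anomalous(score, threshold):
    """Flag a sample whose combined score exceeds the chosen threshold."""
    return score > threshold


# A well-reconstructed sample with matching embeddings scores zero.
normal = anomaly_score([1.0, 2.0, 3.0], [1.0, 2.0, 3.0], [1.0, 0.0], [1.0, 0.0])
# A poorly reconstructed sample with orthogonal embeddings scores high.
faulty = anomaly_score([1.0, 2.0, 3.0], [0.0, 0.0, 0.0], [1.0, 0.0], [0.0, 1.0])
print(is_anomalous(normal, 1.0), is_anomalous(faulty, 1.0))
```

In practice the threshold would be calibrated on scores from normal data only, since the abstract describes a model trained to learn the consistency of normal samples.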
A Swing- and Gain-Enhanced Mirrored Dynamic Amplifier
IF 4.9 Zone 2 (Engineering Technology) Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2025-07-17 DOI: 10.1109/TCSII.2025.3590144
Ali Rezapour;Omid Shoaei
This brief presents a Mirrored Dynamic Amplifier (MDA) that enhances the swing and gain of prior-art dynamic amplifiers (DAs). Instead of compensating for the charge loss in the load capacitors due to the common-mode current, this technique resolves the dependency between common-mode and differential-mode currents. The differential and regulated common-mode currents are mirrored to the output branch for integration on the load capacitors. Furthermore, the output branch consists of only one transistor, which allows the proposed architecture to benefit from a large output swing. An output swing of 1.272 Vppdiff can be achieved with a supply voltage of 1.2 V. In addition, a pre-discharge linearization technique is presented to compensate for the nonlinearity induced by the current regulation mechanism, which results in an average improvement of 6 dB in THD. The proposed DA is designed and verified in a 65 nm CMOS process. Post-layout simulation results show that gains of $16\times$ and $32\times$ can be achieved, along with output swings of 640 mVppdiff and 800 mVppdiff, respectively, while maintaining THDs better than −62 dB. A noise analysis and a detailed comparison of the proposed MDA with several state-of-the-art designs are also presented.
IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 72, no. 9, pp. 1158-1162. https://doi.org/10.1109/TCSII.2025.3590144
Citations: 0
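The gain and swing figures quoted in the Mirrored Dynamic Amplifier abstract can be put on a common scale with some simple arithmetic. The sketch below converts the reported voltage gains to decibels and back-computes the differential input swing that would produce the reported output swing. It assumes an ideal linear amplifier (output = gain × input), which is a simplification made here and not a claim from the brief; the helper names are hypothetical.

```python
import math


def gain_db(voltage_gain):
    """Convert a linear voltage gain to decibels (20 * log10)."""
    return 20.0 * math.log10(voltage_gain)


def input_swing_mv(output_swing_mv, voltage_gain):
    """Input swing implied by an output swing, assuming ideal linear gain."""
    return output_swing_mv / voltage_gain


# (voltage gain, output swing in mVppdiff) pairs taken from the abstract.
reported = [(16, 640.0), (32, 800.0)]

for gain, out_mv in reported:
    print(f"gain {gain}x = {gain_db(gain):.2f} dB, "
          f"implied input swing = {input_swing_mv(out_mv, gain):.1f} mVppdiff")
```

Under this idealization, the 16× setting corresponds to roughly 24 dB and the 32× setting to roughly 30 dB.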