Pub Date: 2025-08-19 DOI: 10.1109/TCSII.2025.3600432
Xiong Yang;Ding Wang
This brief investigates a decentralized event-driven control (EDC) problem for multi-machine power systems with asymmetric input constraints. First, the decentralized input-constrained EDC problem is transformed into a set of input-unconstrained optimal EDC subproblems by introducing enhanced cost functions for the nominal subsystems. Then, with the construction of dynamic event-triggering mechanisms, the event-driven Hamilton-Jacobi-Bellman equations (ED-HJBEs) are derived for these subproblems. To approximately solve these ED-HJBEs, only critic neural networks are used in the reinforcement learning framework, and their weights are updated via gradient descent. After that, based on the Lyapunov method, uniform ultimate boundedness of the closed-loop multi-machine power systems is established. Finally, simulations are conducted on a two-machine power system to validate the developed decentralized EDC policy.
Title: Reinforcement Learning for Dynamic Event-Driven Control of Multi-Machine Power Systems. Journal: IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 72, no. 10, pp. 1413-1417.
Pub Date: 2025-08-18 DOI: 10.1109/TCSII.2025.3599886
Donghai Zhu;Xiangjinwen Li;Shiying Zhou;Xuejiao Zhong;Xudong Zou;Yong Kang
In this brief, the power decoupling mechanism of the grid-forming converter is analyzed, which indicates that suitable voltage compensation is crucial for decoupling. Then, an optimized power decoupling control is proposed that can provide positive or negative voltage compensation, resulting in efficient power decoupling under different R/X ratios of the grid impedance.
Title: Optimized Power Decoupling Control for Grid-Forming Converter Under Different R/X Ratios of Grid Impedance. Journal: IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 72, no. 12, pp. 2052-2056.
Pub Date: 2025-08-14 DOI: 10.1109/TCSII.2025.3598759
Ziying Xie;Tianchi Ye;Ziyue Dang;Xi Xiao;Min Tan
Pulse-width-modulated (PWM) thermo-optic tuning in silicon photonics calls for a power supply featuring high-speed PWM power output with short settling time, high efficiency, and a compact size. However, the transient response of traditional digital low-dropout regulators (DLDOs) is limited by the closed-loop response, which makes it difficult to meet the speed requirements of the PWM power output. This brief presents a State-Switching DLDO (SS-DLDO), specially optimized for PWM thermo-optic tuning. Two state selectors, controlled by a PWM signal, are inserted into the SS-DLDO structure to control the connections and operational states of the DLDO asynchronously. This enables the speed of PWM tuning to be decoupled from the feedback loop of the DLDO. The proposed design is fabricated in a 65nm CMOS process with an active area of 0.00634 mm$^2$. Measurement results show that the rising-edge and falling-edge settling times of the PWM power output are 16.3 ns and 14 ns, respectively, which effectively relaxes the limit that the edge settling time imposes on the achievable PWM duty cycle range. Under a 2 MHz PWM frequency, this design can achieve PWM duty cycles ranging from 5.92% to 97.2%, corresponding to output power ranging from 1.47 mW to 24.12 mW.
Title: A State-Switching Digital LDO for PWM Thermo-Optic Tuning in Silicon Photonics. Journal: IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 72, no. 10, pp. 1458-1462.
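The figures quoted in the SS-DLDO abstract can be cross-checked with a little arithmetic. The sketch below is not from the paper: the function names are hypothetical, and it assumes an ideal rectangular PWM output, so that average power equals duty cycle times peak power. Under that assumption, the 2 MHz period is 500 ns, the 97.2% upper duty limit leaves an off-time of about 14 ns (the same value as the reported falling-edge settling time), and both reported power endpoints imply a peak power of roughly 24.8 mW.

```python
def pwm_timing(freq_hz, duty_min, duty_max):
    """Return (period, shortest on-time, shortest off-time) in nanoseconds."""
    period_ns = 1e9 / freq_hz
    min_on_ns = duty_min * period_ns            # on-time at the lowest duty cycle
    min_off_ns = (1.0 - duty_max) * period_ns   # off-time at the highest duty cycle
    return period_ns, min_on_ns, min_off_ns


def implied_peak_power_mw(avg_power_mw, duty):
    """Peak power implied by an average power, ASSUMING an ideal rectangular pulse."""
    return avg_power_mw / duty


if __name__ == "__main__":
    # Values taken from the abstract: 2 MHz, 5.92% to 97.2%, 1.47 mW to 24.12 mW.
    period_ns, min_on_ns, min_off_ns = pwm_timing(2e6, 0.0592, 0.972)
    print(f"period = {period_ns:.1f} ns")
    print(f"shortest on-time = {min_on_ns:.1f} ns, shortest off-time = {min_off_ns:.1f} ns")
    print(f"peak power from low end  = {implied_peak_power_mw(1.47, 0.0592):.2f} mW")
    print(f"peak power from high end = {implied_peak_power_mw(24.12, 0.972):.2f} mW")
```

The two peak-power estimates agree to within about 0.02 mW, which suggests the reported duty-cycle and power ranges are mutually consistent under the rectangular-pulse assumption.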
Pub Date: 2025-08-12 DOI: 10.1109/TCSII.2025.3598216
Xinyue Cao;Ling Zhao;Yuanqing Xia;Hongjiu Yang
In this brief, hierarchical fusion estimation with feedback is investigated for clustered sensor networks with leader and subordinate sensors. A local estimator is designed in each sensor to obtain local estimates using the fed-back fusion estimates. A two-layer fusion estimator is developed in each leader sensor to obtain fusion estimates with improved estimation accuracy. The first-layer fusion estimator combines local estimates within the same cluster based on both current and past local estimation accuracy under inaccurate noise covariance matrices. The second-layer fusion estimator fuses the first-layer fusion estimates from all leader sensors. The validity of the hierarchical fusion estimation with feedback is demonstrated on a maneuvering target tracking system.
Title: Hierarchical Fusion Estimation With Feedback for Clustered Sensor Networks Subject to Leader and Subordinate Sensors. Journal: IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 72, no. 10, pp. 1403-1407.
Pub Date: 2025-08-08 DOI: 10.1109/TCSII.2025.3597263
Wei Xie;Toshio Eisaka
The main goal of this brief is to present a methodology for designing multi-objective $\mathrm{H}_{\infty}$/r-stability interval estimation for linear discrete-time systems affected by bounded but unknown disturbances. Compared with the $\mathrm{H}_{\infty}$ norm criterion, which is used to reject the influence of external disturbances on the output under the worst-case scenario, the multi-objective $\mathrm{H}_{\infty}$/r-stability interval observer also takes pole placement into account: the poles of the observer system matrix are placed as close as possible to the origin. This ensures not only relatively fast convergence but also the minimum interval width. Finally, an illustrative example of a DC servo motor highlights the performance of the methodology.
Title: Multi-Objective H∞/r-Stability Optimal Interval Estimation for Linear Discrete-Time Systems and Application to DC Servo Motor. Journal: IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 72, no. 10, pp. 1398-1402.
Pub Date: 2025-08-07 DOI: 10.1109/TCSII.2025.3596708
Bram Veraverbeke;Filip Tavernier
Dopant freeze-out severely increases the bulk resistance of cryogenic bulk CMOS transistors by up to $10^{6}\times$ at 4.2 K compared to room temperature. This brief describes, for the first time in the literature, how this increased bulk resistance introduces a memory effect in the latch of dynamic comparators, which leads to hysteresis. To measure this hysteresis reliably in the presence of noise, a statistical characterization procedure is developed. For a 40nm bulk CMOS strongARM comparator with an input-referred noise voltage of 348 $\mu$V$_{\mathrm{RMS}}$, a hysteresis voltage >898 $\mu$V is measured at 6 K, substantially deteriorating the precision. Therefore, this brief introduces a triple tail comparator with capacitive over-neutralization to increase the preamplification gain, suppressing the hysteresis by >$6\times$ to only 141 $\mu$V.
Title: A Cryo-CMOS Triple Tail Comparator With Capacitive Over-Neutralization to Suppress Freeze-Out Induced Hysteresis. Journal: IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 72, no. 10, pp. 1358-1362.
This brief introduces an integrated configurable frequency-modulated continuous wave (FMCW) radar baseband SoC, which integrates a baseband accelerator in 40-nm CMOS technology. The design exhibits notable advantages in terms of miniaturization, configurability, and real-time performance. To enhance the real-time performance of baseband signal processing, the baseband accelerator employs a pipeline architecture that incorporates specifically designed parallel computation structures for each submodule. Furthermore, the accelerator supports diverse application scenarios by offering configurable dimensions for fast Fourier transform (FFT), constant false alarm rate (CFAR), and digital beamforming (DBF), along with adjustable parameters for time-frequency domain processing. Board-level testing results indicate that the chip can accurately distinguish targets with varying distances, speeds, and angles. Operating at a system clock frequency of 200 MHz, the processor achieves a frame processing time of 2.79 ms and a power consumption of 492 mW under the maximum CFAR window configuration and 256 targets.
Title: An Integrated Configurable FMCW Radar Baseband SoC in 40-nm CMOS. Authors: Peng Zhang;Bo Wang;Ning Zhang;Pengfei Diao;Qisong Wu;Dixian Zhao. Pub Date: 2025-08-07 DOI: 10.1109/TCSII.2025.3596605. Journal: IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 72, no. 10, pp. 1438-1442.
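The FMCW radar abstract names CFAR detection with a configurable window but does not say which CFAR variant the chip implements. As an illustration only, the sketch below shows the textbook one-dimensional cell-averaging CFAR (CA-CFAR); the function name, the parameters `num_train`, `num_guard`, and `scale`, and the toy power profile are all assumptions, not details from the paper.

```python
def ca_cfar(power, num_train, num_guard, scale):
    """Cell-averaging CFAR over a 1-D list of power values.

    For each cell under test, the noise level is estimated as the mean of
    num_train training cells on each side, skipping num_guard guard cells
    next to the cell under test. A detection is declared when the cell
    exceeds scale times that estimate. Cells too close to either edge to
    have a full window are not tested.
    """
    detections = []
    half = num_train + num_guard
    for i in range(half, len(power) - half):
        left = power[i - half : i - num_guard]           # num_train cells before the guards
        right = power[i + num_guard + 1 : i + half + 1]  # num_train cells after the guards
        noise = (sum(left) + sum(right)) / (len(left) + len(right))
        if power[i] > scale * noise:
            detections.append(i)
    return detections


if __name__ == "__main__":
    # Hypothetical range profile: flat noise floor with two strong targets.
    profile = [1.0] * 64
    profile[20] = 30.0
    profile[45] = 25.0
    print(ca_cfar(profile, num_train=8, num_guard=2, scale=5.0))
```

The window size (`num_train` plus `num_guard`) is the kind of dimension a configurable accelerator would plausibly expose, and larger windows cost more additions per cell, which is consistent with the abstract quoting its timing figure at the maximum CFAR window configuration.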
Pub Date: 2025-08-07 DOI: 10.1109/TCSII.2025.3596834
Ahmed Abdelaziz;Yuang Cao;Tawfiq Musah
This brief explores the coefficient adaptation approaches for the recently presented contingent decision equalizer (CDE) for pulse amplitude modulation (PAM) signaling. A real-time realization of zero forcing (ZF) adaptation is first developed. Then, two minimum mean square error (MMSE) coefficient adaptation approaches that take advantage of the hybrid operation and modular architecture of the CDE are derived. The performance of the ZF adaptation architecture and the global and distributed least mean squares (LMS) realizations of the MMSE solutions targeted to the CDE architecture are evaluated analytically and using behavioral modeling. The computational and area overhead of the proposed solutions are discussed. The convergence of the ZF and modified LMS implementations is simulated for various channel and equalizer types with PAM4 signaling. The results show superior equalization for the CDE. The results also show faster convergence for the global LMS over ZF adaptation and a higher voltage margin for the global LMS with well-behaved channels.
Title: Adaptation Approaches for PAMN Contingent Decision Equalizers. Journal: IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 72, no. 10, pp. 1388-1392.
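For readers unfamiliar with LMS coefficient adaptation, the following minimal sketch shows the standard update rule on a plain feed-forward equalizer with PAM4 symbols. It does not model the contingent decision equalizer or the paper's global and distributed LMS variants; the one-post-cursor channel, tap count, step size `mu`, and training-mode error (known transmitted symbols) are all assumptions chosen for illustration.

```python
import random

PAM4_LEVELS = (-3.0, -1.0, 1.0, 3.0)


def simulate_lms(num_symbols=5000, num_taps=5, mu=0.002, post_cursor=0.3, seed=1):
    """Adapt a feed-forward equalizer with LMS; return (taps, tail mean-square error).

    Assumed channel: received[n] = symbol[n] + post_cursor * symbol[n-1], no noise.
    The error uses the known transmitted symbol (training mode).
    """
    rng = random.Random(seed)
    taps = [0.0] * num_taps
    delay_line = [0.0] * num_taps  # most recent received sample first
    prev_symbol = 0.0
    sq_errors = []
    for _ in range(num_symbols):
        symbol = rng.choice(PAM4_LEVELS)
        received = symbol + post_cursor * prev_symbol
        prev_symbol = symbol
        delay_line = [received] + delay_line[:-1]
        output = sum(w * y for w, y in zip(taps, delay_line))
        error = symbol - output
        # LMS update: move each tap along the negative gradient of error**2.
        taps = [w + mu * error * y for w, y in zip(taps, delay_line)]
        sq_errors.append(error * error)
    tail = sq_errors[-500:]
    return taps, sum(tail) / len(tail)


if __name__ == "__main__":
    taps, mse = simulate_lms()
    print("taps:", [round(w, 3) for w in taps])
    print("mean-square error over last 500 symbols:", mse)
```

For this channel the ideal inverse is the series 1, -0.3, 0.09, ..., so the first two adapted taps should settle near 1 and -0.3. Zero-forcing adaptation would instead drive the residual inter-symbol interference terms to zero directly, which is the contrast the abstract draws.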
The $\mathcal{L}_{\infty}$ bumpless transfer fault-tolerant control problem is addressed for continuous-time switched affine systems with actuator faults and bounded disturbances. A novel piecewise transition-dependent fault-tolerant controller is designed. By enforcing the specified bump limitation constraints, sufficient conditions for the existence of the fault-tolerant controller are derived under a new average dwell time constraint; these conditions guarantee the practical stability of the closed-loop system and bumpless transfer as switching and faults occur. Finally, the practicability and validity of the developed methods are illustrated through a case study of a DC-DC boost converter.
Title: Fault Tolerant Control of Switched Affine Systems With Application to Boost Converter: The Transition-Dependent Bumpless Transfer Approach. Authors: Fang Liao;Yanzheng Zhu;Rongni Yang;Jian Zhang;Donghua Zhou. Pub Date: 2025-08-07 DOI: 10.1109/TCSII.2025.3596833. Journal: IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 72, no. 9, pp. 1328-1332.
Attention-based large language models (LLMs) have revolutionized natural language processing (NLP). Despite their impressive effectiveness, the quadratic complexity of self-attention incurs heavy computational and memory burdens. Dynamic sparse attention techniques have emerged as a solution; however, the extra prediction stage they introduce, coupled with costly data memory access, diminishes their hardware efficiency. To address these limitations, this brief proposes BETA, a fine-grained algorithm-architecture co-design tailored for sparse attention. First, a bit-grained multi-round filter (BMF) prediction is proposed to unveil and eliminate redundant memory access hidden in the sparsity prediction stage. Second, an adaptive and lightweight max-based threshold selection (MTS) strategy is developed to work in concert with the bit-wise prediction process. Third, a bit-wise out-of-order execution (BOOE) scheme is employed to enhance hardware utilization during bit-wise prediction. Finally, an elaborate architecture is designed to translate the theoretical complexity reduction into practical performance improvement. Implementation results demonstrate that BETA achieves $5.4\times$, $6.5\times$, and $1.8\times$ improvements in energy efficiency over the state-of-the-art Transformer accelerators Sanger, Spatten, and SOFA, respectively, while maintaining comparable inference accuracy.
Title: BETA: A Bit-Grained Transformer Attention Accelerator With Efficient Early Termination. Authors: Huizheng Wang;Hongbin Wang;Zhiheng Yue;Jingyao Liu;Taiquan Wei;Shaojun Wei;Yang Hu;Shouyi Yin. Pub Date: 2025-08-06 DOI: 10.1109/TCSII.2025.3596228. Journal: IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 72, no. 10, pp. 1433-1437.