
Integration-The Vlsi Journal: Latest Articles

Encoding and decoding devices based on memristor-diode crossbar-array and CMOS logic for spiking neural networks
IF 2.5 | Zone 3, Engineering Technology | Q3, COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2026-01-20 | DOI: 10.1016/j.vlsi.2026.102670
A.N. Busygin, S.Yu. Udovichenko, A.H.A. Ebrahim
Electric circuits of encoding and decoding devices that convert information from binary representation into a spike sequence and back are proposed for hardware spiking neural networks. The devices differ from known ones in using fully digital circuitry and memristor-diode crossbars, which potentially reduces energy consumption and allows denser integration of elements and, accordingly, a smaller occupied area on the chip. In addition, changing the states of the memristors makes it possible to set arbitrary functions for the forward and reverse conversion between binary numbers and spike sequences. The operability of the encoding device is confirmed by numerical simulation of encoding a four-digit input number into first-spike times and an average spike frequency. The decoding process is verified by simulating the extraction of a four-digit binary number encoded in the spike times of three neurons.
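The reprogrammable conversion described in this abstract can be pictured, very roughly, as a lookup table whose contents play the role of the memristor states. The sketch below is only an illustration under that assumption: the table contents, the three-slot timing scheme, and the names `ENCODE_TABLE`, `encode`, and `decode` are hypothetical and are not taken from the paper.

```python
from itertools import product

# Assumption: each of three neurons emits its first spike in one of three
# time slots, giving 27 distinct patterns, enough for the 16 four-bit codes.
SLOTS = 3

# Hypothetical programmable mapping (stands in for the crossbar state).
ENCODE_TABLE = dict(zip(range(16), product(range(SLOTS), repeat=3)))
DECODE_TABLE = {pattern: code for code, pattern in ENCODE_TABLE.items()}


def encode(bits: str) -> tuple:
    """Map a four-digit binary string to first-spike slots of three neurons."""
    return ENCODE_TABLE[int(bits, 2)]


def decode(pattern) -> str:
    """Recover the four-digit binary string from the three spike slots."""
    return format(DECODE_TABLE[tuple(pattern)], "04b")


if __name__ == "__main__":
    for code in ("0000", "0101", "1111"):
        print(code, "->", encode(code), "->", decode(encode(code)))
```

Changing the entries of `ENCODE_TABLE` changes the conversion function without touching the surrounding logic, which is the property the abstract attributes to reprogramming the memristors.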
Volume 108, Article 102670
Citations: 0
MORL-IC: Multi-objective reinforcement learning approaches for analog integrated circuits optimization
IF 2.5 | Zone 3, Engineering Technology | Q3, COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2026-01-19 | DOI: 10.1016/j.vlsi.2026.102664
Hakan Taşkıran, Engin Afacan
Analog and RF integrated circuit (IC) design requires the simultaneous optimization of multiple, conflicting objectives under highly nonlinear and tightly coupled constraints. While prior studies — including our own — have demonstrated the feasibility of applying multi-objective reinforcement learning (MORL) to analog circuit optimization, the specific impact of workflow-level design choices on convergence behavior, simulation cost, and Pareto-front characteristics has remained insufficiently explored. This paper reformulates the Multi-Objective Deep Deterministic Policy Gradient (MODDPG) approach not as a single fixed algorithm, but as a family of optimization workflows that share an identical multi-objective actor–critic learning core while systematically differing in their initialization strategy and environment evaluation mechanism. Within this unified formulation, three configurations are investigated: (i) a baseline MODDPG workflow with random initialization and direct SPICE evaluation, (ii) MODDPG-2, which employs analytically derived extreme solutions to guide early exploration, and (iii) MODDPG-3, which introduces an ANN-based pseudo-designer to generate boundary solutions directly from performance specifications. In addition, a Fully-ANN execution mode is examined, where an ANN-based pseudo-simulator replaces SPICE during policy learning to accelerate environment interaction. By preserving the same reinforcement learning architecture across all variants, the proposed framework isolates the effects of structured initialization and surrogate-based environments on optimization outcomes. The workflows are evaluated on three analog circuits (active-loaded differential amplifier, folded-cascode amplifier, and voltage comparator) and one RF circuit (CMOS cross-coupled LC oscillator), as well as on standard analytical benchmarks. 
Comparative results against NSGA-II and MOEA/D show that no single method universally dominates; however, the proposed workflows consistently reduce the number of required SPICE simulations by approximately 30%–75% while maintaining competitive Pareto-front quality. This efficiency gain indicates that the reinforcement-learning agent progressively acquires design intuition comparable to that of an experienced human designer—learning to avoid unpromising regions of the design space and focusing evaluations on high-value candidates. The results therefore demonstrate that, beyond the choice of learning algorithm, workflow-level design decisions critically shape how effectively RL can emulate expert design behavior, offering practical guidance for balancing solution quality and computational cost in automated analog and RF circuit design flows.
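Since the comparison above is stated in terms of Pareto-front quality, a small sketch of how a Pareto front is extracted from evaluated designs may help. It assumes every objective is minimized; the sample points and the names `dominates` and `pareto_front` are illustrative and do not come from the paper.

```python
def dominates(a, b):
    """True if design a is no worse than b in every objective and better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Keep only the designs that no other design dominates."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q != p)]


if __name__ == "__main__":
    # Hypothetical (objective_1, objective_2) pairs, both to be minimized.
    designs = [(1.0, 5.0), (2.0, 3.0), (3.0, 4.0), (4.0, 1.0)]
    print(pareto_front(designs))
```

Each call to an evaluator such as SPICE yields one such point, so reducing the number of evaluations while keeping the front comparable is the trade-off the abstract reports.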
Volume 108, Article 102664
Citations: 0
Adaptive-precision SIMD architecture for high-throughput and resource-efficient DNN acceleration
IF 2.5 | Zone 3, Engineering Technology | Q3, COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2026-01-16 | DOI: 10.1016/j.vlsi.2026.102666
Vasundhara Trivedi, Harman Singh Bagga, Gopal Raut, Santosh Kumar Vishvakarma
Deep Neural Network (DNN) accelerators require high computational throughput and flexible precision support while operating under stringent resource and power constraints. To address these challenges, we propose an adaptive-precision SIMD (Single Instruction, Multiple Data) Processing Element (PE) architecture for signed integer and fixed-point operations that maximizes resource utilization and enhances parallelism in multiply–accumulate (MAC) computations. The design introduces efficient resource reuse during partial product accumulation and supports both symmetric and asymmetric precision modes. Unlike state-of-the-art approaches, the proposed PE dynamically scales computation: processing 16 operands at low precision (4-bit), four operands at medium precision (8-bit), and a single operand at high precision (16-bit). Additionally, it supports asymmetric operations such as 16 × 4-bit multiplications in parallel, enabling unique flexibility and performance gains. The architecture is implemented and tested on ASIC and FPGA platforms. Accuracy evaluations across different DNN models and datasets show small losses at reduced precision relative to float32: less than 1% for LeNet on MNIST, 2.9% for AlexNet on CIFAR-10, 2.2% for VGG16 on CIFAR-10, and 3.5% for VGG16 on ImageNet-1000. Hardware synthesis yields significant improvements, including 46.2% fewer LUTs and 2.45× less power on FPGA compared to existing designs. The proposed architecture delivers 2× higher throughput and up to 4.8× higher energy efficiency with 28.57% less area at 65 nm compared to existing works, making it well suited to applications with variable precision and limited resources.
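The 16 / 4 / 1 scaling quoted above is consistent with one common way of building precision-scalable multipliers: sixteen 4-bit × 4-bit blocks whose partial products are either used independently or shifted and summed. The abstract does not describe the internal structure, so the sketch below is an assumption, and it uses unsigned operands for simplicity even though the paper targets signed arithmetic. The names `mul4`, `mul_from_blocks`, and `lanes` are hypothetical.

```python
BLOCKS = 16  # assumed number of 4x4 multiplier blocks in one PE


def mul4(a: int, b: int) -> int:
    """One unsigned 4-bit x 4-bit multiplier block (the reusable unit)."""
    assert 0 <= a < 16 and 0 <= b < 16
    return a * b


def nibbles(x: int, count: int) -> list:
    """Split x into `count` 4-bit digits, least significant first."""
    return [(x >> (4 * i)) & 0xF for i in range(count)]


def mul_from_blocks(a: int, b: int, a_bits: int, b_bits: int):
    """Build a wider product from 4x4 blocks; returns (product, blocks used)."""
    total, used = 0, 0
    for i, ai in enumerate(nibbles(a, a_bits // 4)):
        for j, bj in enumerate(nibbles(b, b_bits // 4)):
            total += mul4(ai, bj) << (4 * (i + j))  # shift partial product into place
            used += 1
    return total, used


def lanes(a_bits: int, b_bits: int) -> int:
    """Parallel multiplications that fit in the PE at a given precision."""
    return BLOCKS // ((a_bits // 4) * (b_bits // 4))


if __name__ == "__main__":
    print(mul_from_blocks(0xBEEF, 0x1234, 16, 16))
    for mode in [(4, 4), (8, 8), (16, 16), (16, 4)]:
        print(mode, "->", lanes(*mode), "parallel products")
```

Under this assumption the same sixteen blocks serve every mode, which is one plausible reading of the resource reuse during partial product accumulation that the abstract mentions.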
Volume 108, Article 102666
Citations: 0
Design of a memristive CNN chaotic system: From chaotic behavior analysis to circuit implementation and robust control
IF 2.5 | Zone 3, Engineering Technology | Q3, COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2026-01-16 | DOI: 10.1016/j.vlsi.2026.102669
Jie Zhang, Jiliang Lv, Nana Cheng, Liu Yang
Using a quadratic memristor as the connection weight between cells, a 4D memristive cellular neural network (CNN) chaotic system is constructed. A series of dynamical analyses shows that the system exhibits relatively rich dynamical characteristics. In this study, offset boosting is achieved by adding an offset parameter. Furthermore, the system’s amplitude control and attractor rotation are also realized. Based on the fundamental theory of attractor rotation, a multi-wing attractor transformation of the chaotic system is achieved. Additionally, circuit simulation software is used to design and implement a simulation circuit for the 4D memristive chaotic system, and the simulation results verify the physical feasibility of the constructed system. Finally, feedback control is implemented using the H∞ control principle. Validation confirms that the designed controller can effectively counteract external disturbances and stabilize the system output.
Volume 108, Article 102669
Citations: 0
A 180-nm CMOS fully digital chaotic Lorenz system
IF 2.5 | Zone 3, Engineering Technology | Q3, COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2026-01-15 | DOI: 10.1016/j.vlsi.2026.102667
Alejandro Silva-Juarez, Sergio A. Rosales-Nunez, Luis C. Alvarez-Simon, Gregorio Zamora-Mejia, Julio Hernandez-Perez, Victor H. Carbajal-Gomez, Jose M. Rocha-Perez, Alejandro I. Bautista-Castillo
This paper presents the design, Application-Specific Integrated Circuit (ASIC) fabrication, and experimental characterization of a fully digital Lorenz chaotic system in 180 nm CMOS technology. Motivated by the growing demand for high-performance chaotic generators in secure communications, the design originates from a rigorous mathematical formulation, verifying its chaotic behavior via the Kaplan–Yorke dimension estimate. The architecture is based on the forward Euler method, implemented with a 32-bit fixed-point datapath, a control finite-state machine, and integrated 32-to-1 serializers. We detail the complete digital design flow, from RTL to GDSII, using a unified synthesis and place-and-route methodology with the Synopsys Fusion Compiler toolchain. For characterization, the serializers stream the state variables x(n), y(n), and z(n) to an external FPGA, enabling subsequent transmission to a PC via a UART interface. The integrated circuit was fabricated and tested; experimental measurements successfully validated the chaotic behavior predicted by post-layout simulations. The final design occupies a cell area of 0.873 mm² and exhibits an estimated static power of 1.02 mW. Post-layout timing reports indicate that the design meets its target frequencies, achieving a maximum operating frequency above 57 MHz (with a positive slack of +2.61 ns on the 50 MHz clock). The ASIC implementation demonstrates significant advantages in power, density, and speed potential over FPGA-based alternatives, positioning this design as a robust solution for embedded security systems.
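A forward Euler update of the Lorenz equations on a 32-bit fixed-point datapath can be sketched in software as follows. The abstract gives only the word length, so the Q16.16 format, the step size h = 2^-8, the classic parameters (σ = 10, ρ = 28, β = 8/3), and the initial state are assumptions made for illustration, not values from the paper.

```python
FRAC = 16              # assumed Q16.16: 16 integer bits, 16 fractional bits
ONE = 1 << FRAC
H_SHIFT = 8            # assumed step h = 2**-8, applied as an arithmetic shift

SIGMA = 10 * ONE       # classic Lorenz parameters (assumed)
RHO = 28 * ONE
BETA = (8 * ONE) // 3


def wrap32(v: int) -> int:
    """Wrap to signed 32-bit two's complement, as a 32-bit register would."""
    v &= 0xFFFFFFFF
    return v - (1 << 32) if v & 0x80000000 else v


def fx_mul(a: int, b: int) -> int:
    """Fixed-point multiply; hardware would hold the 64-bit intermediate product."""
    return wrap32((a * b) >> FRAC)


def step(x: int, y: int, z: int):
    """One forward Euler step: state(n+1) = state(n) + h * f(state(n))."""
    dx = fx_mul(SIGMA, wrap32(y - x))
    dy = wrap32(fx_mul(x, wrap32(RHO - z)) - y)
    dz = wrap32(fx_mul(x, y) - fx_mul(BETA, z))
    return (wrap32(x + (dx >> H_SHIFT)),
            wrap32(y + (dy >> H_SHIFT)),
            wrap32(z + (dz >> H_SHIFT)))


def simulate(n: int, state=(ONE, ONE, ONE)):
    """Run n steps and return the trajectory converted to floats."""
    out = []
    for _ in range(n):
        state = step(*state)
        out.append(tuple(v / ONE for v in state))
    return out


if __name__ == "__main__":
    for point in simulate(2000)[::500]:
        print(point)
```

Choosing a power-of-two step lets the multiplication by h reduce to a shift, a common simplification in digital chaotic generators; whether the fabricated design does this is not stated in the abstract.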
Volume 108, Article 102667
Citations: 0
Design of a load modulated balance doherty power amplifier using parallel coupled line (PCL) structure line coupler for 5G IoT applications
IF 2.5 | Zone 3, Engineering Technology | Q3, COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2026-01-15 | DOI: 10.1016/j.vlsi.2026.102659
Rajesh Kumar, Sachin Kumar, Binod Kumar Kanaujia
This paper presents a new design of a load-modulated balanced Doherty power amplifier (LM-BDPA) using a parallel coupled line (PCL) structure line coupler for 5G Internet of Things (IoT) applications. The proposed design aims to achieve high efficiency and wide bandwidth, which are critical for 5G communication systems. The LM-BDPA utilizes a PCL structure to enhance the load modulation capability, resulting in improved power-added efficiency (PAE) and linearity. The design also incorporates advanced thermal management techniques to ensure stable operation under high-power conditions. The performance of the proposed LM-BDPA is evaluated through both simulations and measurements. The fabricated prototype achieves a measured gain of 11.2–13.4 dB and a saturated output power of ∼41 dBm. A PAE of 46.4–56.5% and 43.2–50.3% is achieved at 6 dB and 8 dB output power back-off, respectively, across the designed frequency band.
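To put the reported power levels in linear units, the short sketch below converts dBm to watts and states the standard definition of power-added efficiency, PAE = (P_out − P_in) / P_DC. Only the 41 dBm saturated power and the 6 dB and 8 dB back-off points come from the abstract; the function names are illustrative.

```python
def dbm_to_watts(p_dbm: float) -> float:
    """Convert a power level in dBm to watts (0 dBm = 1 mW)."""
    return 10 ** ((p_dbm - 30) / 10)


def pae(p_out_w: float, p_in_w: float, p_dc_w: float) -> float:
    """Power-added efficiency: RF power added by the amplifier over DC power."""
    return (p_out_w - p_in_w) / p_dc_w


if __name__ == "__main__":
    p_sat_dbm = 41.0  # saturated output power reported in the abstract
    for backoff_db in (0.0, 6.0, 8.0):
        level = p_sat_dbm - backoff_db
        print(f"{level:.0f} dBm = {dbm_to_watts(level):.2f} W")
```

The 6 dB and 8 dB back-off points therefore correspond to roughly a quarter and a sixth of the saturated output power, which is the region where Doherty-type architectures are meant to hold efficiency up.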
Volume 108, Article 102659
Citations: 0
Clock tree synthesis in modern VLSI: From foundational algorithms to AI-driven optimization
IF 2.5 | Zone 3, Engineering Technology | Q3, COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2026-01-14 | DOI: 10.1016/j.vlsi.2026.102665
Manikandan B, Karthikumar S
As digital architectures scale to unprecedented complexity, driven by emerging domains such as artificial intelligence, edge inference, and ultra-low-power systems, the strategic orchestration of clock signal delivery has become a cornerstone of integrated circuit design. Clock Tree Synthesis (CTS), a critical stage in the physical design flow, plays a vital role in maintaining synchronous operation, balancing timing constraints, managing dynamic and leakage power, and ensuring signal integrity under aggressive scaling and variable workloads. This review systematically dissects the evolution of CTS, beginning with classical methodologies centered on recursive trees, buffer insertion, and delay balancing, before exploring advanced solutions tailored for variability tolerance, power optimization, and architectural irregularity. It then covers the growing influence of machine learning and data-driven models, which replace rigid rule sets with adaptive, layout-aware strategies and offer predictive insights and multi-objective optimization throughout the design process. The study also examines specialized use cases in security-conscious designs, aging-resilient circuits, photonic interconnects, and neuromorphic platforms, each demanding unique timing models and synthesis heuristics. The discussion culminates in a reflection on prevailing gaps, including the need for transparent ML integration, benchmark standardization, and holistic frameworks that bridge logical design with physical realization. This work offers a comprehensive perspective on the shifting paradigm of CTS, illuminating its central role in shaping high-performance, energy-efficient, and scalable silicon systems.
Integration-The Vlsi Journal, vol. 108, Article 102665.
Citations: 0
A test data compression method based on sliding-window encoding and matching length reuse
IF 2.5 Zone 3, Engineering & Technology Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Pub Date: 2026-01-12 DOI: 10.1016/j.vlsi.2026.102663
Yuanfa Ji , Haihui Zhang , Xiyan Sun , Furong Jiang , Qiang Fu
With the continuous increase in chip integration density and reliability requirements, test data volume has grown significantly. At the same time, limitations of automatic test equipment in terms of physical I/O channel count, memory capacity, and data transmission bandwidth have further raised test costs. To address these challenges, this paper proposes a test data compression method based on sliding-window encoding. This approach identifies repeated sequences in the data to be encoded and replaces them with shorter codewords, thereby achieving effective compression. Furthermore, a match length reuse mechanism is introduced, which considerably enhances both codeword utilization efficiency and compression performance. Additionally, this paper systematically analyzes the impact of encoding parameters on the compression ratio, optimizes the encoding scheme considering hardware overhead, and designs a corresponding decompression architecture. Experimental results show that the proposed method achieves an average compression ratio of 66.86% on ISCAS’89 benchmark circuits. This provides an innovative and practical solution for test data compression.
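The abstract does not give the codeword format or the details of the match length reuse mechanism, so the following is only a generic LZ77-style sketch of the sliding-window idea it describes: repeated bit sequences are replaced by a reference to an earlier occurrence. The function names and the `window`, `max_len`, and `min_len` parameters are illustrative assumptions, not values from the paper.

```python
def sliding_window_encode(bits, window=16, max_len=8, min_len=3):
    """Encode a bit string as literal and (offset, length) match tokens.

    Generic LZ77-style sketch; parameters are hypothetical defaults.
    """
    tokens = []
    i = 0
    while i < len(bits):
        best_len, best_off = 0, 0
        # Search the window of already-encoded bits for the longest match.
        for j in range(max(0, i - window), i):
            length = 0
            while (length < max_len and i + length < len(bits)
                   and bits[j + length] == bits[i + length]):
                length += 1
            if length > best_len:
                best_len, best_off = length, i - j
        if best_len >= min_len:
            tokens.append(("match", best_off, best_len))
            i += best_len
        else:
            tokens.append(("lit", bits[i]))
            i += 1
    return tokens


def sliding_window_decode(tokens):
    """Rebuild the bit string by replaying literals and back-references."""
    out = []
    for tok in tokens:
        if tok[0] == "lit":
            out.append(tok[1])
        else:
            _, off, length = tok
            for _ in range(length):
                # Copy one bit at a time so overlapping matches work.
                out.append(out[-off])
    return "".join(out)


pattern = "0101010101010101"
encoded = sliding_window_encode(pattern)
assert sliding_window_decode(encoded) == pattern
```

A hardware decompressor would additionally fix the bit widths of the offset and length fields; that trade-off against hardware overhead is the part the paper reports optimizing.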
Integration-The Vlsi Journal, vol. 108, Article 102663.
Citations: 0
Design of a dynamic obfuscation-based strong PUF resistant to modeling attacks and mutual authentication protocol
IF 2.5 Zone 3, Engineering & Technology Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Pub Date: 2026-01-11 DOI: 10.1016/j.vlsi.2026.102655
Yingchun Lu , Hongliang Lu , Yujie Liu , Huaguo Liang , Zhengfeng Huang , Jinlin Chen , Xiumin Xu , Liang Yao
Strong Physical Unclonable Functions (PUFs) are vulnerable to modeling attacks using Machine Learning (ML), and PUF-based authentication protocols also face security risks. To address these issues, this paper proposes a PUF structure with resistance to modeling attacks based on Dynamic Obfuscation (DO), composed of Linear Feedback Shift Registers (LFSRs), PUFs, and several logic gates. The characteristics of DO are as follows: (1) the initial state of the LFSR is determined by the PUF's response, making it uncontrollable; (2) the updated state of the LFSR determines the obfuscated bit of each input challenge, achieving a dynamic mapping between challenges and responses. An Arbiter PUF (APUF) based on DO is implemented on a Xilinx Artix-7 FPGA, and experimental results show that the structure can effectively resist modeling attacks from various ML algorithms, with prediction accuracy close to 50%. In addition, this paper proposes a mutual authentication protocol based on PUF, suitable for Internet of Things (IoT) systems.
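The abstract does not specify the circuit, so the sketch below is one possible reading of the two DO properties, not the authors' design. It assumes a software stand-in for an arbiter PUF (`make_toy_puf`, an additive delay model with random weights), an 8-bit Fibonacci LFSR with an assumed tap set, and the interpretation that the LFSR state selects which challenge bit gets flipped. All names are hypothetical.

```python
import random

LFSR_WIDTH = 8  # assumed register width


def lfsr_step(state, taps=(7, 5, 4, 3)):
    """Advance a Fibonacci LFSR by one step (tap positions are an assumption)."""
    fb = 0
    for t in taps:
        fb ^= (state >> t) & 1
    return ((state << 1) | fb) & ((1 << LFSR_WIDTH) - 1)


def make_toy_puf(n_bits, seed):
    """Simulated arbiter PUF using an additive delay model (stand-in only)."""
    rng = random.Random(seed)
    weights = [rng.gauss(0, 1) for _ in range(n_bits + 1)]

    def puf(challenge):
        total, phi = weights[n_bits], 1
        for i in reversed(range(n_bits)):
            phi *= 1 - 2 * ((challenge >> i) & 1)  # parity feature
            total += weights[i] * phi
        return 1 if total > 0 else 0

    return puf


class ObfuscatedPUF:
    def __init__(self, puf, n_bits=8):
        self.puf, self.n_bits = puf, n_bits
        # (1) The initial LFSR state comes from PUF responses, so it is
        # device-specific. Seeding from challenges 0..7 is an assumption.
        state = 0
        for c in range(LFSR_WIDTH):
            state = (state << 1) | puf(c)
        self.state = state or 1  # avoid the all-zero lock-up state

    def respond(self, challenge):
        # (2) The updated LFSR state picks the challenge bit to obfuscate.
        self.state = lfsr_step(self.state)
        idx = self.state % self.n_bits
        return self.puf(challenge ^ (1 << idx))


device = ObfuscatedPUF(make_toy_puf(8, seed=1))
responses = [device.respond(0b10101010) for _ in range(8)]
```

Because the LFSR advances on every query, the same external challenge can reach the underlying PUF in different obfuscated forms over time, which is the dynamic challenge-response mapping the abstract refers to.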
Integration-The Vlsi Journal, vol. 108, Article 102655.
Citations: 0
EOHEAA: Error-Optimized Hardware-Efficient Approximate Adder for energy-aware error-resilient applications
IF 2.5 Zone 3, Engineering & Technology Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Pub Date: 2026-01-09 DOI: 10.1016/j.vlsi.2026.102660
Prateek Goyal, Sujit Kumar Sahoo
This work introduces a novel Error-Optimized Hardware-Efficient Approximate Adder (EOHEAA) tailored for error-resilient computing tasks, where precision can be traded for improvements in energy, delay, and resource efficiency. The EOHEAA adopts a strategic method of controlled error propagation, enabling significant enhancement in accuracy metrics such as Mean Error Distance (MED), Mean Relative Error Distance (MRED), and Normalized MED (NMED), while maintaining minimal hardware overhead. Synthesized on the Artix-7 FPGA (XC7A35T-1CPG236C) using Verilog HDL, EOHEAA achieves up to a 38.6% reduction in power consumption, a 34% improvement in critical path delay, and notable savings in logic resources compared to conventional and state-of-the-art approximate adder designs. Comprehensive analysis across 8-, 16-, and 32-bit configurations further confirms its scalability and robustness, with PDP improvements reaching 71.5% in wider designs. Notably, EOHEAA outperforms several existing designs by achieving the lowest RMSE (32.21), minimum EDmax (71), and the highest accuracy-to-efficiency balance. ASIC-oriented design flow evaluation is further performed using Cadence Genus with predictive standard-cell libraries to analyze area, power, and timing behavior under advanced technology assumptions. To validate its real-world applicability, EOHEAA has been employed in edge detection and color quantization using K-means clustering, both of which demonstrate high-quality outputs under relaxed accuracy constraints. Furthermore, a lightweight CNN-based validation framework is employed to examine the impact of approximate arithmetic on learning-based workloads, demonstrating that EOHEAA preserves inference accuracy while offering tangible energy and performance benefits. These results collectively position EOHEAA as a strong candidate for next-generation approximate arithmetic units in energy-aware image processing and machine-learning accelerators.
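The abstract does not describe the internal structure of EOHEAA, so the sketch below only illustrates how the named error metrics are commonly computed, using a simple lower-part OR style adder (`loa_add`) as a stand-in approximate design. The definitions used here (MED as the mean absolute error, MRED as the mean of error divided by the exact sum, NMED as MED divided by the largest exact sum) are the usual ones in the approximate-computing literature and are assumed to match the paper's.

```python
from itertools import product


def loa_add(a, b, k=4):
    """Stand-in approximate adder: OR the low k bits, add the rest exactly."""
    mask = (1 << k) - 1
    low = (a | b) & mask
    # Carry into the exact upper part is guessed from the top low-part bits.
    carry = (a >> (k - 1)) & (b >> (k - 1)) & 1
    high = ((a >> k) + (b >> k) + carry) << k
    return high | low


def error_metrics(adder, n_bits=8):
    """Exhaustively evaluate an n-bit adder against exact addition."""
    eds, reds = [], []
    for a, b in product(range(1 << n_bits), repeat=2):
        exact = a + b
        ed = abs(adder(a, b) - exact)  # error distance for this input pair
        eds.append(ed)
        if exact:  # relative error is undefined when the exact sum is 0
            reds.append(ed / exact)
    med = sum(eds) / len(eds)
    max_sum = 2 * ((1 << n_bits) - 1)
    return {
        "MED": med,
        "MRED": sum(reds) / len(reds),
        "NMED": med / max_sum,
        "EDmax": max(eds),
    }


metrics = error_metrics(loa_add, n_bits=8)
```

Exhaustive evaluation is practical for 8-bit operands (65,536 input pairs); for 16- and 32-bit configurations such as those in the abstract, random sampling of input pairs is the usual substitute.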
Integration-The Vlsi Journal, vol. 108, Article 102660.
Citations: 0