Pub Date: 2026-01-20 | DOI: 10.1016/j.vlsi.2026.102670
A.N. Busygin , S.Yu. Udovichenko , A.H.A. Ebrahim
Electric circuits of encoding and decoding devices that convert information from a binary representation into a spike sequence and back are proposed for hardware spiking neural networks. The devices differ from known designs in their use of fully digital circuitry and memristor-diode crossbars, which potentially reduces energy consumption and provides greater integration of elements and, accordingly, a smaller occupied area on the chip. In addition, changing the states of the memristors makes it possible to arbitrarily set the functions of direct and reverse conversion between binary numbers and spike sequences. The operability of the encoding device is confirmed by numerical simulation of encoding an input four-digit number into the times of the first spikes and the average spike frequency. The decoding process is verified by simulating the extraction of a four-digit binary number encoded in the spike times of three neurons.
{"title":"Encoding and decoding devices based on memristor-diode crossbar-array and CMOS logic for spiking neural networks","authors":"A.N. Busygin , S.Yu. Udovichenko , A.H.A. Ebrahim","doi":"10.1016/j.vlsi.2026.102670","DOIUrl":"10.1016/j.vlsi.2026.102670","url":null,"abstract":"<div><div>Electric circuits of the encoding and decoding devices for converting information from binary representation into a spike sequence and back for hardware spiking neural networks are proposed. The devices differ from the known ones by using fully digital circuitry and memristor-diode crossbars, which potentially reduces energy consumption, provides greater integration of elements and, accordingly, a smaller occupied area on the chip. In addition, changing the states of the memristors makes it possible to arbitrarily set the functions of direct and reverse conversion of binary numbers into spike sequences. The operability of the encoding device is confirmed by numerical simulation of the process of encoding input four-digit number to the times of the first spikes and the average spike frequency. The decoding process is verified during the simulation of the extracting of a four-digit binary number encoded in spike times of three neurons.</div></div>","PeriodicalId":54973,"journal":{"name":"Integration-The Vlsi Journal","volume":"108 ","pages":"Article 102670"},"PeriodicalIF":2.5,"publicationDate":"2026-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146038492","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-19 | DOI: 10.1016/j.vlsi.2026.102664
Hakan Taşkıran, Engin Afacan
Analog and RF integrated circuit (IC) design requires the simultaneous optimization of multiple, conflicting objectives under highly nonlinear and tightly coupled constraints. While prior studies — including our own — have demonstrated the feasibility of applying multi-objective reinforcement learning (MORL) to analog circuit optimization, the specific impact of workflow-level design choices on convergence behavior, simulation cost, and Pareto-front characteristics has remained insufficiently explored. This paper reformulates the Multi-Objective Deep Deterministic Policy Gradient (MODDPG) approach not as a single fixed algorithm, but as a family of optimization workflows that share an identical multi-objective actor–critic learning core while systematically differing in their initialization strategy and environment evaluation mechanism. Within this unified formulation, three configurations are investigated: (i) a baseline MODDPG workflow with random initialization and direct SPICE evaluation, (ii) MODDPG-2, which employs analytically derived extreme solutions to guide early exploration, and (iii) MODDPG-3, which introduces an ANN-based pseudo-designer to generate boundary solutions directly from performance specifications. In addition, a Fully-ANN execution mode is examined, where an ANN-based pseudo-simulator replaces SPICE during policy learning to accelerate environment interaction. By preserving the same reinforcement learning architecture across all variants, the proposed framework isolates the effects of structured initialization and surrogate-based environments on optimization outcomes. The workflows are evaluated on three analog circuits (active-loaded differential amplifier, folded-cascode amplifier, and voltage comparator) and one RF circuit (CMOS cross-coupled LC oscillator), as well as on standard analytical benchmarks. Comparative results against NSGA-II and MOEA/D show that no single method universally dominates; however, the proposed workflows consistently reduce the number of required SPICE simulations by approximately 30%–75% while maintaining competitive Pareto-front quality. This efficiency gain indicates that the reinforcement-learning agent progressively acquires design intuition comparable to that of an experienced human designer—learning to avoid unpromising regions of the design space and focusing evaluations on high-value candidates. The results therefore demonstrate that, beyond the choice of learning algorithm, workflow-level design decisions critically shape how effectively RL can emulate expert design behavior, offering practical guidance for balancing solution quality and computational cost in automated analog and RF circuit design flows.
{"title":"MORL-IC: Multi-objective reinforcement learning approaches for analog integrated circuits optimization","authors":"Hakan Taşkıran, Engin Afacan","doi":"10.1016/j.vlsi.2026.102664","DOIUrl":"10.1016/j.vlsi.2026.102664","url":null,"abstract":"<div><div>Analog and RF integrated circuit (IC) design requires the simultaneous optimization of multiple, conflicting objectives under highly nonlinear and tightly coupled constraints. While prior studies — including our own — have demonstrated the feasibility of applying multi-objective reinforcement learning (MORL) to analog circuit optimization, the specific impact of workflow-level design choices on convergence behavior, simulation cost, and Pareto-front characteristics has remained insufficiently explored. This paper reformulates the Multi-Objective Deep Deterministic Policy Gradient (MODDPG) approach not as a single fixed algorithm, but as a family of optimization workflows that share an identical multi-objective actor–critic learning core while systematically differing in their initialization strategy and environment evaluation mechanism. Within this unified formulation, three configurations are investigated: (i) a baseline MODDPG workflow with random initialization and direct SPICE evaluation, (ii) MODDPG-2, which employs analytically derived extreme solutions to guide early exploration, and (iii) MODDPG-3, which introduces an ANN-based pseudo-designer to generate boundary solutions directly from performance specifications. In addition, a Fully-ANN execution mode is examined, where an ANN-based pseudo-simulator replaces SPICE during policy learning to accelerate environment interaction. By preserving the same reinforcement learning architecture across all variants, the proposed framework isolates the effects of structured initialization and surrogate-based environments on optimization outcomes. The workflows are evaluated on three analog circuits (active-loaded differential amplifier, folded-cascode amplifier, and voltage comparator) and one RF circuit (CMOS cross-coupled LC oscillator), as well as on standard analytical benchmarks. Comparative results against NSGA-II and MOEA/D show that no single method universally dominates; however, the proposed workflows consistently reduce the number of required SPICE simulations by approximately 30%–75% while maintaining competitive Pareto-front quality. This efficiency gain indicates that the reinforcement-learning agent progressively acquires design intuition comparable to that of an experienced human designer—learning to avoid unpromising regions of the design space and focusing evaluations on high-value candidates. 
The results therefore demonstrate that, beyond the choice of learning algorithm, workflow-level design decisions critically shape how effectively RL can emulate expert design behavior, offering practical guidance for balancing solution quality and computational cost in automated analog and RF circuit design flows.</div></div>","PeriodicalId":54973,"journal":{"name":"Integration-The Vlsi Journal","volume":"108 ","pages":"Article 102664"},"PeriodicalIF":2.5,"publicationDate":"2026-01-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146038495","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
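Because the workflows above are ultimately compared by Pareto-front quality, a small non-dominated filter makes that criterion concrete; this is a generic illustrative utility with made-up objective values, not code from the paper.

```python
# Generic Pareto-front extraction for minimization objectives, the criterion
# used to compare candidate sizings returned by the different workflows.
# Illustrative utility only; the objective values below are invented.
from typing import List, Sequence

def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a dominates b if it is no worse in every objective and better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(points: List[Sequence[float]]) -> List[Sequence[float]]:
    return [p for p in points
            if not any(dominates(q, p) for q in points if q is not p)]

if __name__ == "__main__":
    # objectives per candidate sizing: (power in mW, negated gain in dB),
    # both treated as quantities to minimize
    candidates = [(1.2, -62.0), (0.9, -55.0), (1.5, -58.0), (0.8, -52.0)]
    print(pareto_front(candidates))   # prints the three non-dominated candidates
```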
Pub Date: 2026-01-16 | DOI: 10.1016/j.vlsi.2026.102666
Vasundhara Trivedi, Harman Singh Bagga, Gopal Raut, Santosh Kumar Vishvakarma
Deep Neural Network (DNN) accelerators require high computational throughput and flexible precision support while operating under stringent resource and power constraints. To address these challenges, we propose an adaptive-precision SIMD (Single Instruction, Multiple Data) Processing Element (PE) architecture for signed integer and fixed-point operations that maximizes resource utilization and enhances parallelism in multiply–accumulate (MAC) computations. The design introduces efficient resource reuse during partial product accumulation and supports both symmetric and asymmetric precision modes. Unlike state-of-the-art approaches, the proposed PE dynamically scales computation: processing 16 operands at low precision (4-bit), four operands at medium precision (8-bit), and a single operand at high precision (16-bit). Additionally, it supports asymmetric operations such as 16 × 4-bit multiplications in parallel, enabling unique flexibility and performance gains. The architecture is implemented and tested on ASIC and FPGA platforms. Accuracy evaluations across different DNN models and datasets show very small losses at reduced precision—less than 1% for LeNet on MNIST, 2.9% for AlexNet on CIFAR-10, 2.2% for VGG16 on CIFAR-10, and 3.5% for VGG16 on ImageNet-1000 compared to float32. Hardware synthesis yields significant improvements, including 46.2% fewer LUTs and 2.45× less power on FPGA compared to existing designs. The proposed architecture delivers 2× higher throughput and up to 4.8× better energy efficiency with 28.57% less area at 65 nm compared to existing works, making it ideal for applications with variable precision and limited resources.
{"title":"Adaptive-precision SIMD architecture for high-throughput and resource-efficient DNN acceleration","authors":"Vasundhara Trivedi , Harman Singh Bagga , Gopal Raut , Santosh Kumar Vishvakarma","doi":"10.1016/j.vlsi.2026.102666","DOIUrl":"10.1016/j.vlsi.2026.102666","url":null,"abstract":"<div><div>Deep Neural Network (DNN) accelerators require high computational throughput and flexible precision support while operating under stringent resource and power constraints. To address these challenges, we propose an adaptive-precision SIMD (Single Instruction, Multiple Data) Processing Element (PE) architecture for signed integer and fixed-point operations that maximizes resource utilization and enhances parallelism in multiply–accumulate (MAC) computations. The design introduces efficient resource reuse during partial product accumulation and supports both symmetric and asymmetric precision modes. Unlike state-of-the-art approaches, the proposed PE dynamically scales computation: processing 16 operands at low precision (4-bit), four operands at medium precision (8-bit), and a single operand at high precision (16-bit). Additionally, it supports asymmetric operations such as 16 <span><math><mo>×</mo></math></span> 4-bit multiplications in parallel, enabling unique flexibility and performance gains. The architecture is implemented and tested on ASIC and FPGA platforms. Accuracy evaluations across different DNN models and datasets show very small losses at reduced precision—less than 1% for LeNet on MNIST, 2.9% for AlexNet on CIFAR-10, 2.2% for VGG16 on CIFAR-10, and 3.5% for VGG16 on ImageNet-1000 compared to float32. Hardware synthesis yields significant improvements, including 46.2% fewer LUTs and 2.45 <span><math><mo>×</mo></math></span> less power on FPGA compared to existing designs. The proposed architecture delivers 2<span><math><mo>×</mo></math></span> higher throughput, upto 4.8<span><math><mo>×</mo></math></span> energy efficiency with 28.57% less area at 65 nm, compared to existing works, making it ideal for applications with variable precision and limited resources.</div></div>","PeriodicalId":54973,"journal":{"name":"Integration-The Vlsi Journal","volume":"108 ","pages":"Article 102666"},"PeriodicalIF":2.5,"publicationDate":"2026-01-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146038494","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-16 | DOI: 10.1016/j.vlsi.2026.102669
Jie Zhang, Jiliang Lv, Nana Cheng, Liu Yang
Using a quadratic memristor as the connection weight between cells, a 4D memristive cellular neural network (CNN) chaotic system is constructed. Through a series of dynamical analyses, it is found that the system exhibits relatively rich dynamical characteristics. In this study, offset boosting is achieved by adding an offset parameter. Furthermore, the system's amplitude control and attractor rotation are also realized. Based on the fundamental theory of attractor rotation, a multi-wing attractor transformation of the chaotic system is achieved. Additionally, circuit simulation software is used to design and implement a simulation circuit for the 4D memristive chaotic system, and the simulation results verified the physical feasibility of the constructed system. Finally, feedback control is implemented using the H∞ control principle. Validation confirms that the designed controller can effectively counteract external disturbances and stabilize the system output.
{"title":"Design of a memristive CNN chaotic system: From chaotic behavior analysis to circuit implementation and robust control","authors":"Jie Zhang, Jiliang Lv, Nana Cheng, Liu Yang","doi":"10.1016/j.vlsi.2026.102669","DOIUrl":"10.1016/j.vlsi.2026.102669","url":null,"abstract":"<div><div>Using a quadratic memristor as the connection weight between cells, a 4D memristive cellular neural network (CNN) chaotic system is constructed. Through a series of dynamical analyses, it is found that the system exhibits relatively rich dynamical characteristics. In this study, offset boosting is achieved by adding an offset parameter. Furthermore, the system’s amplitude control and attractor rotation are also realized. Based on the fundamental theory of attractor rotation, a multi-wing attractor transformation of the chaotic system is achieved. Additionally, circuit simulation software is used to design and implement a simulation circuit for the 4D memristive chaotic system, and the simulation results verified the physical feasibility of the constructed system. Finally, feedback control is implemented using the <span><math><msub><mrow><mi>H</mi></mrow><mrow><mi>∞</mi></mrow></msub></math></span> control principle. Validation confirms that the designed controller can effectively counteract external disturbances and stabilize the system output.</div></div>","PeriodicalId":54973,"journal":{"name":"Integration-The Vlsi Journal","volume":"108 ","pages":"Article 102669"},"PeriodicalIF":2.5,"publicationDate":"2026-01-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146038493","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-15 | DOI: 10.1016/j.vlsi.2026.102667
Alejandro Silva-Juarez , Sergio A. Rosales-Nunez , Luis C. Alvarez-Simon , Gregorio Zamora-Mejia , Julio Hernandez-Perez , Victor H. Carbajal-Gomez , Jose M. Rocha-Perez , Alejandro I. Bautista-Castillo
This paper presents the design, Application-Specific Integrated Circuit (ASIC) fabrication, and experimental characterization of a fully digital Lorenz chaotic system in 180 nm CMOS technology. Motivated by the growing demand for high-performance chaotic generators in secure communications, the design originates from a rigorous mathematical formulation, with its chaotic behavior verified via the Kaplan–Yorke dimension estimate. The architecture is based on the forward Euler method, implemented with a 32-bit fixed-point datapath, a control finite-state machine, and integrated 32-to-1 serializers. We detail the complete digital design flow, from RTL to GDSII, using a unified synthesis and place-and-route methodology with the Synopsys Fusion Compiler toolchain. For characterization, the serializers stream the state variables x(n), y(n), and z(n) to an external FPGA, enabling subsequent transmission to a PC via a UART interface. The integrated circuit was fabricated and tested; experimental measurements successfully validated the chaotic behavior predicted by post-layout simulations. The final design occupies a cell area of 0.873 mm² and exhibits an estimated static power of 1.02 mW. Post-layout timing reports indicate that the design meets its target frequencies, achieving a maximum operating frequency above 57 MHz (with a positive slack of +2.61 ns on the 50 MHz clock). The ASIC implementation demonstrates significant advantages in power, density, and speed potential over FPGA-based alternatives, positioning this design as a robust solution for embedded security systems.
{"title":"A 180-nm CMOS fully digital chaotic Lorenz system","authors":"Alejandro Silva-Juarez , Sergio A. Rosales-Nunez , Luis C. Alvarez-Simon , Gregorio Zamora-Mejia , Julio Hernandez-Perez , Victor H. Carbajal-Gomez , Jose M. Rocha-Perez , Alejandro I. Bautista-Castillo","doi":"10.1016/j.vlsi.2026.102667","DOIUrl":"10.1016/j.vlsi.2026.102667","url":null,"abstract":"<div><div>This paper presents the design, Application-Specific Integrated Circuit (ASIC) fabrication, and experimental characterization of a fully digital Lorenz chaotic system in 180 nm CMOS technology. Motivated by the growing demand for high-performance chaotic generators in secure communications, the design originates from a rigorous mathematical formulation, verifying its chaotic behavior via the Kaplan–Yorke dimension estimate. The architecture is based on the forward Euler method, implemented with a 32-bit fixed-point datapath, a control finite-state machine, and integrated 32-to-1 serializers. We detail the complete digital design flow, from RTL to GDSII, using a unified synthesis and place-and-route methodology with the Synopsys Fusion Compiler toolchain. For characterization, the serializers stream the state variables <span><math><mrow><mi>x</mi><mrow><mo>(</mo><mi>n</mi><mo>)</mo></mrow></mrow></math></span>, <span><math><mrow><mi>y</mi><mrow><mo>(</mo><mi>n</mi><mo>)</mo></mrow></mrow></math></span>, and <span><math><mrow><mi>z</mi><mrow><mo>(</mo><mi>n</mi><mo>)</mo></mrow></mrow></math></span> to an external FPGA, enabling subsequent transmission to a PC via a UART interface. The integrated circuit was fabricated and tested; experimental measurements successfully validated the chaotic behavior predicted by post-layout simulations. The final design occupies a cell area of 0.873<!--> <!-->mm<span><math><msup><mrow></mrow><mrow><mn>2</mn></mrow></msup></math></span> and exhibits an estimated static power of 1.02<!--> <!-->mW. Post-layout timing reports indicate that the design meets its target frequencies, achieving a maximum operating frequency above 57<!--> <!-->MHz (with a positive slack of +2.61<!--> <!-->ns on the 50<!--> <!-->MHz clock). The ASIC implementation demonstrates significant advantages in power, density, and speed potential over FPGA-based alternatives, positioning this design as a robust solution for embedded security systems.</div></div>","PeriodicalId":54973,"journal":{"name":"Integration-The Vlsi Journal","volume":"108 ","pages":"Article 102667"},"PeriodicalIF":2.5,"publicationDate":"2026-01-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146078340","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-15 | DOI: 10.1016/j.vlsi.2026.102659
Rajesh Kumar, Sachin Kumar, Binod Kumar Kanaujia
This paper presents a new design of a load-modulated balance Doherty power amplifier (LM-BDPA) using a parallel coupled line (PCL) structure line coupler for 5G Internet of Things (IoT) applications. The proposed design aims to achieve high efficiency and wide bandwidth, which are critical for 5G communication systems. The LM-BDPA utilizes a PCL structure to enhance the load modulation capability, resulting in improved power-added efficiency (PAE) and linearity. The design also incorporates advanced thermal management techniques to ensure stable operation under high-power conditions. The performance of the proposed LM-BDPA is evaluated through both simulations and measurements. The fabricated prototype achieves a measured gain of 11.2–13.4 dB and a saturated output power of ∼41 dBm. A PAE of 46.4–56.5 % and 43.2–50.3 % is achieved at 6 dB and 8 dB output power back-off, respectively, across the designed frequency band.
{"title":"Design of a load modulated balance doherty power amplifier using parallel coupled line (PCL) structure line coupler for 5G IoT applications","authors":"Rajesh Kumar , Sachin Kumar , Binod Kumar Kanaujia","doi":"10.1016/j.vlsi.2026.102659","DOIUrl":"10.1016/j.vlsi.2026.102659","url":null,"abstract":"<div><div>This paper presents a new design of a load-modulated balance doherty power amplifier (LM-BDPA) using a parallel coupled line (PCL) structure line coupler for 5G internet of things (IoT) applications. The proposed design aims to achieve high efficiency and wide bandwidth, which are critical for 5G communication systems. The LM-BDPA utilizes a PCL structure to enhance the load modulation capability, resulting in improved power-added efficiency (PAE) and linearity. The design also incorporates advanced thermal management techniques to ensure stable operation under high-power conditions. The performance of the proposed LM-BDPA is evaluated through both simulations and measurements. The fabricated prototype achieves a measured gain of 11.2–13.4 dB and a saturated output power of ∼41 dBm. A PAE of 46.4–56.5 % and 43.2–50.3 % is achieved at 6 dB and 8 dB output power back-off, respectively, across the designed frequency band.</div></div>","PeriodicalId":54973,"journal":{"name":"Integration-The Vlsi Journal","volume":"108 ","pages":"Article 102659"},"PeriodicalIF":2.5,"publicationDate":"2026-01-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146038436","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-14 | DOI: 10.1016/j.vlsi.2026.102665
Manikandan B , Karthikumar S
As digital architectures scale to unprecedented complexity, driven by emerging domains such as artificial intelligence, edge inference, and ultra-low-power systems, the strategic orchestration of clock signal delivery has become a cornerstone of integrated circuit design. Clock Tree Synthesis (CTS), a critical stage in the physical design flow, plays a vital role in maintaining synchronous operation, balancing timing constraints, managing dynamic and leakage power, and ensuring signal integrity under aggressive scaling and variable workloads. This review systematically dissects the evolution of CTS, beginning with classical methodologies centered on recursive trees, buffer insertion, and delay balancing, before exploring advanced solutions tailored for variability tolerance, power optimization, and architectural irregularity. It then turns to the growing influence of machine learning and data-driven models, which replace rigid rule sets with adaptive, layout-aware strategies and offer predictive insights and multi-objective optimization throughout the design process. The review also examines specialized use cases in security-conscious designs, aging-resilient circuits, photonic interconnects, and neuromorphic platforms, each demanding unique timing models and synthesis heuristics. The discussion culminates in a reflection on prevailing gaps, including the need for transparent ML integration, benchmark standardization, and holistic frameworks that bridge logical design with physical realization. This work offers a comprehensive perspective on the shifting paradigm of CTS, illuminating its central role in shaping high-performance, energy-efficient, and scalable silicon systems.
{"title":"Clock tree synthesis in modern VLSI: From foundational algorithms to AI-driven optimization","authors":"Manikandan B , Karthikumar S","doi":"10.1016/j.vlsi.2026.102665","DOIUrl":"10.1016/j.vlsi.2026.102665","url":null,"abstract":"<div><div>As digital architectures scale to unprecedented complexity, driven by emerging domains such as artificial intelligence, edge inference, and ultra-low-power systems, the strategic orchestration of clock signal delivery has become a cornerstone of integrated circuit design. Clock Tree Synthesis (CTS), a critical stage in the physical design flow, plays a vital role in maintaining synchronous operation, balancing timing constraints, managing dynamic and leakage power, and ensuring signal integrity under aggressive scaling and variable workloads. This review systematically dissects the evolution of CTS, beginning with classical methodologies centered on recursive trees, buffer insertion, and delay balancing, before exploring advanced solutions tailored for variability tolerance, power optimization, and architectural irregularity. In addition, the growing influence of machine learning and data-driven models that replace rigid rule sets with adaptive, layout-aware strategies offers predictive insights and multi-objective optimization throughout the design process. In addition, this study examined specialized use cases in security-conscious designs, aging-resilient circuits, photonic interconnects, and neuromorphic platforms, each demanding unique timing models and synthesis heuristics. This discussion culminates in a reflection on prevailing gaps, including the need for transparent ML integration, benchmark standardization, and holistic frameworks that bridge logical design with physical realization. This work offers a comprehensive perspective on the shifting paradigm of CTS, illuminating its central role in shaping high-performance, energy-efficient, and scalable silicon systems.</div></div>","PeriodicalId":54973,"journal":{"name":"Integration-The Vlsi Journal","volume":"108 ","pages":"Article 102665"},"PeriodicalIF":2.5,"publicationDate":"2026-01-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146038496","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-12 | DOI: 10.1016/j.vlsi.2026.102663
Yuanfa Ji , Haihui Zhang , Xiyan Sun , Furong Jiang , Qiang Fu
With the continuous increase in chip integration density and reliability requirements, test data volume has grown significantly. At the same time, limitations of automatic test equipment in terms of physical I/O channel count, memory capacity, and data transmission bandwidth have further raised test costs. To address these challenges, this paper proposes a test data compression method based on sliding-window encoding. This approach identifies repeated sequences in the data to be encoded and replaces them with shorter codewords, thereby achieving effective compression. Furthermore, a match length reuse mechanism is introduced, which considerably enhances both codeword utilization efficiency and compression performance. Additionally, this paper systematically analyzes the impact of encoding parameters on the compression ratio, optimizes the encoding scheme considering hardware overhead, and designs a corresponding decompression architecture. Experimental results show that the proposed method achieves an average compression ratio of 66.86% on ISCAS’89 benchmark circuits. This provides an innovative and practical solution for test data compression.
{"title":"A test data compression method based on sliding-window encoding and matching length reuse","authors":"Yuanfa Ji , Haihui Zhang , Xiyan Sun , Furong Jiang , Qiang Fu","doi":"10.1016/j.vlsi.2026.102663","DOIUrl":"10.1016/j.vlsi.2026.102663","url":null,"abstract":"<div><div>With the continuous increase in chip integration density and reliability requirements, test data volume has grown significantly. At the same time, limitations of automatic test equipment in terms of physical I/O channel count, memory capacity, and data transmission bandwidth have further raised test costs. To address these challenges, this paper proposes a test data compression method based on sliding-window encoding. This approach identifies repeated sequences in the data to be encoded and replaces them with shorter codewords, thereby achieving effective compression. Furthermore, a match length reuse mechanism is introduced, which considerably enhances both codeword utilization efficiency and compression performance. Additionally, this paper systematically analyzes the impact of encoding parameters on the compression ratio, optimizes the encoding scheme considering hardware overhead, and designs a corresponding decompression architecture. Experimental results show that the proposed method achieves an average compression ratio of 66.86% on ISCAS’89 benchmark circuits. This provides an innovative and practical solution for test data compression.</div></div>","PeriodicalId":54973,"journal":{"name":"Integration-The Vlsi Journal","volume":"108 ","pages":"Article 102663"},"PeriodicalIF":2.5,"publicationDate":"2026-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145980164","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-11 | DOI: 10.1016/j.vlsi.2026.102655
Yingchun Lu , Hongliang Lu , Yujie Liu , Huaguo Liang , Zhengfeng Huang , Jinlin Chen , Xiumin Xu , Liang Yao
Strong Physical Unclonable Functions (PUFs) are vulnerable to modeling attacks using Machine Learning (ML), and PUF-based authentication protocols also face security risks. To address these issues, this paper proposes a PUF structure with resistance to modeling attacks based on Dynamic Obfuscation (DO), composed of Linear Feedback Shift Registers (LFSRs), PUFs, and several logic gates. The characteristics of DO are as follows: (1) the initial state of the LFSR is determined by the PUF's response, making it uncontrollable; (2) the updated state of the LFSR determines the obfuscated bit of each input challenge, achieving a dynamic mapping between challenges and responses. An Arbiter PUF (APUF) based on DO is implemented on Xilinx Artix-7 FPGA, and experimental results show that the structure can effectively resist modeling attacks from various ML algorithms, with prediction accuracy close to 50 %. In addition, this paper proposes a mutual authentication protocol based on PUF, suitable for Internet of Things (IoT) systems.
{"title":"Design of a dynamic obfuscation-based strong PUF resistant to modeling attacks and mutual authentication protocol","authors":"Yingchun Lu , Hongliang Lu , Yujie Liu , Huaguo Liang , Zhengfeng Huang , Jinlin Chen , Xiumin Xu , Liang Yao","doi":"10.1016/j.vlsi.2026.102655","DOIUrl":"10.1016/j.vlsi.2026.102655","url":null,"abstract":"<div><div>Strong Physical Unclonable Functions (PUFs) are vulnerable to modeling attacks using Machine Learning (ML), and PUF-based authentication protocols also face security risks. To address these issues, this paper proposes a PUF structure with resistance to modeling attacks based on Dynamic Obfuscation (DO), composed of Linear Feedback Shift Registers (LFSRs), PUFs, and several logic gates. The characteristics of DO are as follows: (1) the initial state of the LFSR is determined by the PUF's response, making it uncontrollable; (2) the updated state of the LFSR determines the obfuscated bit of each input challenge, achieving a dynamic mapping between challenges and responses. An Arbiter PUF (APUF) based on DO is implemented on Xilinx Artix-7 FPGA, and experimental results show that the structure can effectively resist modeling attacks from various ML algorithms, with prediction accuracy close to 50 %. In addition, this paper proposes a mutual authentication protocol based on PUF, suitable for Internet of Things (IoT) systems.</div></div>","PeriodicalId":54973,"journal":{"name":"Integration-The Vlsi Journal","volume":"108 ","pages":"Article 102655"},"PeriodicalIF":2.5,"publicationDate":"2026-01-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145980161","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-09 | DOI: 10.1016/j.vlsi.2026.102660
Prateek Goyal, Sujit Kumar Sahoo
This work introduces a novel Error-Optimized Hardware-Efficient Approximate Adder (EOHEAA) tailored for error-resilient computing tasks, where precision can be traded for improvements in energy, delay, and resource efficiency. The EOHEAA adopts a strategic method of controlled error propagation, enabling significant enhancement in accuracy metrics such as Mean Error Distance (MED), Mean Relative Error Distance (MRED), and Normalized MED (NMED), while maintaining minimal hardware overhead. Synthesized on the Artix-7 FPGA (XC7A35T-1CPG236C) using Verilog HDL, EOHEAA achieves up to a 38.6% reduction in power consumption, a 34% improvement in critical path delay, and notable savings in logic resources compared to conventional and state-of-the-art approximate adder designs. Comprehensive analysis across 8-, 16-, and 32-bit configurations further confirms its scalability and robustness, with PDP improvements reaching 71.5% in wider designs. Notably, EOHEAA outperforms several existing designs by achieving the lowest RMSE (32.21), the minimum EDmax (71), and the highest accuracy-to-efficiency balance. An ASIC-oriented design-flow evaluation is further performed using Cadence Genus with predictive standard-cell libraries to analyze area, power, and timing behavior under advanced technology assumptions. To validate its real-world applicability, EOHEAA has been employed in edge detection and color quantization using K-means clustering, both of which demonstrate high-quality outputs under relaxed accuracy constraints. Furthermore, a lightweight CNN-based validation framework is employed to examine the impact of approximate arithmetic on learning-based workloads, demonstrating that EOHEAA preserves inference accuracy while offering tangible energy and performance benefits. These results collectively position EOHEAA as a strong candidate for next-generation approximate arithmetic units in energy-aware image processing and machine-learning accelerators.
{"title":"EOHEAA: Error-Optimized Hardware-Efficient Approximate Adder for energy-aware error-resilient applications","authors":"Prateek Goyal, Sujit Kumar Sahoo","doi":"10.1016/j.vlsi.2026.102660","DOIUrl":"10.1016/j.vlsi.2026.102660","url":null,"abstract":"<div><div>This work introduces a novel Error-Optimized Hardware-Efficient Approximate Adder (EOHEAA) tailored for error-resilient computing tasks, where precision can be traded for improvements in energy, delay, and resource efficiency. The EOHEAA adopts a strategic method of controlled error propagation, enabling significant enhancement in accuracy metrics such as Mean Error Distance (MED), Mean Relative Error Distance (MRED), and <em>Normalized MED (NMED)</em>, while maintaining minimal hardware overhead. Synthesized on the Artix-7 FPGA <span><math><mrow><mo>(</mo><mi>X</mi><mi>C</mi><mn>7</mn><mi>A</mi><mn>35</mn><mi>T</mi><mo>−</mo><mn>1</mn><mi>C</mi><mi>P</mi><mi>G</mi><mn>236</mn><mi>C</mi><mo>)</mo></mrow></math></span> using Verilog HDL, EOHEAA achieves up to 38.6% reduction in power consumption, an 34% improvement in critical path delay, and notable savings in logic resources compared to conventional and state-of-the-art approximate adder designs. Comprehensive analysis across 8, 16, and 32-bit configurations further confirms its scalability and robustness, with PDP improvements reaching 71.5% in wider designs. Notably, EOHEAA outperforms several existing designs by achieving the lowest RMSE <span><math><mrow><mo>(</mo><mn>32</mn><mo>.</mo><mn>21</mn><mo>)</mo></mrow></math></span>, minimum ED<sub>max</sub> <span><math><mrow><mo>(</mo><mn>71</mn><mo>)</mo></mrow></math></span>, and the highest accuracy-to-efficiency balance. ASIC-oriented design flow evaluation is further performed using Cadence Genus with predictive standard-cell libraries to analyze area, power, and timing behavior under advanced technology assumptions. To validate its real-world applicability, EOHEAA has been employed in Edge Detection and Color quantization using K-means clustering, both of which demonstrate high-quality outputs under relaxed accuracy constraints. Furthermore, a lightweight CNN-based validation framework is employed to examine the impact of approximate arithmetic on learning-based workloads, demonstrating that EOHEAA preserves inference accuracy while offering tangible energy and performance benefits. These results collectively position EOHEAA as a strong candidate for next-generation approximate arithmetic units in energy-aware image processing and machine-learning accelerators.</div></div>","PeriodicalId":54973,"journal":{"name":"Integration-The Vlsi Journal","volume":"108 ","pages":"Article 102660"},"PeriodicalIF":2.5,"publicationDate":"2026-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145928870","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}