Pub Date : 2025-11-29 DOI: 10.1016/j.vlsi.2025.102616
Juntao Jian, Yan Xing, Shuting Cai, Weijun Li, Xiaoming Xiong
Detailed-routability optimization methods for three-dimensional global routing typically employ a two-stage process involving initial routing and multi-level maze routing (iterative rip-up and reroute, or RRR iterations). Within the coarse-grained maze route planning of RRR iterations, the resource model and cost scheme are paramount for optimization quality. However, current advancements in these areas often overlook the dynamic nature of routing resources throughout RRR iterations and fail to consider routability features beyond congestion. To mitigate these limitations, this paper introduces a novel detailed-routability optimization approach that integrates a dynamic resource model and a routability-aware cost scheme. The proposed dynamic resource model accounts for routing resources’ sensitivity to both spatial information and the progression of RRR iterations. Moreover, the routability-aware cost scheme, derived from coarse-grained routability features, is designed to optimize fine-grained routability. Experimental results validate that our approach surpasses baseline detailed-routability-driven global routers, exhibiting superior optimization performance by concurrently enhancing routability and overall quality scores (a weighted summation of wirelength and routability metrics), alongside achieving significant runtime reduction.
{"title":"Optimizing detailed-routability for 3D global routing through dynamic resource model and routability-aware cost scheme","authors":"Juntao Jian, Yan Xing, Shuting Cai, Weijun Li, Xiaoming Xiong","doi":"10.1016/j.vlsi.2025.102616","DOIUrl":"10.1016/j.vlsi.2025.102616","url":null,"abstract":"<div><div>Detailed-routability optimization methods for three-dimensional global routing typically employ a two-stage process involving initial routing and multi-level maze routing (iterative rip-up and reroute, or RRR iterations). Within the coarse-grained maze route planning of RRR iterations, the resource model and cost scheme are paramount for optimization quality. However, current advancements in these areas often overlook the dynamic nature of routing resources throughout RRR iterations and fail to consider routability features beyond congestion. To mitigate these limitations, this paper introduces a novel detailed-routability optimization approach that integrates a dynamic resource model and a routability-aware cost scheme. The proposed dynamic resource model accounts for routing resources’ sensitivity to both spatial information and the progression of RRR iterations. Moreover, the routability-aware cost scheme, derived from coarse-grained routability features, is designed to optimize fine-grained routability. 
Experimental results validate that our approach surpasses baseline detailed-routability-driven global routers, exhibiting superior optimization performance by concurrently enhancing routability and overall quality scores (a weighted summation of wirelength and routability metrics), alongside achieving significant runtime reduction.</div></div>","PeriodicalId":54973,"journal":{"name":"Integration-The Vlsi Journal","volume":"107 ","pages":"Article 102616"},"PeriodicalIF":2.5,"publicationDate":"2025-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145684741","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
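The coarse-grained maze routing step described in the abstract above can be illustrated with a small sketch. It is not the paper's dynamic resource model or its routability-aware cost scheme: it assumes a single 2D layer, a uniform capacity per cell, and a logistic congestion penalty chosen only for illustration. The names `congestion_cost`, `maze_route`, `weight`, and `slope` are hypothetical.

```python
import heapq
import math

def congestion_cost(demand, capacity, weight=10.0, slope=1.0):
    """Unit step cost plus a logistic penalty that grows as demand exceeds capacity.

    The penalty form and its parameters are illustrative assumptions.
    """
    return 1.0 + weight / (1.0 + math.exp(-slope * (demand - capacity)))

def maze_route(demand, capacity, src, dst):
    """Dijkstra search over a 2D grid of routing cells; returns the cheapest path."""
    rows, cols = len(demand), len(demand[0])
    best = {src: 0.0}
    prev = {}
    heap = [(0.0, src)]
    while heap:
        cost, cell = heapq.heappop(heap)
        if cell == dst:
            break
        if cost > best[cell]:
            continue  # stale heap entry
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols:
                new_cost = cost + congestion_cost(demand[nr][nc], capacity)
                if new_cost < best.get((nr, nc), math.inf):
                    best[(nr, nc)] = new_cost
                    prev[(nr, nc)] = cell
                    heapq.heappush(heap, (new_cost, (nr, nc)))
    # Walk predecessors back from the target to recover the path.
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]

# The cell between source and target is heavily used, so the route detours.
demand = [[0, 10, 0],
          [0, 0, 0],
          [0, 0, 0]]
path = maze_route(demand, capacity=4, src=(0, 0), dst=(0, 2))
```

In a rip-up-and-reroute loop, `demand` would be updated after each net is rerouted, so later searches see the congestion left by earlier ones.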
Pub Date : 2025-11-29 DOI: 10.1016/j.vlsi.2025.102622
Luis Gerardo de la Fraga , Esteban Tlelo-Cuautle
A Pseudo Random Number Generator (PRNG) produces a sequence whose randomness is evaluated by statistical tests such as NIST and TestU01. The sequences are deterministic and reproducible when the same seed value is used. For cryptographic applications, therefore, the key size of a PRNG must be increased to resist brute-force attacks. A fractional-order chaotic system, such as the Lorenz one, is suitable for designing a PRNG, and it can be implemented on embedded devices such as the low-cost ESP32 (32-bit LX6 microprocessor) and a field-programmable gate array (FPGA). To increase the throughput, the fractional Lorenz system is integrated with an approximated two-step Runge–Kutta method. An analysis is performed to find the domain of attraction for each state variable and to verify that the PRNG produces non-correlated sequences. The hardware implementation is detailed by establishing the number of bits (or keys) required for the PRNG to guarantee its suitability for cryptographic applications. Finally, the hardware design of a PRNG using the fractional Lorenz system provides a throughput of 4.99 Mbits/s on the ESP32 platform and 112.96 Mbits/s on the FPGA.
{"title":"Design insights for implementing a PRNG with fractional Lorenz system on ESP32 and FPGA","authors":"Luis Gerardo de la Fraga , Esteban Tlelo-Cuautle","doi":"10.1016/j.vlsi.2025.102622","DOIUrl":"10.1016/j.vlsi.2025.102622","url":null,"abstract":"<div><div>A Pseudo Random Number Generator (PRNG) produces a sequence whose randomness is evaluated by statistical tests like NIST and TestU01. The random sequences are deterministic and reproducible when using the same seed value. In this manner, and for cryptographic applications, the key size of a PRNG must be increased to resist brute force attacks. Henceforth, a fractional-order chaotic system, like the Lorenz one, is suitable to be used to design a PRNG, which implementation can be performed by using embedded devices such as the low-cost ESP32 (32-bit LX6 microprocessor) and field-programmable gate array (FPGA). To increase the throughput, the fractional Lorenz system is integrated with an approximated two steps Runge–Kutta method. An analysis in performed to find the domain of attraction for each state variable, and to verify that the PRNG produces non-correlated sequences. The hardware implementation is detailed by establishing the number of bits (or keys) required for the PRNG to guarantee its suitability for cryptographic applications. 
Finally, the hardware design of a PRNG using the fractional Lorenz system provides a throughput of 4.99 Mbits/s in the ESP32 platform, and 112.96 Mbits/s in the FPGA.</div></div>","PeriodicalId":54973,"journal":{"name":"Integration-The Vlsi Journal","volume":"107 ","pages":"Article 102622"},"PeriodicalIF":2.5,"publicationDate":"2025-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145684626","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
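The idea of drawing pseudo-random output from a chaotic trajectory can be sketched as follows. This is a simplified stand-in, not the authors' design: it uses the classical integer-order Lorenz system instead of the fractional-order one, Heun's method (a standard two-stage Runge–Kutta scheme) as the integrator, and an ad hoc byte-extraction rule. The function names, step size `h`, warm-up length, and extraction rule are all assumptions, and the output has not been checked against NIST or TestU01.

```python
def lorenz_rhs(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the classical (integer-order) Lorenz equations."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)

def heun_step(state, h):
    """One step of Heun's method, a two-stage Runge-Kutta integrator."""
    k1 = lorenz_rhs(state)
    predicted = tuple(s + h * k for s, k in zip(state, k1))
    k2 = lorenz_rhs(predicted)
    return tuple(s + 0.5 * h * (a + b) for s, a, b in zip(state, k1, k2))

def lorenz_bytes(seed, count, h=0.005, warmup=1000):
    """Return `count` bytes taken from low-order digits of the x trajectory.

    `seed` is the initial (x, y, z) state and plays the role of the key.
    """
    state = seed
    for _ in range(warmup):  # discard the transient before sampling
        state = heun_step(state, h)
    out = []
    for _ in range(count):
        state = heun_step(state, h)
        out.append(int(abs(state[0]) * 1e6) & 0xFF)
    return bytes(out)

stream_a = lorenz_bytes((1.0, 1.0, 1.0), 16)
stream_b = lorenz_bytes((1.0, 1.0, 1.0), 16)
```

Running the generator twice with the same seed reproduces the same bytes, which is the deterministic behaviour the abstract describes.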
Pub Date : 2025-11-28 DOI: 10.1016/j.vlsi.2025.102620
Yiqi Zhou , Yanghui Wu , Daying Sun , Shan Shen , Xiong Cheng , Li Li
Multipliers dominate energy consumption in digital signal processing (DSP) systems. While approximate multipliers offer accuracy-efficiency trade-offs, many existing designs suffer from suboptimal energy efficiency. This paper presents an error compensation algorithm that minimizes the global error expectation (EE). By analyzing the error distribution across approximate compressor columns, the algorithm determines optimal compensation positions to reduce EE while maintaining low hardware overhead. Based on this approach, two high energy efficiency approximate multipliers (HEAMs) are proposed: HEAM_M1, optimized for high accuracy, and HEAM_M2, which incorporates a newly designed 4-1 approximate compressor for ultra-low power applications. Compared to an exact multiplier, HEAM_M1 and HEAM_M2 achieve power-delay product (PDP) reductions of 32% and 54%, respectively. Moreover, compared to prior approximate multipliers with similar PDP levels, HEAM_M1 reduces NMED and MRED by 80% and 83%, while HEAM_M2 achieves reductions of 70% and 86%, respectively. Application-level evaluations on image processing and neural network tasks further demonstrate the effectiveness and robustness of the proposed designs.
{"title":"Error expectation-driven design and energy optimization of approximate multipliers","authors":"Yiqi Zhou , Yanghui Wu , Daying Sun , Shan Shen , Xiong Cheng , Li Li","doi":"10.1016/j.vlsi.2025.102620","DOIUrl":"10.1016/j.vlsi.2025.102620","url":null,"abstract":"<div><div>Multipliers dominate energy consumption in digital signal processing (DSP) systems, while approximate multipliers offer accuracy-efficiency trade-offs, many existing designs suffer from suboptimal energy efficiency. This paper presents an error compensation algorithm that minimizes the global error expectation (EE). By analyzing the error distribution across approximate compressor columns, the algorithm determines optimal compensation positions to reduce EE while maintaining low hardware overhead. Based on this approach, two high energy efficiency approximate multipliers (HEAMs) are proposed: HEAM_M1, optimized for high accuracy, and HEAM_M2, which incorporates a newly designed 4-1 approximate compressor for ultra-low power applications. Compared to an exact multiplier, HEAM_M1 and HEAM_M2 achieve power-delay product (PDP) reductions of 32% and 54%, respectively. Moreover, compared to prior approximate multipliers with similar PDP levels, HEAM_M1 reduces NMED and MRED by 80% and 83%, while HEAM_M2 achieves reductions of 70% and 86%, respectively. 
Application-level evaluations on image processing and neural network tasks further demonstrate the effectiveness and robustness of the proposed designs.</div></div>","PeriodicalId":54973,"journal":{"name":"Integration-The Vlsi Journal","volume":"107 ","pages":"Article 102620"},"PeriodicalIF":2.5,"publicationDate":"2025-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145684625","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
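The NMED and MRED figures quoted above are error metrics computed over all input pairs. The sketch below evaluates them under commonly used definitions, which are assumed here rather than taken from the paper: NMED is the mean error distance divided by the largest exact product, and MRED is the mean relative error over inputs with a nonzero exact product. The `truncated_multiply` function is a hypothetical approximate multiplier used only as a test subject; it is not HEAM_M1 or HEAM_M2.

```python
def truncated_multiply(a, b, dropped_bits=2):
    """Hypothetical approximate multiplier: clear the low bits of each operand."""
    mask = ~((1 << dropped_bits) - 1)
    return (a & mask) * (b & mask)

def error_metrics(approx, width=8):
    """Exhaustively compute (NMED, MRED) for an unsigned width-bit multiplier."""
    limit = 1 << width
    max_product = (limit - 1) ** 2
    total_ed = 0       # sum of error distances |exact - approx|
    total_red = 0.0    # sum of relative error distances
    nonzero = 0        # input pairs with a nonzero exact product
    for a in range(limit):
        for b in range(limit):
            exact = a * b
            ed = abs(exact - approx(a, b))
            total_ed += ed
            if exact:
                total_red += ed / exact
                nonzero += 1
    nmed = total_ed / (limit * limit) / max_product
    mred = total_red / nonzero
    return nmed, mred

exact_metrics = error_metrics(lambda a, b: a * b)
nmed, mred = error_metrics(truncated_multiply)
```

An exact multiplier scores zero on both metrics, which gives a quick sanity check before comparing approximate designs.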
Pub Date : 2025-11-26 DOI: 10.1016/j.vlsi.2025.102603
Josna Philomina , Rekha K. James , Shirshendu Das , Palash Das , Daleesha M Viswanathan
As Network-on-Chip (NoC) designs become essential in Tiled Chip Multicore Processor (TCMP) systems, it is increasingly important to protect NoC router communication from disruptions caused by hardware Trojans (HT). TCMPs often use intellectual property (IP) blocks from multiple vendors to design their NoC. This opens the door for untrusted vendors to compromise system security by inserting HTs into these IPs, which can alter the normal operation of NoC routers. These HTs are especially dangerous because they can remain undetected during the chip verification and testing stages. This paper explores how multiple HTs placed in the Route Computation Unit (RCU) of NoC routers can interfere with routing decisions, affect packet delivery, and harm overall system performance. We analyze the effects of these HTs using both synthetic traffic and real-world benchmarks, measuring their impact on latency, throughput and processor instructions per cycle (IPC). To address these issues, we introduce a solution called Neighbor-Supported Trojan-Aware Routing (NeSTAR). NeSTAR uses cooperation among neighboring routers to make routing decisions, helping the network to continue to function even when some RCUs are compromised. Our experimental results show that NeSTAR can reduce latency by 46%, improve throughput by 262%, lower packet deflected latency by 69%, and improve IPC by 37%, compared to the NoC affected by HT.
{"title":"NeSTAR: Hardware Trojans and its mitigation strategy in NoC routers","authors":"Josna Philomina , Rekha K. James , Shirshendu Das , Palash Das , Daleesha M Viswanathan","doi":"10.1016/j.vlsi.2025.102603","DOIUrl":"10.1016/j.vlsi.2025.102603","url":null,"abstract":"<div><div>As Network-on-Chip (NoC) designs become essential in Tiled Chip Multicore Processor (TCMP) systems, it is increasingly important to protect NoC router communication from disruptions caused by hardware Trojans (HT). TCMPs often use intellectual property (IP) blocks from multiple vendors to design their NoC. This opens the door for untrusted vendors to compromise system security by inserting HTs into these IPs, which can alter the normal operation of NoC routers. These HTs are especially dangerous because they can remain undetected during the chip verification and testing stages. This paper explores how multiple HTs placed in the Route Computation Unit (RCU) of NoC routers can interfere with routing decisions, affect packet delivery, and harm overall system performance. We analyze the effects of these HTs using both synthetic traffic and real-world benchmarks, measuring their impact on latency, throughput and processor instructions per cycle (IPC). To address these issues, we introduce a solution called Neighbor-Supported Trojan-Aware Routing (NeSTAR). NeSTAR uses cooperation among neighboring routers to make routing decisions, helping the network to continue to function even when some RCUs are compromised. 
Our experimental results show that NeSTAR can reduce latency by 46%, improve throughput by 262%, lower packet deflected latency by 69%, and improve IPC by 37%, compared to the NoC affected by HT.</div></div>","PeriodicalId":54973,"journal":{"name":"Integration-The Vlsi Journal","volume":"107 ","pages":"Article 102603"},"PeriodicalIF":2.5,"publicationDate":"2025-11-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145684740","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-11-25 DOI: 10.1016/j.vlsi.2025.102608
Chandrasekhar Savalam , Sanjay Medisetti , Prasanti Korapati
As Very Large-Scale Integration (VLSI) technology advances, the demand for reliable and scalable pre-silicon fault detection (FD) techniques continues to grow. Conventional diagnostic methods often face limitations in identifying subtle stuck-at faults within complex and high-dimensional test data. This study proposes a deep learning-based fault detection framework that integrates unsupervised and supervised learning to enhance fault identification and classification in combinational circuits. A Convolutional Autoencoder (CAE) is employed to extract spatial and structural features from circuit test patterns, effectively reducing dimensionality while preserving fault-related information. The encoded features are then classified using a Random Forest model for precise fault localization. The proposed framework is validated on ISCAS’85 benchmark circuits of different sizes and complexities, achieving fault detection accuracies ranging from 93 % to 100 %. Notably, when compared to existing models such as SSAE, VAE, and CEAE, which recorded accuracies between 83 % and 98 %, the proposed CAE-Random Forest framework consistently outperformed them across all benchmarks. Furthermore, the model exhibited stable convergence, low reconstruction error, and efficient memory usage of about 380–403 MB, ensuring reliable and scalable performance. Overall, these results demonstrate that the framework offers a robust, high-accuracy, and resource-efficient solution for automatic fault detection in digital VLSI circuits. It can also be effectively extended to more complex architectures for improved diagnostic reliability.
{"title":"Enhanced fault detection in digital VLSI circuits using convolutional autoencoders","authors":"Chandrasekhar Savalam , Sanjay Medisetti , Prasanti Korapati","doi":"10.1016/j.vlsi.2025.102608","DOIUrl":"10.1016/j.vlsi.2025.102608","url":null,"abstract":"<div><div>As Very Large-Scale Integration (VLSI) technology advances, the demand for reliable and scalable pre-silicon fault detection (FD) techniques continues to grow. Conventional diagnostic methods often face limitations in identifying subtle stuck-at faults within complex and high-dimensional test data. This study proposes a deep learning-based fault detection framework that integrates unsupervised and supervised learning to enhance fault identification and classification in combinational circuits. A Convolutional Autoencoder (CAE) is employed to extract spatial and structural features from circuit test patterns, effectively reducing dimensionality while preserving fault-related information. The encoded features are then classified using a Random Forest model for precise fault localization. The proposed framework is validated on ISCAS’85 benchmark circuits of different sizes and complexities, achieving fault detection accuracies ranging from 93 % to 100 %. Notably, when compared to existing models such as SSAE, VAE, and CEAE, which recorded accuracies between 83 % to 98 %, the proposed CAE-Random Forest framework consistently outperformed them across all benchmarks. Furthermore, the model exhibited stable convergence, low reconstruction error, and efficient memory usage of about 380–403 MB, ensuring reliable and scalable performance. Overall, these results demonstrate that the framework offers a robust, high-accuracy, and resource-efficient solution for automatic fault detection in digital VLSI circuits. 
It can also be effectively extended to more complex architectures for improved diagnostic reliability.</div></div>","PeriodicalId":54973,"journal":{"name":"Integration-The Vlsi Journal","volume":"107 ","pages":"Article 102608"},"PeriodicalIF":2.5,"publicationDate":"2025-11-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145618437","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-11-25 DOI: 10.1016/j.vlsi.2025.102609
Yamane Soma , Sakai Yuwa , Riaz-ul-haque Mian
Wafer-level performance has garnered significant attention within the industry. In this study, to achieve accurate modeling in a multisite testing environment, we explore the potential of incorporating Semi-Supervised Progressive Self-Training techniques into Gaussian process regression. Our experimental results, based on industrial production test data, show that the proposed progressive self-training Semi-Supervised Model outperforms two state-of-the-art methods: the Hierarchical Gaussian Process Regression (HGP) model and the Active Learning-Based Gaussian Process Regression (AHGP) model. Specifically, the proposed method achieved 29% and 80% lower error than the HGP model and the cluster-based (Two Step) method, respectively, with similar training data. Furthermore, it reduced testing costs by 50% while maintaining accuracy levels comparable to state-of-the-art active learning (AHGP) based models in a multi-site testing environment.
{"title":"A progressive self-training semi-supervised model to enhance discontinuous change detection","authors":"Yamane Soma , Sakai Yuwa , Riaz-ul-haque Mian","doi":"10.1016/j.vlsi.2025.102609","DOIUrl":"10.1016/j.vlsi.2025.102609","url":null,"abstract":"<div><div>Wafer-level performance has garnered significant attention within the industry. In this study, to achieve accurate modeling in a multisite testing environment, we explore the potential of incorporating <em>Semi-Supervised Progressive Self-Training</em> techniques into Gaussian process regression. Our experimental results, based on industrial production test data, show that the proposed progressive self-training Semi-Supervised Model outperforms two state-of-the-art methods: The Hierarchical Gaussian Process Regression (HGP) model and the Active Learning-Based Gaussian Process Regression (AHGP) model. Specifically, the proposed method achieved 29% and 80% less errors compared to the HGP model and cluster-based (Two Step) method respectively with the similar training data. Furthermore, it reduced testing costs by 50% while maintaining accuracy levels comparable to state-of-the-art active learning (AHGP) based models in a multi-site testing environment.</div></div>","PeriodicalId":54973,"journal":{"name":"Integration-The Vlsi Journal","volume":"107 ","pages":"Article 102609"},"PeriodicalIF":2.5,"publicationDate":"2025-11-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145618436","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-11-24 DOI: 10.1016/j.vlsi.2025.102605
Sukhreet Kaur, Rita Mahajan, Deepak Bagai
This paper presents a novel hybrid 1-bit full adder design integrating 14 nm FinFET technology with a two-phase adiabatic logic family (Energy-Efficient Diode-Connected, DC-Biased Positive Feedback Adiabatic Logic, EE DC-DB PFAL) and Modified Gate Diffusion Input (MGDI) logic. The proposed design achieves an average power consumption of 4.36 nW, a power-delay product of 1.26 aJ, and a transistor count of only 15, demonstrating significant energy efficiency and performance improvements compared to existing benchmark adder architectures. Transistor-level analysis, including Gm/Id considerations, validates optimized device sizing and energy-efficient switching. Layout and post-layout simulations confirm compact design and practical feasibility. The design demonstrates robustness under process, voltage, and temperature (PVT) variations, ensuring reliable operation across a wide range of operating conditions. Scalability across technology nodes from 7 nm to 20 nm is demonstrated, and the methodology can be extended to multi-bit arithmetic units and full-scale ALUs. The proposed adder is particularly suitable for IoT edge nodes, wearable and biomedical devices, and portable communication processors.
{"title":"EE DC-DB PFAL: A novel two-phase adiabatic logic family for low-power 14 nm FinFET-Based hybrid full adders","authors":"Sukhreet Kaur, Rita Mahajan, Deepak Bagai","doi":"10.1016/j.vlsi.2025.102605","DOIUrl":"10.1016/j.vlsi.2025.102605","url":null,"abstract":"<div><div>This paper presents a novel hybrid 1-bit full adder design integrating 14 nm FinFET technology with a two-phase adiabatic logic family—Energy-Efficient Diode-Connected, DC-Biased Positive Feedback Adiabatic Logic (EE DC-DB PFAL)—and Modified Gate Diffusion Input (MGDI) logic. The proposed design achieves an average power consumption of 4.36 nW, a power-delay product of 1.26 aJ, and a transistor count of only 15, demonstrating significant energy efficiency and performance improvements compared to existing benchmark adder architectures. Transistor-level analysis, including Gm/Id considerations, validates optimized device sizing and energy-efficient switching. Layout and post-layout simulations confirm compact design and practical feasibility. The design demonstrates robustness under process, voltage, and temperature (PVT) variations, ensuring reliable operation across a wide range of operating conditions. Scalability across technology nodes from 7 nm to 20nm is demonstrated, and the methodology can be extended to multi-bit arithmetic units and full-scale ALUs. 
The proposed adder is particularly suitable for IoT edge nodes, wearable and biomedical devices, and portable communication processors.</div></div>","PeriodicalId":54973,"journal":{"name":"Integration-The Vlsi Journal","volume":"107 ","pages":"Article 102605"},"PeriodicalIF":2.5,"publicationDate":"2025-11-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145618435","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
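The abstract above reports power and power-delay product but not the delay itself. Assuming the usual definition of PDP as average power multiplied by propagation delay (an assumption, since the abstract does not define it), the implied delay can be worked out from the two reported figures:

```latex
\mathrm{PDP} = P_{\mathrm{avg}} \, t_d
\quad\Longrightarrow\quad
t_d = \frac{\mathrm{PDP}}{P_{\mathrm{avg}}}
    = \frac{1.26 \times 10^{-18}\ \mathrm{J}}{4.36 \times 10^{-9}\ \mathrm{W}}
    \approx 2.89 \times 10^{-10}\ \mathrm{s}
    \approx 0.289\ \mathrm{ns}
```

This is a derived estimate, not a value stated by the authors.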
Pub Date : 2025-11-24 DOI: 10.1016/j.vlsi.2025.102607
Ahmed S. Elwakil , Brent J. Maundy , Anis Allagui , Costas Psychalinos
In this work, we show how higher-order low-pass, high-pass, band-pass or band-stop filter functions can be systematically obtained using the Mittag-Leffler (ML) function as a basic building block. In particular, by multiplying (i.e. cascading) two or more ML functions, each having the form of a two-parameter ML function $E_{\alpha,\beta}(z)$ ($0 \le \alpha, \beta \le 1$) with its argument $z$ being equal to $-s$ or $-1/s$ ($s$ is the complex frequency; i.e. $s = j\omega$), higher-order fractional filters can be obtained. We focus here on the cascade of two ML functions, which produces second-order transfer functions (as a special case) when $\alpha = 0$ and $\beta = 1$. We derive in closed form the impulse response and step response of these filters and experimentally verify their behavior after approximating the ML function using a suitable integer-order approximation. It is worth mentioning that this class of filters does not employ the fractional-order Laplace operator $s^{\pm\gamma}$ ($0 < \gamma \le 1$), unlike classical fractional-order filters.
{"title":"Higher-order filters based on the Mittag-Leffler function","authors":"Ahmed S. Elwakil , Brent J. Maundy , Anis Allagui , Costas Psychalinos","doi":"10.1016/j.vlsi.2025.102607","DOIUrl":"10.1016/j.vlsi.2025.102607","url":null,"abstract":"<div><div>In this work, we show how higher-order low-pass, high-pass, band-pass or band-stop filter functions can be systematically obtained using the Mittag-Leffler (ML) function as a basic building block. In particular, by multiplying (i.e. cascading) two or more ML functions; each having the form of a two-parameter ML function <span><math><mrow><msub><mrow><mi>E</mi></mrow><mrow><mi>α</mi><mo>,</mo><mi>β</mi></mrow></msub><mrow><mo>(</mo><mi>z</mi><mo>)</mo></mrow></mrow></math></span> (<span><math><mrow><mn>0</mn><mo>≤</mo><mi>α</mi><mo>,</mo><mi>β</mi><mo>≤</mo><mn>1</mn></mrow></math></span>) with its argument z being equal to <span><math><mrow><mo>−</mo><mi>s</mi></mrow></math></span> or <span><math><mrow><mo>−</mo><mn>1</mn><mo>/</mo></mrow></math></span>s (<span><math><mi>s</mi></math></span> is the complex frequency; i.e. <span><math><mrow><mi>s</mi><mo>=</mo><mi>j</mi><mi>ω</mi></mrow></math></span>), higher-order fractional filters can be obtained. We focus here on the cascade of two ML functions, which produces second-order transfer functions (as a special case) when <span><math><mrow><mi>α</mi><mo>=</mo><mn>0</mn></mrow></math></span> and <span><math><mrow><mi>β</mi><mo>=</mo><mn>1</mn></mrow></math></span>. We derive in closed form the impulse response and step-response of these filters and experimentally verify their behavior after approximating the ML function using a suitable integer-order approximation. 
It is worth mentioning that this class of filters <em>does not</em> employ the fractional-order Laplace operator <span><math><msup><mrow><mi>s</mi></mrow><mrow><mo>±</mo><mi>γ</mi></mrow></msup></math></span> (<span><math><mrow><mn>0</mn><mo><</mo><mi>γ</mi><mo>≤</mo><mn>1</mn></mrow></math></span>), unlike classical fractional-order filters.</div></div>","PeriodicalId":54973,"journal":{"name":"Integration-The Vlsi Journal","volume":"107 ","pages":"Article 102607"},"PeriodicalIF":2.5,"publicationDate":"2025-11-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145618434","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
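The special case mentioned in the abstract above can be checked numerically. The two-parameter ML function has the standard series definition E_{α,β}(z) = Σ_{k≥0} z^k / Γ(αk + β). With α = 0 and β = 1 every Gamma term equals 1, so the series reduces to the geometric series 1/(1 − z) for |z| < 1. Then E_{0,1}(−s) = 1/(1 + s) is a first-order low-pass and E_{0,1}(−1/s) = s/(s + 1) is a first-order high-pass, and cascading two such terms gives a second-order response. The sketch below uses a plain truncated series, which is an assumption made for illustration; it only converges in the stated region and is not the integer-order approximation used by the authors.

```python
import math

def mittag_leffler(alpha, beta, z, terms=100):
    """Truncated series for E_{alpha,beta}(z); z may be real or complex.

    For alpha = 0 the series converges only for |z| < 1.
    """
    return sum(z ** k / math.gamma(alpha * k + beta) for k in range(terms))

# Sanity check: E_{1,1}(z) is the exponential function.
exp_check = mittag_leffler(1, 1, 1.0)

# Low-pass term evaluated at s = j*0.5 (inside the convergence region).
s_low = 0.5j
lowpass = mittag_leffler(0, 1, -s_low)

# High-pass term evaluated at s = j*2, where |1/s| < 1.
s_high = 2j
highpass = mittag_leffler(0, 1, -1 / s_high)

# Cascading two low-pass terms yields the second-order response 1/(1+s)^2.
second_order = lowpass * lowpass
```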
Pub Date : 2025-11-20 DOI: 10.1016/j.vlsi.2025.102604
Dekai Sun , Zhang Zhang , Wenyan Liu , Hongbin Yang , Yi Lu , Yonghong Zeng , Biao Zhang , Lianjie Lu
Implementing an artificial intelligence algorithm requires a large amount of computation, and that computation involves extensive data movement, which consumes considerable energy and time. In-memory computing is a promising paradigm to ease this limitation. XNOR-Network is an effective acceleration technique and has been widely applied in in-memory computing SRAM macros. Current in-memory computing SRAM macros for XNOR-Network face challenges in flexibility and reliability. To overcome these challenges, this paper proposes a differential in-memory computing 12T SRAM macro for XNOR-Network. The proposed SRAM macro eliminates the flipping of stored data that can occur during XNOR-and-accumulate operations. Moreover, it supports XNOR-and-accumulate operations of varying sizes. Additionally, the XNOR-and-accumulate result can be read out quickly by the sense amplifier for its sign, or by the Flash ADC for its multi-bit quantized value. The proposed architecture has an energy efficiency of 98.6 TOPS/W and a recognition rate of 97.06% on the MNIST data set.
{"title":"A differential in-memory computing 12T SRAM macro with enhanced flexibility and reliability for XNOR-network","authors":"Dekai Sun , Zhang Zhang , Wenyan Liu , Hongbin Yang , Yi Lu , Yonghong Zeng , Biao Zhang , Lianjie Lu","doi":"10.1016/j.vlsi.2025.102604","DOIUrl":"10.1016/j.vlsi.2025.102604","url":null,"abstract":"<div><div>Implementing an artificial intelligence algorithm requires a lot of calculation, but the calculation process needs a lot of data migration, which consumes a lot of energy and time. In-memory computing is a promising paradigm to ease this limitation. XNOR-Network is an effective acceleration technique and has been widely applied in in-memory computing SRAM macro. Current in-memory computing SRAM macro for XNOR-Network has challenges in flexibility and reliability. To overcome these challenges, this paper proposes a differential in-memory computing 12T SRAM macro for XNOR-Network. The proposed SRAM macro eliminates the issue of memory information flipping that occurs during XNOR-and-accumulate operations. Moreover, it is capable of supporting XNOR-and-accumulate operations of varying sizes. Additionally, the XNOR-and-accumulate result can be read out quickly by the sensitive amplifier for its sign or read out by the Flash ADC for its multi-bit quantized value. 
The proposed architecture has an energy efficiency of 98.6TOPS/W and a recognition rate of 97.06% for MNIST data set.</div></div>","PeriodicalId":54973,"journal":{"name":"Integration-The Vlsi Journal","volume":"107 ","pages":"Article 102604"},"PeriodicalIF":2.5,"publicationDate":"2025-11-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145571986","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
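The XNOR-and-accumulate operation named in the abstract above replaces a dot product of ±1 vectors with a bitwise XNOR followed by a population count. The sketch below shows the arithmetic in software only; it says nothing about the 12T cell or the readout circuits. The encoding (+1 as bit 1, −1 as bit 0) and the function names are assumptions chosen for illustration.

```python
def encode(values):
    """Pack a list of +1/-1 values into an int: +1 becomes bit 1, -1 becomes bit 0."""
    bits = 0
    for i, v in enumerate(values):
        if v == 1:
            bits |= 1 << i
    return bits

def xnor_accumulate(a_bits, w_bits, n):
    """Dot product of two n-element +1/-1 vectors from their bit encodings."""
    mask = (1 << n) - 1
    # XNOR marks positions where the two vectors agree.
    agree = bin(~(a_bits ^ w_bits) & mask).count("1")
    # Each agreement contributes +1 and each disagreement -1.
    return 2 * agree - n

activations = [1, -1, 1, 1, -1, -1, 1, -1]
weights = [1, 1, -1, 1, -1, 1, 1, -1]
result = xnor_accumulate(encode(activations), encode(weights), len(weights))
reference = sum(a * w for a, w in zip(activations, weights))
```

The sign of `result` corresponds to the fast sign readout described in the abstract, while the full value corresponds to the multi-bit quantized readout.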
Arithmetic circuits are fundamental building blocks of digital circuitry, with applications including digital signal processing, cryptographic processors, and multimedia. Integer multiplier circuits with wide operands dominate the circuit area in new-generation technologies. Various multiplication algorithms are available to generate multiplier circuits that trade off area, delay, and power. Custom optimization is performed to reduce circuit size, which increases the probability of logical bugs in the design. Over more than thirty years, prominent formal verification techniques such as Satisfiability (SAT) checking, Binary Decision Diagrams (BDDs), and Symbolic Computer Algebra (SCA) have made massive progress in analyzing the correctness of circuits. In this paper, we study the best state-of-the-art techniques for each method available in the academic domain and perform a comparative analysis of how well they verify integer multiplier circuits with different architectures after logic optimization. Although the complexity of BDDs is consistently exponential in the input size of the circuit, and BDDs can be constructed only up to 18 bits, the method is robust across a variety of multiplier structures. Algebraic backward rewriting based on SCA facilitates the formal verification of high-bit-width multiplier circuits. Conventional approaches that leverage hierarchical structural information are constrained to algebraic-friendly multipliers, in which adder sub-circuits are preserved in their canonical form, an assumption that often no longer holds after logic synthesis and optimization. In contrast, advanced algebraic techniques that operate directly on flattened netlists demonstrate scalability and robustness in verifying large multiplier designs. Formal analysis with straightforward SAT techniques does not work well for comparing two structurally dissimilar circuits, which is often the case after logic optimization. If the degree of similarity is not excessively low, SAT sweeping can effectively reduce structural dissimilarity, and SAT techniques can verify multipliers up to 512 bits. However, the verification of complex circuits, characterized by their non-algebraic-friendly nature, near-zero similarity to reference circuits, and larger input sizes, remains an open challenge.
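To make the idea of algebraic backward rewriting concrete, the following is a minimal sketch on a 2-bit array multiplier. It is not taken from the paper or from any of the tools it compares. The netlist, signal names (`p0`..`p3`, `t1`..`t3`, `c1`), and helper functions are all hypothetical and chosen for illustration only. The specification polynomial (weighted outputs minus the product of the weighted inputs) is rewritten gate by gate in reverse topological order, and the circuit is accepted only if the remainder is zero.

```python
from collections import defaultdict

# A polynomial maps a monomial (frozenset of variable names) to an integer
# coefficient. Using sets keeps monomials multilinear, which encodes x*x = x
# for Boolean signals.

def add(p, q, scale=1):
    """Return p + scale*q with zero terms removed."""
    out = defaultdict(int, p)
    for mono, coeff in q.items():
        out[mono] += scale * coeff
    return {m: c for m, c in out.items() if c != 0}

def mul(p, q):
    """Return p*q; the set union applies the x*x = x reduction."""
    out = defaultdict(int)
    for m1, c1 in p.items():
        for m2, c2 in q.items():
            out[m1 | m2] += c1 * c2
    return {m: c for m, c in out.items() if c != 0}

def var(name):
    return {frozenset([name]): 1}

def gate_poly(op, x, y):
    """Integer polynomial of a two-input gate in terms of its inputs."""
    xy = mul(var(x), var(y))
    if op == "AND":
        return xy                                  # x*y
    if op == "OR":
        return add(add(var(x), var(y)), xy, -1)    # x + y - x*y
    if op == "XOR":
        return add(add(var(x), var(y)), xy, -2)    # x + y - 2*x*y
    raise ValueError(f"unsupported gate: {op}")

def substitute(p, name, repl):
    """Replace every occurrence of variable `name` in p by polynomial repl."""
    out = {}
    for mono, coeff in p.items():
        if name in mono:
            out = add(out, mul({mono - {name}: coeff}, repl))
        else:
            out = add(out, {mono: coeff})
    return out

def weighted_sum(names):
    """Polynomial sum(2**i * names[i]) for a little-endian bit vector."""
    p = {}
    for i, n in enumerate(names):
        p = add(p, var(n), 1 << i)
    return p

# Hypothetical 2-bit multiplier netlist, listed in topological order as
# (output, gate type, input, input).
GATES = [
    ("p0", "AND", "a0", "b0"),
    ("t1", "AND", "a1", "b0"),
    ("t2", "AND", "a0", "b1"),
    ("t3", "AND", "a1", "b1"),
    ("p1", "XOR", "t1", "t2"),
    ("c1", "AND", "t1", "t2"),
    ("p2", "XOR", "t3", "c1"),
    ("p3", "AND", "t3", "c1"),
]

def verify(gates):
    """True if the remainder after backward rewriting is the zero polynomial."""
    spec = add(weighted_sum(["p0", "p1", "p2", "p3"]),
               mul(weighted_sum(["a0", "a1"]), weighted_sum(["b0", "b1"])), -1)
    for out, op, x, y in reversed(gates):   # outputs first, inputs last
        spec = substitute(spec, out, gate_poly(op, x, y))
    return spec == {}
```

Calling `verify(GATES)` returns `True` for this netlist. Changing the `p2` gate from `XOR` to `OR` leaves a nonzero remainder, so `verify` returns `False`. The sketch ignores the main practical difficulty, which is keeping intermediate polynomials small on optimized, flattened netlists of large multipliers.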
{"title":"A comparative study on formal verification techniques to verify large integer multiplier circuits","authors":"Jitendra Kumar , Asutosh Srivastava , Masahiro Fujita","doi":"10.1016/j.vlsi.2025.102606","DOIUrl":"10.1016/j.vlsi.2025.102606","url":null,"abstract":"<div><div>Arithmetic circuits are the fundamental building blocks of circuitry with applications including digital signal processing, cryptography processors, and multimedia. Integer multiplier circuits with high bit width of the operands dominate the extensive circuitry area of new-generation technologies. Traditionally, various multiplication algorithms are available to generate multiplier circuits considering area, delay, and power. Custom optimization is performed to reduce the circuit size, which increases the probability of logical bugs in the design. In the past over thirty years, prominent formal verification techniques such as Satisfiability (SAT) checking, Binary Decision Diagram (BDD), and Symbolic Computer Algebra (SCA) made massive progress in analyzing the correctness of the circuits. In this paper, we study the best state-of-the-art techniques from each method available in the academic domain and perform a comparative analysis to verify integer multiplier circuits with different architectures after logic optimization. Although the complexity of BDDs is constantly exponential with the input size of the circuit, and BDDs can be constructed only up to 18 bits, the method is robust to verify a variety of multiplier structures. Algebraic backward rewriting based on Symbolic Computer Algebra (SCA) facilitates the formal verification of high-bit-width multiplier circuits. Conventional approaches that leverage hierarchical structural information are constrained to algebraic-friendly multipliers, wherein adder sub-circuits are preserved in their canonical form, an assumption often invalidated post logic synthesis and optimization. 
In contrast, advanced algebraic techniques that operate directly on flattened net-lists demonstrate scalability and robustness in verifying large multiplier designs. Formal analysis with straightforward SAT techniques does not work well for comparing two structural non-similar circuits, which is often the case after applying logic optimization. If the degree of similarity is not excessively low, SAT-Sweeping can effectively reduce structural non-similarity, and SAT techniques can verify multipliers up to 512 bits. However, the verification of complex circuits, characterized by their non-algebraic-friendly nature, near-zero similarity to reference circuits, and larger input sizes, remains an open challenge.</div></div>","PeriodicalId":54973,"journal":{"name":"Integration-The Vlsi Journal","volume":"107 ","pages":"Article 102606"},"PeriodicalIF":2.5,"publicationDate":"2025-11-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145572012","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}