Pub Date: 2025-12-06. DOI: 10.1016/j.vlsi.2025.102627
R. Sindhu, V. Arunachalam
The activation functions (AF) tanh(x) and sigmoid(x) are essential in a Long Short-Term Memory (LSTM) cell for time-series classification with a Recurrent Neural Network (RNN). These AFs regulate the data flow effectively and optimize memory requirements in LSTM cells. Hardware realizations of these AFs are complex; consequently, approximation strategies must be adopted, and the piece-wise linear (PWL) method is well suited to hardware implementation. A 7-segment PWL approximation of tanh(x), denoted t(x₈), is proposed here. Employing a MATLAB-based error analysis, an optimum fixed-point data format (1-bit sign, 2-bit integer, 8-bit fraction) is chosen. The function t(x₈) is implemented with parallel segment selection and two 10-bit adders using TSMC 65 nm technology libraries; this architecture occupies 356.4 μm² and consumes 230.7 μW at 1.67 GHz. An approximate sigmoid(x), σ(x₈), is then built from the t(x₈) module with two shifters, a complementer, and an 11-bit adder; it occupies 462.4 μm² and consumes 324.2 μW at 1.25 GHz. An approximate LSTM cell with the proposed t(x₈) and σ(x₈) functions is modelled using Python 3.2 and tested on the Italian Parkinson's dataset. The approximate LSTM cell produces classification metrics that deviate from the exact LSTM cell by at most 0.21%.
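The two ideas in this abstract, a segment-wise linear tanh and a sigmoid derived from it with shifts and one extra add, can be sketched in software. The uniform 0.5-wide breakpoints, saturation at |x| ≥ 3, and Q2.8 rounding helper below are illustrative assumptions, not the paper's actual segment table:

```python
import numpy as np

BREAKS = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0])  # 7th segment: saturation

def pwl_tanh(x):
    """7-segment PWL tanh: linear interpolation between samples of tanh
    at the breakpoints for |x| < 3, saturating to +/-1 beyond (odd symmetry)."""
    x = np.asarray(x, dtype=float)
    a = np.minimum(np.abs(x), 3.0)
    y = np.interp(a, BREAKS, np.tanh(BREAKS))
    y = np.where(np.abs(x) >= 3.0, 1.0, y)
    return np.sign(x) * y

def q2_8(v):
    """Round to the abstract's fixed-point format: 1 sign, 2 integer, 8 fraction bits."""
    return np.round(np.asarray(v) * 256.0) / 256.0

def pwl_sigmoid(x):
    """sigmoid(x) = 0.5*tanh(x/2) + 0.5 -- the identity that lets the
    sigmoid reuse the tanh module with shifters and one adder."""
    return 0.5 * pwl_tanh(0.5 * np.asarray(x)) + 0.5
```

With these illustrative segments the worst-case error against the exact functions stays below a few hundredths; a MATLAB-style error sweep of this kind is what would fix the segment table and data format before hardware implementation.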
{"title":"Hardware efficient approximate activation functions for a Long-Short-Term Memory cell","authors":"R. Sindhu, V. Arunachalam","doi":"10.1016/j.vlsi.2025.102627","DOIUrl":"10.1016/j.vlsi.2025.102627","url":null,"abstract":"<div><div>The activation functions (AF) such as <span><math><mrow><mi>s</mi><mi>i</mi><mi>g</mi><mi>m</mi><mi>o</mi><mi>i</mi><mi>d</mi><mrow><mo>(</mo><mi>x</mi><mo>)</mo></mrow></mrow></math></span> and <span><math><mrow><mi>tanh</mi><mrow><mo>(</mo><mi>x</mi><mo>)</mo></mrow></mrow></math></span> are essential in a Long-Short Term Memory (LSTM) cell for time series classification using a Recurrent Neural Network (RNN). These AFs regulate the data flow effectively and optimize memory requirements in LSTM cells. Hardware realizations of these AFs are complex; consequently, approximation strategies must be adopted. The piece-wise linearization (PWL) method is appropriate for hardware implementations. A 7-segment PWL-based approximate <span><math><mrow><mi>tanh</mi><mrow><mo>(</mo><mi>x</mi><mo>)</mo></mrow></mrow></math></span>, <span><math><mrow><mi>t</mi><mrow><mo>(</mo><msub><mi>x</mi><mn>8</mn></msub><mo>)</mo></mrow></mrow></math></span> is proposed here. Employing a MATLAB-based error analysis, an optimum fixed-point data format (1-bit sign, 2-bit integer, 8-bit fraction) is chosen. The function <span><math><mrow><mi>t</mi><mrow><mo>(</mo><msub><mi>x</mi><mn>8</mn></msub><mo>)</mo></mrow></mrow></math></span> is implemented with parallel segment selection and two 10-bit adders using TSMC 65 nm technology libraries. This architecture uses 356.4 μm<sup>2</sup> area and consumes 230.7 μW at 1.67 GHz. 
Later, an approximate <span><math><mrow><mi>s</mi><mi>i</mi><mi>g</mi><mi>m</mi><mi>o</mi><mi>i</mi><mi>d</mi><mrow><mo>(</mo><mi>x</mi><mo>)</mo></mrow></mrow></math></span>, <span><math><mrow><mi>σ</mi><mrow><mo>(</mo><msub><mi>x</mi><mn>8</mn></msub><mo>)</mo></mrow></mrow></math></span> is implemented using the <span><math><mrow><mi>t</mi><mrow><mo>(</mo><msub><mi>x</mi><mn>8</mn></msub><mo>)</mo></mrow></mrow></math></span> module with two shifters, a complement and an 11-bit adder. It uses a 462.4 μm<sup>2</sup> area and consumes 324.2 μW power at 1.25 GHz. An approximate LSTM cell with the proposed <span><math><mrow><mi>t</mi><mrow><mo>(</mo><msub><mi>x</mi><mn>8</mn></msub><mo>)</mo></mrow></mrow></math></span> and <span><math><mrow><mi>σ</mi><mrow><mo>(</mo><msub><mi>x</mi><mn>8</mn></msub><mo>)</mo></mrow></mrow></math></span> functions are modelled using Python 3.2 and tested with the Italian Parkinson's dataset. The approximate LSTM cell produces closer classification metrics with a maximum deviation of 0.21 % from the exact LSTM cell.</div></div>","PeriodicalId":54973,"journal":{"name":"Integration-The Vlsi Journal","volume":"107 ","pages":"Article 102627"},"PeriodicalIF":2.5,"publicationDate":"2025-12-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145737413","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This high-speed, power-efficient content-addressable memory (CAM) uses parallel lookups to match quickly without sacrificing power consumption. It introduces three key contributions: (i) pre-charge-free operation, which improves search speed and reduces power requirements by eliminating node charging time; (ii) a Hybrid Match Line (HML) structure that strategically balances power and delay by combining the high-speed attributes of NOR with the low-power attributes of NAND; and (iii) a local searching technique that further improves search time. Performance indicators improve greatly when these methods are seamlessly integrated. Implemented in 45 nm CMOS technology for a 64×32 memory array, the design is validated across process, voltage, temperature, and frequency variations, and Monte Carlo simulations verify its stability. The proposed architecture outperforms the leading benchmark in speed and power-delay product (PDP) by 54.6% and 76.02%, respectively. After a single write operation, the design supports repeated data searches at frequencies up to 2 GHz, enabling quicker and more energy-efficient data processing that could benefit consumer electronics, high-performance computing, mobile devices, and IoT applications.
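A behavioral sketch of what a segmented search computes may help ground the circuit description. The 8-bit segment width here is an arbitrary choice, and the pre-charge-free HML itself is a transistor-level technique with no direct software analogue:

```python
def cam_search(table, key, word_bits=32, seg_bits=8):
    """Behavioral model of a segmented CAM lookup: every stored word is
    compared against the key in parallel (modelled here as a loop), and a
    row matches only if all of its segments match. Checking segment by
    segment mirrors the local-search idea of rejecting a row at its
    first mismatching segment."""
    masks = [((1 << seg_bits) - 1) << s for s in range(0, word_bits, seg_bits)]
    return [addr for addr, word in enumerate(table)
            if all(((word ^ key) & m) == 0 for m in masks)]
```

For example, `cam_search([0x12345678, 0xDEADBEEF, 0x12345678], 0x12345678)` returns `[0, 2]`; a real CAM produces all match-line results in a single cycle rather than by iteration.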
{"title":"Achieving superior segmented CAM efficiency with pre-charge free local search based hybrid matcher for high speed applications","authors":"Shyamosree Goswami , Adwait Wakankar , Partha Bhattacharyya , Anup Dandapat","doi":"10.1016/j.vlsi.2025.102621","DOIUrl":"10.1016/j.vlsi.2025.102621","url":null,"abstract":"<div><div>This high-speed, power-efficient content addressable memory (CAM) uses parallel lookups to match quickly without sacrificing power consumption. It introduces three key contributions: i. Pre-charge free operation, which improves search speed and reduces power requirements by eliminating node charging time, ii. A Hybrid Match Line (HML) structure that strategically balances power and delay, combining the high-speed attributes of NOR with the low-power attributes of NAND, and iii. Local searching technique ascertain further improvement in search time. Performance indicators improve greatly when these methods are seamlessly integrated. Utilizing 45 nm CMOS technology, the design supports diverse process voltages, temperatures, and frequencies for a 64x32 memory array. Monte Carlo simulations verify design stability. The proposed architecture outperforms the leading benchmark in speed and power-delay-product (PDP) by 54.6% and 76.02%, respectively. This novel design can do repeated data searches at frequencies up to 2 GHz after a single write operation, enabling quicker and more energy-efficient data processing that could revolutionize consumer electronics. 
This development could revolutionize consumer electronics by improving efficiency and speed in high-performance computing, mobile devices, and IoT applications.</div></div>","PeriodicalId":54973,"journal":{"name":"Integration-The Vlsi Journal","volume":"107 ","pages":"Article 102621"},"PeriodicalIF":2.5,"publicationDate":"2025-12-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145736930","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-12-02. DOI: 10.1016/j.vlsi.2025.102625
Angelos Athanasiadis , Nikolaos Tampouratzis , Ioannis Papaefstathiou
The growing demand for real-time processing in artificial intelligence applications, particularly those involving Convolutional Neural Networks (CNNs), has highlighted the need for efficient computational solutions. Conventional processors and graphics processing units (GPUs) very often fall short in balancing performance, power consumption, and latency, especially in embedded systems and edge computing platforms. Field-Programmable Gate Arrays (FPGAs) offer a promising alternative, combining high performance with energy efficiency and reconfigurability. This paper presents a design and implementation framework for mapping CNNs seamlessly onto FPGAs that maintains full precision in all neural network parameters, thus addressing a niche: non-quantized NNs. The framework extends Darknet, which is very widely used for the design of CNNs, and allows the designer, starting from a Darknet NN description, to efficiently implement CNNs on a heterogeneous system comprising CPUs and FPGAs. Our framework is evaluated on a number of different CNNs and as part of a real-world application utilizing UAVs; in all cases it outperforms the CPU and GPU systems in performance and/or power consumption. Compared with FPGA frameworks that support quantization, our solution offers similar performance and/or energy efficiency without any degradation in NN accuracy.
{"title":"An efficient open-source design and implementation framework for non-quantized CNNs on FPGAs","authors":"Angelos Athanasiadis , Nikolaos Tampouratzis , Ioannis Papaefstathiou","doi":"10.1016/j.vlsi.2025.102625","DOIUrl":"10.1016/j.vlsi.2025.102625","url":null,"abstract":"<div><div>The growing demand for real-time processing in artificial intelligence applications, particularly those involving Convolutional Neural Networks (CNNs), has highlighted the need for efficient computational solutions. Conventional processors and graphical processing units (GPUs), very often, fall short in balancing performance, power consumption, and latency, especially in embedded systems and edge computing platforms. Field-Programmable Gate Arrays (FPGAs) offer a promising alternative, combining high performance with energy efficiency and reconfigurability. This paper presents a design and implementation framework for implementing CNNs seamlessly on FPGAs that maintains full precision in all neural network parameters thus addressing a niche, that of non-quantized NNs. The presented framework extends Darknet, which is very widely used for the design of CNNs, and allows the designer, by effectively using a Darknet NN description, to efficiently implement CNNs in a heterogeneous system comprising of CPUs and FPGAs. Our framework is evaluated on the implementation of a number of different CNNs and as part of a real world application utilizing UAVs; in all cases it outperforms the CPU and GPU systems in terms of performance and/or power consumption. 
When compared with the FPGA frameworks that support quantization, our solution offers similar performance and/or energy efficiency without any degradation on the NN accuracy.</div></div>","PeriodicalId":54973,"journal":{"name":"Integration-The Vlsi Journal","volume":"107 ","pages":"Article 102625"},"PeriodicalIF":2.5,"publicationDate":"2025-12-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145684628","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The deposition of dielectric thin films in semiconductor fabrication is significantly influenced by process parameter configuration. Traditional optimization via experiments or multi-physics simulations is costly, time-consuming, and inflexible. Data-driven methods that leverage production-line sensor data provide a promising alternative. This work proposes a machine learning modeling framework for studying the nonlinear correlation between dielectric deposition parameters and film thickness distribution. The proposed approach is validated using historical High-Density Plasma Chemical Vapor Deposition (HDPCVD) process data collected from production runs and demonstrates strong predictive performance across multiple technology nodes, achieving R² = 0.92 for thin-film thickness and 79.5% accuracy in determining whether predicted thicknesses lie within the node-specific tolerances at the 14 nm node. The results suggest that data-driven modeling offers a practical, scalable, and efficient solution for process monitoring and optimization in advanced semiconductor fabrication.
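The two reported figures correspond to standard metrics that can be stated precisely. The tolerance window in the second helper is a placeholder, since the node-specific limits are not given in the abstract:

```python
import numpy as np

def r2_score(y_true, y_pred):
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def spec_compliance_accuracy(y_true, y_pred, lo, hi):
    """Fraction of samples where the prediction agrees with the ground
    truth on whether the thickness falls inside the [lo, hi] tolerance
    window -- the binary-accuracy reading of the 79.5% figure; the
    tolerance values themselves are node-specific and assumed here."""
    in_true = (np.asarray(y_true) >= lo) & (np.asarray(y_true) <= hi)
    in_pred = (np.asarray(y_pred) >= lo) & (np.asarray(y_pred) <= hi)
    return float(np.mean(in_true == in_pred))
```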
{"title":"Machine-learning-driven prediction of thin film parameters for optimizing the dielectric deposition in semiconductor fabrication","authors":"Hao Wen , Enda Zhao , Qiyue Zhang , Ruofei Xiang , Wenjian Yu","doi":"10.1016/j.vlsi.2025.102617","DOIUrl":"10.1016/j.vlsi.2025.102617","url":null,"abstract":"<div><div>The deposition of dielectric thin film in semiconductor fabrication is significantly influenced by process parameter configuration. Traditional optimization via experiments or multi-physics simulations is costly, time-consuming, and lacks flexibility. Data-driven methods that leverage production line sensor data provide a promising alternative. This work proposes a machine learning modeling framework for studying the nonlinear correlation between dielectric deposition parameters and film thickness distribution. The proposed approach is validated using historical High-Density Plasma Chemical Vapor Deposition (HDPCVD) process data collected from production runs and demonstrates strong predictive performance across multiple technology nodes. This framework achieves strong predictive performance in thin film thickness (<span><math><msup><mrow><mo>R</mo></mrow><mrow><mn>2</mn></mrow></msup></math></span> = 0.92) and enables practical assessment of specification compliance, achieving 79.5% accuracy in determining whether predicted thicknesses lie within the node–specific tolerances at the 14 nm node. 
The results suggest that data-driven modeling offers a practical, scalable, and efficient solution for process monitoring and optimization in advanced semiconductor fabrication.</div></div>","PeriodicalId":54973,"journal":{"name":"Integration-The Vlsi Journal","volume":"107 ","pages":"Article 102617"},"PeriodicalIF":2.5,"publicationDate":"2025-12-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145684623","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-12-02. DOI: 10.1016/j.vlsi.2025.102623
Yong Zhang , Wen-Jie Li , Guo-Jing Ge , Jin-Qiao Wang , Bo-Wen Jia , Ning Xu
The A∗ algorithm is one of the most common analog integrated circuit (IC) routing techniques. As the number of nets increases, the routing order of this heuristic routing algorithm affects the routing results immensely. Artificial intelligence (AI) technologies are now widely applied in IC physical design to accelerate layout design. In this paper, we propose a reinforcement learning model for net order selection. We construct multi-channel images of routing data and extract features of the routing-pin coordinates through an attention mechanism. After training, the model outputs an optimized net order, which is then used to perform routing with a bidirectional A∗ algorithm, improving both the speed and efficiency of the routing process. Experimental results on cases based on 130-nm and 180-nm processes show that the proposed method achieves a 2.5% reduction in wire length and a 3.7% decrease in the number of vias compared to state-of-the-art methods for analog IC routing. In terms of computational efficiency, the bidirectional A∗ algorithm improves performance by 7.3% over the unidirectional A∗ algorithm in decision-making scenarios and by 51.09% in the path-planning process. Simulation results further demonstrate that, compared with manual and advanced automated methods, the overall performance of the layout achieved by our method aligns most closely with schematic performance.
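The central premise, that net ordering changes routing quality, is easy to demonstrate with a toy sequential maze router. The BFS below is a stand-in for the paper's bidirectional A∗, and the two-net example is contrived purely to make the order dependence visible:

```python
from collections import deque

def bfs_route(w, h, blocked, src, dst):
    """Shortest path on a w-by-h grid avoiding blocked cells (stand-in
    for the maze router); returns the list of cells or None."""
    prev = {src: None}
    q = deque([src])
    while q:
        cell = q.popleft()
        if cell == dst:
            path = []
            while cell is not None:
                path.append(cell)
                cell = prev[cell]
            return path[::-1]
        x, y = cell
        for n in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if (0 <= n[0] < w and 0 <= n[1] < h
                    and n not in blocked and n not in prev):
                prev[n] = cell
                q.append(n)
    return None

def route_in_order(nets, w, h):
    """Route nets one by one; each routed path blocks later nets.
    Returns total wirelength, or None if some net becomes unroutable."""
    blocked, total = set(), 0
    for src, dst in nets:
        path = bfs_route(w, h, blocked, src, dst)
        if path is None:
            return None
        total += len(path) - 1
        blocked |= set(path)
    return total
```

On a 6×5 grid with net A = (0,2)→(4,2) and net B = (2,1)→(2,3), routing A first forces B into a long detour (total wirelength 12), while routing B first costs only 10 — exactly the kind of gap a learned net order is meant to close.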
{"title":"Reinforcement learning-driven net order selection for efficient analog IC routing","authors":"Yong Zhang , Wen-Jie Li , Guo-Jing Ge , Jin-Qiao Wang , Bo-Wen Jia , Ning Xu","doi":"10.1016/j.vlsi.2025.102623","DOIUrl":"10.1016/j.vlsi.2025.102623","url":null,"abstract":"<div><div>The A∗ algorithm is one of the most common analog integrated circuit (IC) routing techniques. As the number of nets increases, the routing order of this heuristic routing algorithm will affect the routing results immensely. Currently, artificial intelligence (AI) technologies are widely applied in IC physical design to accelerate layout design. In this paper, we propose a reinforcement model based on net order selection. We construct multi-channel images of routing data and extract features of the coordinates of routing pins through an attention mechanism. After training, the model outputs an optimized net order, which is then used to perform routing with a bidirectional A∗ algorithm, thereby improving both the speed and efficiency of the routing process. Experimental results on cases based on 130-nm and 180-nm processes show that the proposed method can achieve a 2.5 % reduction in wire length and a 3.7 % decrease in the number of vias compared to state-of-the-art methods for analog IC routing. In terms of computational efficiency, the bidirectional A∗ algorithm improves performance by 7.3 % over the unidirectional A∗ algorithm in decision-making scenarios and by 51.09 % in the path-planning process. 
Simulation results further demonstrate that, compared with manual and advanced automation methods, the overall performance of the layout achieved by our method aligns most closely with schematic performance.</div></div>","PeriodicalId":54973,"journal":{"name":"Integration-The Vlsi Journal","volume":"107 ","pages":"Article 102623"},"PeriodicalIF":2.5,"publicationDate":"2025-12-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145684629","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-12-01. DOI: 10.1016/j.vlsi.2025.102624
Yan Xing, Zicheng Deng, Shuting Cai, Weijun Li, Xiaoming Xiong
Existing routability-driven global placers typically employ an iterative routability optimization process and perform cell inflation based only on lookahead congestion maps during each run. However, this incremental application of congestion estimation and mitigation yields placement solutions that deviate from optimal wirelength, compromising the objective of balancing wirelength minimization and routability optimization. To simultaneously improve routability and reduce wirelength, this paper proposes a novel routability–wirelength co-guided cell inflation approach for global placement optimization. It employs a multi-task learning-based feature selection method, MTL-FS, to identify the optimal feature subset and train the corresponding routability–wirelength co-learning model, RWNet. During the iterative optimization process, both routability and wirelength are predicted using RWNet, and their correlation is interpreted by DeepSHAP to produce three impact maps. Subsequently, routability–wirelength co-guided cell inflation (RWCI) is performed based on an adjusted congestion map, derived from the predicted congestion map and the three impact maps. Experimental results on ISPD2011 and DAC2012 benchmark designs demonstrate that, compared to DREAMPlace and RoutePlacer (representative non-machine-learning and machine-learning routability-driven placers, respectively), the proposed approach achieves better optimization quality, specifically improved routability and reduced wirelength, at a lower time cost. Moreover, an extension experiment shows our method consistently outperforms DREAMPlace (even when it uses 2D feature maps as proxies) in effectiveness while maintaining comparable efficiency. A generalization experiment further confirms this superiority and comparable runtime, particularly in highly congested scenarios.
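Mechanically, map-guided cell inflation is a small step: each cell's target area is scaled by the value of the map in its bin. The plain weighted-sum blend of the predicted congestion map with the three impact maps below is an assumption for illustration; the paper's actual combination rule is not given in the abstract:

```python
import numpy as np

def adjusted_congestion(cong, impact_maps, weights):
    """Hypothetical blend of the predicted congestion map with the
    DeepSHAP-derived impact maps (weighted sum; illustrative only)."""
    adj = np.asarray(cong, dtype=float).copy()
    for m, w in zip(impact_maps, weights):
        adj += w * np.asarray(m, dtype=float)
    return adj

def inflate_cells(areas, cell_bins, adj_map, alpha=0.5, max_ratio=2.0):
    """Scale each cell's area by the adjusted congestion of the bin it
    sits in, capped so no cell grows beyond max_ratio times its size."""
    bins = np.asarray(cell_bins)
    ratios = np.clip(1.0 + alpha * adj_map[bins[:, 0], bins[:, 1]],
                     1.0, max_ratio)
    return np.asarray(areas, dtype=float) * ratios
```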
{"title":"Routability–wirelength co-guided cell inflation with explainable multi-task learning for global placement optimization","authors":"Yan Xing, Zicheng Deng, Shuting Cai, Weijun Li, Xiaoming Xiong","doi":"10.1016/j.vlsi.2025.102624","DOIUrl":"10.1016/j.vlsi.2025.102624","url":null,"abstract":"<div><div>Existing routability-driven global placers typically employed an iterative routability optimization process and performed cell inflation based only on lookahead congestion maps during each run. However, this incremental application of congestion estimation and mitigation resulted in placement solutions that deviate from optimal wirelength, thus compromising the optimization objective of balancing wirelength minimization and routability optimization. To simultaneously improve routability and reduce wirelength, this paper proposes a novel routability–wirelength co-guided cell inflation approach for global placement optimization. It employs a multi-task learning-based feature selection method, MTL-FS, to identify the optimal feature subset and train the corresponding routability–wirelength co-learning model, RWNet. During the iterative optimization process, both routability and wirelength are predicted using RWNet, and their correlation is interpreted by DeepSHAP to produce three impact maps. Subsequently, routability–wirelength co-guided cell inflation (RWCI) is performed based on an adjusted congestion map, which is derived from the predicted congestion map and the three impact maps. The experimental results on ISPD2011 and DAC2012 benchmark designs demonstrate that, compared to DERAMPlace and RoutePlacer (which represent non-machine learning-based and machine learning-based routability-driven placers, respectively), the proposed approach achieves both better optimization quality, specifically improved routability and reduced wirelength, and a decreased time cost. 
Moreover, the extension experiment shows our method consistently outperforms DREAMPlace (even when it uses 2D feature maps as proxies) in effectiveness while maintaining comparable efficiency. The Generalization experiment further confirms this superiority and comparable runtime, particularly in highly congested scenarios.</div></div>","PeriodicalId":54973,"journal":{"name":"Integration-The Vlsi Journal","volume":"107 ","pages":"Article 102624"},"PeriodicalIF":2.5,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145684627","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-11-29. DOI: 10.1016/j.vlsi.2025.102616
Juntao Jian, Yan Xing, Shuting Cai, Weijun Li, Xiaoming Xiong
Detailed-routability optimization methods for three-dimensional global routing typically employ a two-stage process involving initial routing and multi-level maze routing (iterative rip-up and reroute, or RRR iterations). Within the coarse-grained maze route planning of RRR iterations, the resource model and cost scheme are paramount for optimization quality. However, current advancements in these areas often overlook the dynamic nature of routing resources throughout RRR iterations and fail to consider routability features beyond congestion. To mitigate these limitations, this paper introduces a novel detailed-routability optimization approach that integrates a dynamic resource model and a routability-aware cost scheme. The proposed dynamic resource model accounts for routing resources’ sensitivity to both spatial information and the progression of RRR iterations. Moreover, the routability-aware cost scheme, derived from coarse-grained routability features, is designed to optimize fine-grained routability. Experimental results validate that our approach surpasses baseline detailed-routability-driven global routers, exhibiting superior optimization performance by concurrently enhancing routability and overall quality scores (a weighted summation of wirelength and routability metrics), alongside achieving significant runtime reduction.
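A PathFinder-style negotiated cost gives a concrete picture of what an iteration-aware cost scheme looks like during maze expansion. The functional form below, including scaling the present-overflow penalty with the RRR iteration index, is illustrative, not the paper's scheme:

```python
def edge_cost(usage, capacity, history, iteration, base=1.0, p=0.5):
    """Congestion cost of taking a routing edge during maze expansion:
    the present-overflow penalty grows with the RRR iteration index, so
    early iterations explore freely while later ones strongly avoid
    overflowed edges; `history` accumulates past congestion."""
    overflow = max(0, usage + 1 - capacity)
    return base + history + p * iteration * overflow

def update_history(history, usage, capacity, h_gain=1.0):
    """After each RRR iteration, remember edges that ended up congested
    so reroutes negotiate away from chronically contested resources."""
    return history + h_gain * max(0, usage - capacity)
```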
{"title":"Optimizing detailed-routability for 3D global routing through dynamic resource model and routability-aware cost scheme","authors":"Juntao Jian, Yan Xing, Shuting Cai, Weijun Li, Xiaoming Xiong","doi":"10.1016/j.vlsi.2025.102616","DOIUrl":"10.1016/j.vlsi.2025.102616","url":null,"abstract":"<div><div>Detailed-routability optimization methods for three-dimensional global routing typically employ a two-stage process involving initial routing and multi-level maze routing (iterative rip-up and reroute, or RRR iterations). Within the coarse-grained maze route planning of RRR iterations, the resource model and cost scheme are paramount for optimization quality. However, current advancements in these areas often overlook the dynamic nature of routing resources throughout RRR iterations and fail to consider routability features beyond congestion. To mitigate these limitations, this paper introduces a novel detailed-routability optimization approach that integrates a dynamic resource model and a routability-aware cost scheme. The proposed dynamic resource model accounts for routing resources’ sensitivity to both spatial information and the progression of RRR iterations. Moreover, the routability-aware cost scheme, derived from coarse-grained routability features, is designed to optimize fine-grained routability. 
Experimental results validate that our approach surpasses baseline detailed-routability-driven global routers, exhibiting superior optimization performance by concurrently enhancing routability and overall quality scores (a weighted summation of wirelength and routability metrics), alongside achieving significant runtime reduction.</div></div>","PeriodicalId":54973,"journal":{"name":"Integration-The Vlsi Journal","volume":"107 ","pages":"Article 102616"},"PeriodicalIF":2.5,"publicationDate":"2025-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145684741","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-11-29. DOI: 10.1016/j.vlsi.2025.102622
Luis Gerardo de la Fraga , Esteban Tlelo-Cuautle
A Pseudo-Random Number Generator (PRNG) produces a sequence whose randomness is evaluated by statistical test suites such as the NIST suite and TestU01. The random sequences are deterministic and reproducible when using the same seed value; hence, for cryptographic applications, the key size of a PRNG must be large enough to resist brute-force attacks. A fractional-order chaotic system, such as the Lorenz system, is therefore well suited to designing a PRNG, whose implementation can be realized on embedded devices such as the low-cost ESP32 (32-bit LX6 microprocessor) and field-programmable gate arrays (FPGAs). To increase the throughput, the fractional Lorenz system is integrated with an approximate two-step Runge–Kutta method. An analysis is performed to find the domain of attraction for each state variable and to verify that the PRNG produces uncorrelated sequences. The hardware implementation is detailed by establishing the number of key bits required for the PRNG to guarantee its suitability for cryptographic applications. Finally, the hardware design of a PRNG using the fractional Lorenz system provides a throughput of 4.99 Mbit/s on the ESP32 platform and 112.96 Mbit/s on the FPGA.
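The pipeline can be sketched in software for the classical (integer-order) Lorenz system, integrated with Heun's two-stage Runge–Kutta method. This is illustrative only: the paper uses a fractional-order system and a fixed-point datapath, and the bit-extraction rule here (the 8 low bits of the scaled x state) is an assumption, not the paper's scheme:

```python
def lorenz_prng_bits(n_bits, seed=(0.1, 0.0, 0.0), h=0.01,
                     sigma=10.0, rho=28.0, beta=8.0 / 3.0, burn_in=1000):
    """Toy chaotic PRNG: integrate Lorenz with a two-stage Runge-Kutta
    (Heun) step, discard a transient, then harvest 8 low-order bits of
    the scaled x state per step. Same seed -> same bit sequence."""
    def f(x, y, z):
        return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)

    x, y, z = seed
    bits, step = [], 0
    while len(bits) < n_bits:
        k1 = f(x, y, z)
        k2 = f(x + h * k1[0], y + h * k1[1], z + h * k1[2])  # predictor stage
        x += 0.5 * h * (k1[0] + k2[0])                       # corrector stage
        y += 0.5 * h * (k1[1] + k2[1])
        z += 0.5 * h * (k1[2] + k2[2])
        step += 1
        if step > burn_in:  # skip the transient before the attractor
            word = int(abs(x) * 2**20) & 0xFF  # 8 LSBs of scaled state
            bits.extend((word >> i) & 1 for i in range(8))
    return bits[:n_bits]
```

Reproducibility from the seed is exactly the property that makes key size, rather than the generator itself, carry the cryptographic strength.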
{"title":"Design insights for implementing a PRNG with fractional Lorenz system on ESP32 and FPGA","authors":"Luis Gerardo de la Fraga , Esteban Tlelo-Cuautle","doi":"10.1016/j.vlsi.2025.102622","DOIUrl":"10.1016/j.vlsi.2025.102622","url":null,"abstract":"<div><div>A Pseudo Random Number Generator (PRNG) produces a sequence whose randomness is evaluated by statistical tests like NIST and TestU01. The random sequences are deterministic and reproducible when using the same seed value. In this manner, and for cryptographic applications, the key size of a PRNG must be increased to resist brute force attacks. Henceforth, a fractional-order chaotic system, like the Lorenz one, is suitable to be used to design a PRNG, which implementation can be performed by using embedded devices such as the low-cost ESP32 (32-bit LX6 microprocessor) and field-programmable gate array (FPGA). To increase the throughput, the fractional Lorenz system is integrated with an approximated two steps Runge–Kutta method. An analysis in performed to find the domain of attraction for each state variable, and to verify that the PRNG produces non-correlated sequences. The hardware implementation is detailed by establishing the number of bits (or keys) required for the PRNG to guarantee its suitability for cryptographic applications. 
Finally, the hardware design of a PRNG using the fractional Lorenz system provides a throughput of 4.99 Mbits/s in the ESP32 platform, and 112.96 Mbits/s in the FPGA.</div></div>","PeriodicalId":54973,"journal":{"name":"Integration-The Vlsi Journal","volume":"107 ","pages":"Article 102622"},"PeriodicalIF":2.5,"publicationDate":"2025-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145684626","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-11-28. DOI: 10.1016/j.vlsi.2025.102620
Yiqi Zhou , Yanghui Wu , Daying Sun , Shan Shen , Xiong Cheng , Li Li
Multipliers dominate energy consumption in digital signal processing (DSP) systems. While approximate multipliers offer accuracy-efficiency trade-offs, many existing designs suffer from suboptimal energy efficiency. This paper presents an error compensation algorithm that minimizes the global error expectation (EE). By analyzing the error distribution across approximate compressor columns, the algorithm determines optimal compensation positions to reduce EE while maintaining low hardware overhead. Based on this approach, two high-energy-efficiency approximate multipliers (HEAMs) are proposed: HEAM_M1, optimized for high accuracy, and HEAM_M2, which incorporates a newly designed 4-1 approximate compressor for ultra-low-power applications. Compared to an exact multiplier, HEAM_M1 and HEAM_M2 achieve power-delay product (PDP) reductions of 32% and 54%, respectively. Moreover, compared to prior approximate multipliers with similar PDP levels, HEAM_M1 reduces NMED and MRED by 80% and 83%, while HEAM_M2 achieves reductions of 70% and 86%, respectively. Application-level evaluations on image processing and neural network tasks further demonstrate the effectiveness and robustness of the proposed designs.
{"title":"Error expectation-driven design and energy optimization of approximate multipliers","authors":"Yiqi Zhou , Yanghui Wu , Daying Sun , Shan Shen , Xiong Cheng , Li Li","doi":"10.1016/j.vlsi.2025.102620","DOIUrl":"10.1016/j.vlsi.2025.102620","url":null,"abstract":"<div><div>Multipliers dominate energy consumption in digital signal processing (DSP) systems, while approximate multipliers offer accuracy-efficiency trade-offs, many existing designs suffer from suboptimal energy efficiency. This paper presents an error compensation algorithm that minimizes the global error expectation (EE). By analyzing the error distribution across approximate compressor columns, the algorithm determines optimal compensation positions to reduce EE while maintaining low hardware overhead. Based on this approach, two high energy efficiency approximate multipliers (HEAMs) are proposed: HEAM_M1, optimized for high accuracy, and HEAM_M2, which incorporates a newly designed 4-1 approximate compressor for ultra-low power applications. Compared to an exact multiplier, HEAM_M1 and HEAM_M2 achieve power-delay product (PDP) reductions of 32% and 54%, respectively. Moreover, compared to prior approximate multipliers with similar PDP levels, HEAM_M1 reduces NMED and MRED by 80% and 83%, while HEAM_M2 achieves reductions of 70% and 86%, respectively. 
Application-level evaluations on image processing and neural network tasks further demonstrate the effectiveness and robustness of the proposed designs.</div></div>","PeriodicalId":54973,"journal":{"name":"Integration-The Vlsi Journal","volume":"107 ","pages":"Article 102620"},"PeriodicalIF":2.5,"publicationDate":"2025-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145684625","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
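The NMED and MRED figures quoted above are standard accuracy metrics for approximate multipliers, and their computation can be sketched directly. Assumptions: the `approx_mult` stand-in here simply truncates low product bits for illustration; the paper's compressor-based HEAM designs are not reproduced.

```python
# NMED (normalized mean error distance) and MRED (mean relative error
# distance) computed exhaustively over all n-bit operand pairs, using a
# simple low-bit-truncating multiplier as a hypothetical stand-in for an
# approximate compressor-based design.

def approx_mult(a, b, trunc=4):
    # illustrative approximation: drop the low `trunc` bits of the product
    return ((a * b) >> trunc) << trunc

def error_metrics(bits=8, trunc=4):
    max_prod = ((1 << bits) - 1) ** 2  # normalization constant for NMED
    total_ed = 0.0   # sum of absolute error distances
    total_red = 0.0  # sum of relative error distances
    n = 0
    for a in range(1, 1 << bits):
        for b in range(1, 1 << bits):
            exact = a * b
            ed = abs(exact - approx_mult(a, b, trunc))
            total_ed += ed
            total_red += ed / exact
            n += 1
    nmed = total_ed / (n * max_prod)
    mred = total_red / n
    return nmed, mred
```

A design point such as HEAM_M2 trades a larger NMED/MRED for lower PDP; metrics like these are what allow the "similar PDP, lower error" comparisons the abstract reports.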
Pub Date : 2025-11-26DOI: 10.1016/j.vlsi.2025.102603
Josna Philomina , Rekha K. James , Shirshendu Das , Palash Das , Daleesha M Viswanathan
As Network-on-Chip (NoC) designs become essential in Tiled Chip Multicore Processor (TCMP) systems, it is increasingly important to protect NoC router communication from disruptions caused by hardware Trojans (HTs). TCMPs often use intellectual property (IP) blocks from multiple vendors to design their NoC. This opens the door for untrusted vendors to compromise system security by inserting HTs into these IPs, which can alter the normal operation of NoC routers. These HTs are especially dangerous because they can remain undetected during the chip verification and testing stages. This paper explores how multiple HTs placed in the Route Computation Unit (RCU) of NoC routers can interfere with routing decisions, affect packet delivery, and harm overall system performance. We analyze the effects of these HTs using both synthetic traffic and real-world benchmarks, measuring their impact on latency, throughput, and processor instructions per cycle (IPC). To address these issues, we introduce a solution called Neighbor-Supported Trojan-Aware Routing (NeSTAR). NeSTAR uses cooperation among neighboring routers to make routing decisions, helping the network continue to function even when some RCUs are compromised. Our experimental results show that NeSTAR can reduce latency by 46%, improve throughput by 262%, lower deflected-packet latency by 69%, and improve IPC by 37% compared to an HT-affected NoC.
{"title":"NeSTAR: Hardware Trojans and its mitigation strategy in NoC routers","authors":"Josna Philomina , Rekha K. James , Shirshendu Das , Palash Das , Daleesha M Viswanathan","doi":"10.1016/j.vlsi.2025.102603","DOIUrl":"10.1016/j.vlsi.2025.102603","url":null,"abstract":"<div><div>As Network-on-Chip (NoC) designs become essential in Tiled Chip Multicore Processor (TCMP) systems, it is increasingly important to protect NoC router communication from disruptions caused by hardware Trojans (HT). TCMPs often use intellectual property (IP) blocks from multiple vendors to design their NoC. This opens the door for untrusted vendors to compromise system security by inserting HTs into these IPs, which can alter the normal operation of NoC routers. These HTs are especially dangerous because they can remain undetected during the chip verification and testing stages. This paper explores how multiple HTs placed in the Route Computation Unit (RCU) of NoC routers can interfere with routing decisions, affect packet delivery, and harm overall system performance. We analyze the effects of these HTs using both synthetic traffic and real-world benchmarks, measuring their impact on latency, throughput and processor instructions per cycle (IPC). To address these issues, we introduce a solution called Neighbor-Supported Trojan-Aware Routing (NeSTAR). NeSTAR uses cooperation among neighboring routers to make routing decisions, helping the network to continue to function even when some RCUs are compromised. 
Our experimental results show that NeSTAR can reduce latency by 46%, improve throughput by 262%, lower deflected-packet latency by 69%, and improve IPC by 37% compared to an HT-affected NoC.</div></div>","PeriodicalId":54973,"journal":{"name":"Integration-The Vlsi Journal","volume":"107 ","pages":"Article 102603"},"PeriodicalIF":2.5,"publicationDate":"2025-11-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145684740","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
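The core intuition behind neighbor-supported Trojan-aware routing can be shown with a toy example: a router cross-checks the output port suggested by its (possibly compromised) RCU against an independently recomputed routing decision, and overrides it on mismatch. Assumptions: this sketch uses plain deterministic XY routing and a single-router check for illustration; it is not the NeSTAR algorithm, which relies on cooperation among neighboring routers.

```python
# Toy Trojan-aware routing check on a 2D mesh: the RCU's suggested output
# port is accepted only if it matches a locally recomputed XY-routing
# decision. Simplified illustration, not the paper's NeSTAR scheme.

def xy_route(cur, dst):
    """Deterministic XY routing: correct the X offset first, then Y."""
    cx, cy = cur
    dx, dy = dst
    if dx != cx:
        return 'E' if dx > cx else 'W'
    if dy != cy:
        return 'N' if dy > cy else 'S'
    return 'LOCAL'  # packet has arrived

def checked_route(cur, dst, rcu_port):
    """Fall back to the recomputed port when the RCU output disagrees."""
    expected = xy_route(cur, dst)
    return rcu_port if rcu_port == expected else expected

def infected_rcu(cur, dst):
    """Hypothetical HT payload: misroute eastbound packets southward."""
    port = xy_route(cur, dst)
    return 'S' if port == 'E' else port
```

In this toy model the check restores correct delivery; the harder problem NeSTAR addresses is that the checker itself may be untrusted, which is why the paper distributes the verification across neighboring routers instead.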