
Integration - The VLSI Journal: Latest Publications

Hardware efficient approximate activation functions for a Long-Short-Term Memory cell
IF 2.5 | CAS Zone 3 (Engineering & Technology) | Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2025-12-06 | DOI: 10.1016/j.vlsi.2025.102627
R. Sindhu, V. Arunachalam
The activation functions (AFs) sigmoid(x) and tanh(x) are essential in a Long-Short-Term Memory (LSTM) cell for time-series classification with a Recurrent Neural Network (RNN). These AFs regulate the data flow effectively and optimize memory requirements in LSTM cells. Hardware realizations of these AFs are complex; consequently, approximation strategies must be adopted. The piece-wise linearization (PWL) method is well suited to hardware implementation. A 7-segment PWL-based approximation of tanh(x), denoted t(x8), is proposed here. Employing a MATLAB-based error analysis, an optimum fixed-point data format (1-bit sign, 2-bit integer, 8-bit fraction) is chosen. The function t(x8) is implemented with parallel segment selection and two 10-bit adders using TSMC 65 nm technology libraries. This architecture occupies 356.4 μm² and consumes 230.7 μW at 1.67 GHz. An approximate sigmoid(x), σ(x8), is then implemented by reusing the t(x8) module together with two shifters, a complement, and an 11-bit adder; it occupies 462.4 μm² and consumes 324.2 μW at 1.25 GHz. An approximate LSTM cell with the proposed t(x8) and σ(x8) functions is modelled using Python 3.2 and tested on the Italian Parkinson's dataset. The approximate LSTM cell produces classification metrics close to those of the exact LSTM cell, with a maximum deviation of 0.21 %.
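To make the structure concrete, here is a minimal Python sketch of a 7-segment PWL tanh in the paper's fixed-point format, with sigmoid obtained from it through the standard identity sigmoid(x) = 0.5·(1 + tanh(x/2)), which is what the shifter/complement/adder reuse of the tanh module corresponds to. The segment breakpoints, slopes, and intercepts below are illustrative assumptions, not the coefficients selected by the paper's MATLAB error analysis.

```python
import numpy as np

def quantize(x, frac_bits=8, int_bits=2):
    """Quantize to the abstract's fixed-point format: 1 sign, 2 integer, 8 fraction bits."""
    scale = 1 << frac_bits
    lo, hi = -(1 << int_bits), (1 << int_bits) - 1.0 / scale
    return np.clip(np.round(np.asarray(x, dtype=float) * scale) / scale, lo, hi)

# Hypothetical 7-segment PWL table for tanh(x) on [-4, 4):
# (lower_bound, slope, intercept); values are illustrative, not the paper's.
_SEGMENTS = [
    (-4.0, 0.018, -0.928),
    (-2.0, 0.202, -0.560),
    (-1.0, 0.600, -0.162),
    (-0.5, 0.924,  0.000),
    ( 0.5, 0.600,  0.162),
    ( 1.0, 0.202,  0.560),
    ( 2.0, 0.018,  0.928),
]

def t_approx(x):
    """7-segment PWL approximation of tanh on the quantized input."""
    xq = quantize(x)
    out = np.empty_like(xq)
    for lo, slope, icpt in _SEGMENTS:       # last segment whose bound is <= xq wins
        out = np.where(xq >= lo, slope * xq + icpt, out)
    return quantize(np.clip(out, -1.0, 1.0))

def sigma_approx(x):
    """sigmoid(x) = 0.5 * (1 + tanh(x / 2)), i.e. shift the input, reuse the tanh block,
    then shift and add -- mirroring the hardware reuse described in the abstract."""
    return quantize(0.5 + 0.5 * t_approx(np.asarray(x, dtype=float) / 2.0))

if __name__ == "__main__":
    xs = np.linspace(-4, 4, 9)
    print(np.max(np.abs(t_approx(xs) - np.tanh(xs))))    # worst-case PWL error on this grid
    print(sigma_approx(0.0))                              # ~0.5, as expected for sigmoid(0)
```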
Citations: 0
Achieving superior segmented CAM efficiency with pre-charge free local search based hybrid matcher for high speed applications
IF 2.5 | CAS Zone 3 (Engineering & Technology) | Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2025-12-06 | DOI: 10.1016/j.vlsi.2025.102621
Shyamosree Goswami, Adwait Wakankar, Partha Bhattacharyya, Anup Dandapat
This high-speed, power-efficient content-addressable memory (CAM) uses parallel lookups to match quickly without sacrificing power consumption. It introduces three key contributions: (i) pre-charge-free operation, which improves search speed and reduces power requirements by eliminating node charging time; (ii) a Hybrid Match Line (HML) structure that strategically balances power and delay, combining the high-speed attributes of NOR with the low-power attributes of NAND; and (iii) a local searching technique that further improves search time. Performance indicators improve greatly when these methods are seamlessly integrated. Utilizing 45 nm CMOS technology, the design supports diverse process voltages, temperatures, and frequencies for a 64×32 memory array, and Monte Carlo simulations verify design stability. The proposed architecture outperforms the leading benchmark in speed and power-delay product (PDP) by 54.6% and 76.02%, respectively. The design can perform repeated data searches at frequencies up to 2 GHz after a single write operation, enabling quicker and more energy-efficient data processing. This development could improve efficiency and speed in high-performance computing, mobile devices, and IoT applications.
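The contribution is circuit-level (pre-charge-free match lines, hybrid NOR/NAND matching), but the segmented search behaviour it accelerates can be sketched functionally: a cheap leading segment filters candidates locally, and only survivors have their remaining bits compared. The 8-bit first segment and the split rule below are assumptions for illustration; a real CAM performs all comparisons in parallel hardware rather than in a Python loop.

```python
# Behavioral sketch of segmented CAM search (functional model only; the paper's
# contribution is the circuit realization of the match lines, not this algorithm).
WORD_BITS = 32          # matches the 64x32 array in the abstract
SEG_BITS = 8            # assumed width of the first segment used for the local pre-search

def split(word, seg_bits=SEG_BITS, word_bits=WORD_BITS):
    """Split a word into (leading segment, remaining bits)."""
    rest_bits = word_bits - seg_bits
    return word >> rest_bits, word & ((1 << rest_bits) - 1)

def cam_search(stored_words, key):
    """Return addresses of all exact matches, evaluating the full word only for
    entries whose first segment already matches (the 'local search' idea)."""
    key_hi, key_lo = split(key)
    matches = []
    for addr, word in enumerate(stored_words):
        w_hi, w_lo = split(word)
        if w_hi != key_hi:      # first-segment mismatch: the rest is never evaluated
            continue
        if w_lo == key_lo:
            matches.append(addr)
    return matches

if __name__ == "__main__":
    table = [0xDEADBEEF, 0x12345678, 0xDEAD0000, 0x12345678]
    print(cam_search(table, 0x12345678))   # -> [1, 3]
```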
Citations: 0
An efficient open-source design and implementation framework for non-quantized CNNs on FPGAs
IF 2.5 | CAS Zone 3 (Engineering & Technology) | Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2025-12-02 | DOI: 10.1016/j.vlsi.2025.102625
Angelos Athanasiadis, Nikolaos Tampouratzis, Ioannis Papaefstathiou
The growing demand for real-time processing in artificial intelligence applications, particularly those involving Convolutional Neural Networks (CNNs), has highlighted the need for efficient computational solutions. Conventional processors and graphics processing units (GPUs) very often fall short in balancing performance, power consumption, and latency, especially in embedded systems and edge-computing platforms. Field-Programmable Gate Arrays (FPGAs) offer a promising alternative, combining high performance with energy efficiency and reconfigurability. This paper presents a design and implementation framework for implementing CNNs seamlessly on FPGAs that maintains full precision in all neural network parameters, thus addressing a niche: non-quantized NNs. The presented framework extends Darknet, which is very widely used for the design of CNNs, and allows the designer, by effectively using a Darknet NN description, to efficiently implement CNNs in a heterogeneous system comprising CPUs and FPGAs. Our framework is evaluated on the implementation of a number of different CNNs and as part of a real-world application utilizing UAVs; in all cases it outperforms the CPU and GPU systems in terms of performance and/or power consumption. When compared with FPGA frameworks that support quantization, our solution offers similar performance and/or energy efficiency without any degradation in NN accuracy.
Citations: 0
Machine-learning-driven prediction of thin film parameters for optimizing the dielectric deposition in semiconductor fabrication
IF 2.5 | CAS Zone 3 (Engineering & Technology) | Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2025-12-02 | DOI: 10.1016/j.vlsi.2025.102617
Hao Wen, Enda Zhao, Qiyue Zhang, Ruofei Xiang, Wenjian Yu
The deposition of dielectric thin films in semiconductor fabrication is significantly influenced by process-parameter configuration. Traditional optimization via experiments or multi-physics simulations is costly, time-consuming, and lacks flexibility. Data-driven methods that leverage production-line sensor data provide a promising alternative. This work proposes a machine-learning modeling framework for studying the nonlinear correlation between dielectric deposition parameters and film-thickness distribution. The proposed approach is validated using historical High-Density Plasma Chemical Vapor Deposition (HDPCVD) process data collected from production runs and demonstrates strong predictive performance across multiple technology nodes. The framework predicts thin-film thickness accurately (R² = 0.92) and enables practical assessment of specification compliance, achieving 79.5% accuracy in determining whether predicted thicknesses lie within the node-specific tolerances at the 14 nm node. The results suggest that data-driven modeling offers a practical, scalable, and efficient solution for process monitoring and optimization in advanced semiconductor fabrication.
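As a rough sketch of the data-driven flow described above, the snippet below fits a regressor from process-parameter vectors to film thickness, reports R², and checks how many predictions fall inside a tolerance band as a simplified proxy for the specification-compliance accuracy. The synthetic data, the ordinary-least-squares model, and the ±2 nm tolerance are stand-in assumptions; the paper uses historical HDPCVD production data and node-specific tolerances.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in data: rows are process-parameter vectors (e.g. RF power, gas flows,
# pressure, time); the target is film thickness in nm. Entirely synthetic.
n_samples, n_params = 500, 6
X = rng.normal(size=(n_samples, n_params))
true_w = rng.normal(size=n_params)
y = 120.0 + X @ true_w * 3.0 + rng.normal(scale=1.0, size=n_samples)

# Train/test split and ordinary least squares as a stand-in regressor.
split = int(0.8 * n_samples)
X_tr, X_te, y_tr, y_te = X[:split], X[split:], y[:split], y[split:]
A_tr = np.hstack([X_tr, np.ones((split, 1))])            # add a bias column
w, *_ = np.linalg.lstsq(A_tr, y_tr, rcond=None)
y_hat = np.hstack([X_te, np.ones((len(X_te), 1))]) @ w

# Coefficient of determination R^2 (the abstract reports R^2 = 0.92 on real data).
ss_res = np.sum((y_te - y_hat) ** 2)
ss_tot = np.sum((y_te - y_te.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot

# Simplified compliance proxy: fraction of predictions within an assumed +/- 2 nm band.
tolerance_nm = 2.0
compliance = np.mean(np.abs(y_hat - y_te) <= tolerance_nm)
print(f"R^2 = {r2:.3f}, within-tolerance fraction = {compliance:.1%}")
```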
Citations: 0
Reinforcement learning-driven net order selection for efficient analog IC routing
IF 2.5 | CAS Zone 3 (Engineering & Technology) | Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2025-12-02 | DOI: 10.1016/j.vlsi.2025.102623
Yong Zhang, Wen-Jie Li, Guo-Jing Ge, Jin-Qiao Wang, Bo-Wen Jia, Ning Xu
The A∗ algorithm is one of the most common analog integrated circuit (IC) routing techniques. As the number of nets increases, the routing order used by this heuristic algorithm affects the routing results immensely. Currently, artificial intelligence (AI) technologies are widely applied in IC physical design to accelerate layout design. In this paper, we propose a reinforcement learning model for net-order selection. We construct multi-channel images of routing data and extract features of the routing-pin coordinates through an attention mechanism. After training, the model outputs an optimized net order, which is then used to perform routing with a bidirectional A∗ algorithm, thereby improving both the speed and efficiency of the routing process. Experimental results on cases based on 130-nm and 180-nm processes show that the proposed method achieves a 2.5% reduction in wire length and a 3.7% decrease in the number of vias compared to state-of-the-art methods for analog IC routing. In terms of computational efficiency, the bidirectional A∗ algorithm improves performance by 7.3% over the unidirectional A∗ algorithm in decision-making scenarios and by 51.09% in the path-planning process. Simulation results further demonstrate that, compared with manual and advanced automation methods, the overall performance of the layout achieved by our method aligns most closely with schematic performance.
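The learned net order feeds a conventional maze router; for reference, a compact unidirectional A∗ search over a 2D routing grid with unit edge costs and a Manhattan-distance heuristic is sketched below. The grid abstraction and cost model are simplifications, and the paper's router additionally runs the search bidirectionally.

```python
import heapq

def a_star(grid, start, goal):
    """A* over a 2D grid: grid[r][c] == 1 marks a blocked cell.
    Returns the list of cells from start to goal, or None if unroutable."""
    rows, cols = len(grid), len(grid[0])
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])   # Manhattan heuristic
    open_heap = [(h(start), 0, start)]
    came_from, g_cost = {}, {start: 0}
    while open_heap:
        _, g, cur = heapq.heappop(open_heap)
        if cur == goal:
            path = [cur]
            while cur in came_from:
                cur = came_from[cur]
                path.append(cur)
            return path[::-1]
        if g > g_cost.get(cur, float("inf")):
            continue                         # stale heap entry
        r, c = cur
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                ng = g + 1                   # unit edge cost; real routers add congestion/via costs
                if ng < g_cost.get((nr, nc), float("inf")):
                    g_cost[(nr, nc)] = ng
                    came_from[(nr, nc)] = cur
                    heapq.heappush(open_heap, (ng + h((nr, nc)), ng, (nr, nc)))
    return None

if __name__ == "__main__":
    grid = [[0, 0, 0, 0],
            [0, 1, 1, 0],
            [0, 0, 0, 0]]
    print(a_star(grid, (0, 0), (2, 3)))
```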
Citations: 0
Routability–wirelength co-guided cell inflation with explainable multi-task learning for global placement optimization
IF 2.5 | CAS Zone 3 (Engineering & Technology) | Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2025-12-01 | DOI: 10.1016/j.vlsi.2025.102624
Yan Xing, Zicheng Deng, Shuting Cai, Weijun Li, Xiaoming Xiong
Existing routability-driven global placers typically employ an iterative routability optimization process and perform cell inflation based only on lookahead congestion maps during each run. However, this incremental application of congestion estimation and mitigation results in placement solutions that deviate from optimal wirelength, thus compromising the optimization objective of balancing wirelength minimization and routability optimization. To simultaneously improve routability and reduce wirelength, this paper proposes a novel routability–wirelength co-guided cell inflation approach for global placement optimization. It employs a multi-task learning-based feature selection method, MTL-FS, to identify the optimal feature subset and train the corresponding routability–wirelength co-learning model, RWNet. During the iterative optimization process, both routability and wirelength are predicted using RWNet, and their correlation is interpreted by DeepSHAP to produce three impact maps. Subsequently, routability–wirelength co-guided cell inflation (RWCI) is performed based on an adjusted congestion map, which is derived from the predicted congestion map and the three impact maps. The experimental results on ISPD2011 and DAC2012 benchmark designs demonstrate that, compared to DREAMPlace and RoutePlacer (which represent non-machine-learning-based and machine-learning-based routability-driven placers, respectively), the proposed approach achieves both better optimization quality, specifically improved routability and reduced wirelength, and a decreased time cost. Moreover, the extension experiment shows our method consistently outperforms DREAMPlace (even when it uses 2D feature maps as proxies) in effectiveness while maintaining comparable efficiency. The generalization experiment further confirms this superiority and comparable runtime, particularly in highly congested scenarios.
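The underlying mechanism, inflating cells that sit in congested bins so that the placer spreads them apart, can be sketched in a few lines: each cell's area is scaled by a factor derived from the congestion of its bin. The inflation law, threshold, and cap below are assumptions for illustration, not the paper's RWCI formula driven by the adjusted congestion map.

```python
def inflate_cells(cells, congestion, threshold=0.8, gain=1.5, max_ratio=2.0):
    """Cell-inflation sketch: 'cells' maps cell -> (bin_id, area); 'congestion' maps
    bin_id -> estimated routing demand / capacity. Cells in bins whose congestion
    exceeds the threshold get proportionally larger placement areas."""
    inflated = {}
    for name, (bin_id, area) in cells.items():
        overflow = max(0.0, congestion.get(bin_id, 0.0) - threshold)
        ratio = min(1.0 + gain * overflow, max_ratio)   # cap inflation to limit area blow-up
        inflated[name] = (bin_id, area * ratio)
    return inflated

if __name__ == "__main__":
    cells = {"U1": ("b0", 1.0), "U2": ("b1", 1.0)}
    congestion = {"b0": 1.1, "b1": 0.5}                 # b0 is over capacity, b1 is not
    print(inflate_cells(cells, congestion))             # U1 inflated, U2 unchanged
```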
Citations: 0
Optimizing detailed-routability for 3D global routing through dynamic resource model and routability-aware cost scheme
IF 2.5 | CAS Zone 3 (Engineering & Technology) | Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2025-11-29 | DOI: 10.1016/j.vlsi.2025.102616
Juntao Jian, Yan Xing, Shuting Cai, Weijun Li, Xiaoming Xiong
Detailed-routability optimization methods for three-dimensional global routing typically employ a two-stage process involving initial routing and multi-level maze routing (iterative rip-up and reroute, or RRR iterations). Within the coarse-grained maze route planning of RRR iterations, the resource model and cost scheme are paramount for optimization quality. However, current advancements in these areas often overlook the dynamic nature of routing resources throughout RRR iterations and fail to consider routability features beyond congestion. To mitigate these limitations, this paper introduces a novel detailed-routability optimization approach that integrates a dynamic resource model and a routability-aware cost scheme. The proposed dynamic resource model accounts for routing resources’ sensitivity to both spatial information and the progression of RRR iterations. Moreover, the routability-aware cost scheme, derived from coarse-grained routability features, is designed to optimize fine-grained routability. Experimental results validate that our approach surpasses baseline detailed-routability-driven global routers, exhibiting superior optimization performance by concurrently enhancing routability and overall quality scores (a weighted summation of wirelength and routability metrics), alongside achieving significant runtime reduction.
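One common way to realize a cost scheme of this kind is a per-edge maze-routing cost whose congestion penalty grows as rip-up-and-reroute (RRR) iterations progress, so later passes steer more strongly around overflowed edges. The functional form and constants below are illustrative assumptions, not the paper's dynamic resource model or routability-aware cost.

```python
def edge_cost(wirelength, demand, capacity, rrr_iter,
              base_penalty=1.0, growth=0.5, via_cost=0.0):
    """Illustrative maze-routing edge cost: wirelength plus a congestion penalty
    that sharpens with each rip-up-and-reroute (RRR) iteration, so later passes
    avoid overflowed edges more aggressively."""
    utilization = demand / max(capacity, 1e-9)
    overflow = max(0.0, utilization - 1.0)
    penalty_weight = base_penalty * (1.0 + growth * rrr_iter)   # grows across iterations
    return wirelength + via_cost + penalty_weight * (utilization ** 2 + 10.0 * overflow)

if __name__ == "__main__":
    # The same congested edge evaluated at iteration 0 and iteration 5 becomes
    # progressively more expensive as RRR proceeds.
    for it in (0, 5):
        print(it, edge_cost(wirelength=1.0, demand=12, capacity=10, rrr_iter=it))
```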
Citations: 0
Design insights for implementing a PRNG with fractional Lorenz system on ESP32 and FPGA
IF 2.5 | CAS Zone 3 (Engineering & Technology) | Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2025-11-29 | DOI: 10.1016/j.vlsi.2025.102622
Luis Gerardo de la Fraga, Esteban Tlelo-Cuautle
A Pseudo-Random Number Generator (PRNG) produces a sequence whose randomness is evaluated by statistical tests such as NIST and TestU01. The random sequences are deterministic and reproducible when the same seed value is used. For cryptographic applications, therefore, the key size of a PRNG must be increased to resist brute-force attacks. Hence, a fractional-order chaotic system, like the Lorenz one, is suitable for designing a PRNG, whose implementation can be carried out on embedded devices such as the low-cost ESP32 (32-bit LX6 microprocessor) and a field-programmable gate array (FPGA). To increase the throughput, the fractional Lorenz system is integrated with an approximated two-step Runge–Kutta method. An analysis is performed to find the domain of attraction for each state variable and to verify that the PRNG produces non-correlated sequences. The hardware implementation is detailed by establishing the number of bits (or keys) required for the PRNG to guarantee its suitability for cryptographic applications. Finally, the hardware design of a PRNG using the fractional Lorenz system provides a throughput of 4.99 Mbit/s on the ESP32 platform and 112.96 Mbit/s on the FPGA.
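A software sketch of the idea follows: integrate the fractional-order Lorenz system and extract pseudo-random bytes from the fractional part of a state variable. The integrator below uses a Grünwald–Letnikov short-memory scheme as a stand-in for the paper's approximated two-step Runge–Kutta method, and the system parameters, fractional order, and byte-extraction rule are assumptions; any such stream would still have to pass NIST and TestU01 before being trusted.

```python
import numpy as np

SIGMA, RHO, BETA = 10.0, 28.0, 8.0 / 3.0   # classical Lorenz parameters (assumed)
Q = 0.995        # assumed fractional order of all three derivatives
H = 0.005        # integration step
MEMORY = 512     # Grünwald-Letnikov short-memory length

def lorenz(state):
    x, y, z = state
    return np.array([SIGMA * (y - x), x * (RHO - z) - y, x * y - BETA * z])

def gl_coefficients(n, q=Q):
    """GL binomial coefficients: c_0 = 1, c_j = (1 - (1+q)/j) * c_{j-1}."""
    c = np.empty(n)
    c[0] = 1.0
    for j in range(1, n):
        c[j] = (1.0 - (1.0 + q) / j) * c[j - 1]
    return c

def generate_bytes(n, seed=(0.1, 0.1, 0.1)):
    """Return n pseudo-random bytes extracted from the fractional part of x(t)."""
    c = gl_coefficients(MEMORY + 1)
    hist = [np.array(seed, dtype=float)]     # x_0
    out = []
    for _ in range(n):
        # x_k = f(x_{k-1}) * h^q - sum_{j>=1} c_j * x_{k-j}   (short-memory truncation)
        mem = sum(c[j] * hist[-j] for j in range(1, len(hist) + 1))
        new = lorenz(hist[-1]) * H**Q - mem
        hist.append(new)
        if len(hist) > MEMORY:
            hist.pop(0)
        out.append(int((abs(new[0]) % 1.0) * 256) & 0xFF)    # one byte per iteration (assumed rule)
    return out

if __name__ == "__main__":
    print(generate_bytes(8))   # randomness must still be confirmed with NIST / TestU01
```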
Citations: 0
Error expectation-driven design and energy optimization of approximate multipliers
IF 2.5 | CAS Zone 3 (Engineering & Technology) | Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2025-11-28 | DOI: 10.1016/j.vlsi.2025.102620
Yiqi Zhou, Yanghui Wu, Daying Sun, Shan Shen, Xiong Cheng, Li Li
Multipliers dominate energy consumption in digital signal processing (DSP) systems. While approximate multipliers offer accuracy–efficiency trade-offs, many existing designs suffer from suboptimal energy efficiency. This paper presents an error compensation algorithm that minimizes the global error expectation (EE). By analyzing the error distribution across approximate compressor columns, the algorithm determines optimal compensation positions to reduce EE while maintaining low hardware overhead. Based on this approach, two high-energy-efficiency approximate multipliers (HEAMs) are proposed: HEAM_M1, optimized for high accuracy, and HEAM_M2, which incorporates a newly designed 4-1 approximate compressor for ultra-low-power applications. Compared to an exact multiplier, HEAM_M1 and HEAM_M2 achieve power-delay product (PDP) reductions of 32% and 54%, respectively. Moreover, compared to prior approximate multipliers with similar PDP levels, HEAM_M1 reduces NMED and MRED by 80% and 83%, while HEAM_M2 achieves reductions of 70% and 86%, respectively. Application-level evaluations on image processing and neural network tasks further demonstrate the effectiveness and robustness of the proposed designs.
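The error-expectation metric being minimized can be made concrete by exhaustively comparing an approximate compressor against the exact sum of its inputs over all equally likely input patterns. The approximate 4:2 compressor below (XOR-based sum, two AND terms for carry, carry-out dropped) is a hypothetical example rather than the paper's 4-1 compressor; only the EE computation itself is the point.

```python
from itertools import product

def exact_value(bits):
    """An exact compression of bits x1..x4 represents their arithmetic sum."""
    return sum(bits)

def approx_compressor(x1, x2, x3, x4):
    """Hypothetical approximate 4:2 compressor (NOT the paper's design):
    sum from a 4-input XOR, carry from two AND terms, carry-out dropped."""
    s = x1 ^ x2 ^ x3 ^ x4
    c = (x1 & x2) | (x3 & x4)
    return s, c

def error_statistics():
    """Exhaustively compute error expectation (EE), mean error distance, and
    error probability over the 16 equally likely input patterns."""
    errors = []
    for bits in product((0, 1), repeat=4):
        s, c = approx_compressor(*bits)
        errors.append((s + 2 * c) - exact_value(bits))
    n = len(errors)
    ee = sum(errors) / n                       # signed expectation: what compensation targets
    med = sum(abs(e) for e in errors) / n
    err_prob = sum(e != 0 for e in errors) / n
    return ee, med, err_prob

if __name__ == "__main__":
    print(error_statistics())   # EE = -0.625 for this hypothetical compressor
```

A compensation scheme in the spirit of the abstract would then add constant correction terms at the partial-product columns where the accumulated signed EE is largest, driving the global expectation toward zero at small hardware cost.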
Citations: 0
NeSTAR: Hardware Trojans and its mitigation strategy in NoC routers
IF 2.5 | CAS Zone 3 (Engineering & Technology) | Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2025-11-26 | DOI: 10.1016/j.vlsi.2025.102603
Josna Philomina, Rekha K. James, Shirshendu Das, Palash Das, Daleesha M Viswanathan
As Network-on-Chip (NoC) designs become essential in Tiled Chip Multicore Processor (TCMP) systems, it is increasingly important to protect NoC router communication from disruptions caused by hardware Trojans (HT). TCMPs often use intellectual property (IP) blocks from multiple vendors to design their NoC. This opens the door for untrusted vendors to compromise system security by inserting HTs into these IPs, which can alter the normal operation of NoC routers. These HTs are especially dangerous because they can remain undetected during the chip verification and testing stages. This paper explores how multiple HTs placed in the Route Computation Unit (RCU) of NoC routers can interfere with routing decisions, affect packet delivery, and harm overall system performance. We analyze the effects of these HTs using both synthetic traffic and real-world benchmarks, measuring their impact on latency, throughput and processor instructions per cycle (IPC). To address these issues, we introduce a solution called Neighbor-Supported Trojan-Aware Routing (NeSTAR). NeSTAR uses cooperation among neighboring routers to make routing decisions, helping the network to continue to function even when some RCUs are compromised. Our experimental results show that NeSTAR can reduce latency by 46%, improve throughput by 262%, lower packet deflected latency by 69%, and improve IPC by 37%, compared to the NoC affected by HT.
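To make the threat model concrete, the sketch below models a route computation unit (RCU) running deterministic XY routing, a Trojaned variant that deflects packets when an assumed trigger fires, and a simplified neighbor-supported check in the spirit of NeSTAR that falls back to an independently recomputed port when the RCU's decision looks wrong. The trigger condition and the recovery rule are illustrative assumptions, not the paper's exact mechanism.

```python
# Output ports of a 2D-mesh router
LOCAL, NORTH, SOUTH, EAST, WEST = "local", "north", "south", "east", "west"

def xy_route(cur, dst):
    """Deterministic XY routing: move in X first, then in Y."""
    (cx, cy), (dx, dy) = cur, dst
    if dx > cx: return EAST
    if dx < cx: return WEST
    if dy > cy: return NORTH
    if dy < cy: return SOUTH
    return LOCAL

def trojaned_rcu(cur, dst, trigger=lambda dst: dst[0] == 3):
    """RCU with a hardware-Trojan payload: when the (assumed) trigger fires,
    the computed port is swapped to misroute the packet."""
    port = xy_route(cur, dst)
    if trigger(dst) and port == EAST:
        return WEST          # malicious deflection
    return port

def nestar_like_check(cur, dst, rcu):
    """Simplified neighbor-supported check: if the RCU's decision disagrees with
    an independently recomputed XY port, fall back to the recomputed one."""
    claimed = rcu(cur, dst)
    expected = xy_route(cur, dst)     # stands in for the neighboring routers' consensus
    return expected if claimed != expected else claimed

if __name__ == "__main__":
    cur, dst = (1, 1), (3, 1)
    print("infected RCU:", trojaned_rcu(cur, dst))                        # west (misroute)
    print("with check  :", nestar_like_check(cur, dst, trojaned_rcu))     # east (recovered)
```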
Citations: 0