Pub Date: 2025-07-10; DOI: 10.1109/TVLSI.2025.3584657
Yishuo Meng;Jianfei Wang;Qiang Fu;Jia Hou;Siwei Xiang;Ge Li;Chen Yang
Customized accelerators for sparse convolutional neural networks (SCNNs) have been shown to significantly enhance the computational efficiency of CNNs. However, to handle the irregularly distributed sparsity that is common in filters and feature maps, existing designs typically rely on serial sparsity detection (SSD) methods and small-capacity computation arrays. As a result, it is difficult to fully translate the exploitation of sparsity into hardware performance improvement. This article first proposes a novel parallel sparsity detection (PSD) scheme, implemented in hardware, to efficiently extract the valid weights and activations. In addition, an index-oriented computation workflow for parallel sparse convolution is proposed to eliminate the output index diversity during sparse convolutions. Building on this sparsity detection scheme and computation workflow, a large-scale two-side SCNN accelerator is designed and implemented on the Xilinx VCU118 platform, achieving a runtime frequency of 300 MHz. The evaluation results indicate that this work achieves 1284.43/1105.31 GOPS when deploying VGG16/ResNet-50. Compared to previous dense and sparse accelerators, this work achieves a performance improvement ranging from $1.284\times$ to $12.266\times$ and a DSP efficiency improvement from $1.718\times$ to $6.131\times$. These results highlight its superior ability to translate sparsity exploitation into performance gains.
Title: A High-Performance SCNN Accelerator Using Parallel Sparsity Detection and Index-Oriented Computation Workflow. Journal: IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 33, no. 9, pp. 2449-2461.
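The abstract does not spell out how the PSD scheme works, so the following is only a minimal sketch of the general idea behind two-side sparsity: a multiply is worth doing only where both the weight and the activation are nonzero. The bitmask representation and the names `nonzero_mask`, `valid_pairs`, and `sparse_dot` are assumptions made for illustration and are not taken from the paper.

```python
from typing import List, Tuple


def nonzero_mask(values: List[int]) -> int:
    """Return a bitmask with bit i set when values[i] is nonzero."""
    mask = 0
    for i, v in enumerate(values):
        if v != 0:
            mask |= 1 << i
    return mask


def valid_pairs(weights: List[int], activations: List[int]) -> List[Tuple[int, int, int]]:
    """List (index, weight, activation) where both operands are nonzero.

    ANDing the two masks finds every valid position at once, which is the
    software analogue of checking all positions in parallel (illustrative only).
    """
    both = nonzero_mask(weights) & nonzero_mask(activations)
    return [
        (i, weights[i], activations[i])
        for i in range(len(weights))
        if (both >> i) & 1
    ]


def sparse_dot(weights: List[int], activations: List[int]) -> int:
    """Dot product that only multiplies the valid (nonzero, nonzero) pairs."""
    return sum(w * a for _, w, a in valid_pairs(weights, activations))


weights = [0, 3, 0, -2, 5, 0, 0, 1]
activations = [4, 0, 0, 7, 2, 0, 9, 0]
pairs = valid_pairs(weights, activations)
result = sparse_dot(weights, activations)
dense = sum(w * a for w, a in zip(weights, activations))
```

Only two of the eight positions survive the mask intersection here, and the sparse result matches the dense dot product, which is the property any sparsity-skipping scheme has to preserve.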
Pub Date: 2025-07-10; DOI: 10.1109/TVLSI.2025.3585971
Jeongmin Kim;Jaehoon Kwon;Hansol Jeong;In-Cheol Park
Syndrome calculation (SC) is a critical step in Bose-Chaudhuri-Hocquenghem (BCH) decoding, and its computational efficiency significantly impacts the energy consumption of the entire decoder. This article proposes an energy-efficient SC architecture designed for BCH decoders. The proposed architecture fundamentally adopts a remainder-based SC, which consumes less energy than the conventional Horner’s method-based SC unit. Furthermore, unlike previous remainder-based approaches, it uses a minimal polynomial to produce a shorter remainder, leading to reduced computation and improved energy efficiency. Implementation results demonstrate an 80% improvement in energy efficiency compared to the latest Horner’s method-based SC unit and a 35% improvement compared to the previous remainder-based SC unit.
Title: Energy-Efficient Syndrome Calculation Architecture for BCH Decoders. Journal: IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 33, no. 9, pp. 2488-2496.
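The identity behind remainder-based syndrome calculation can be checked on a small code. The sketch below is not the paper's architecture; it assumes a toy (15, 7) BCH code over GF(2^4) built from the primitive polynomial x^4 + x + 1, and all function names are illustrative. It shows that evaluating the received word at a root gives the same syndrome as evaluating the much shorter remainder of the received word modulo that root's minimal polynomial.

```python
PRIM_POLY = 0b10011  # x^4 + x + 1, defines GF(2^4) (assumed toy field)
ALPHA = 0b0010       # primitive element alpha

M1 = 0b10011         # minimal polynomial of alpha:   x^4 + x + 1
M3 = 0b11111         # minimal polynomial of alpha^3: x^4 + x^3 + x^2 + x + 1
GENERATOR = 0b111010001  # M1 * M3 = x^8 + x^7 + x^6 + x^4 + 1


def gf_mul(a: int, b: int) -> int:
    """Multiply two GF(2^4) elements (shift-and-add with reduction)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0b10000:
            a ^= PRIM_POLY
    return result


def gf_pow(a: int, n: int) -> int:
    """Raise a GF(2^4) element to a non-negative integer power."""
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result


def eval_horner(bits: int, x: int) -> int:
    """Evaluate a binary polynomial (bit i = coefficient of x^i) at x."""
    acc = 0
    for i in reversed(range(bits.bit_length())):
        acc = gf_mul(acc, x) ^ ((bits >> i) & 1)
    return acc


def poly_mod2(dividend: int, divisor: int) -> int:
    """Remainder of polynomial division over GF(2)."""
    dlen = divisor.bit_length()
    while dividend.bit_length() >= dlen:
        dividend ^= divisor << (dividend.bit_length() - dlen)
    return dividend


# Received word: a valid codeword (the generator itself) with two bit errors.
received = GENERATOR ^ (1 << 2) ^ (1 << 11)
alpha3 = gf_pow(ALPHA, 3)

s1_horner = eval_horner(received, ALPHA)                # up to 15 terms
s1_remainder = eval_horner(poly_mod2(received, M1), ALPHA)  # at most 4 terms
s3_horner = eval_horner(received, alpha3)
s3_remainder = eval_horner(poly_mod2(received, M3), alpha3)
```

Because each minimal polynomial vanishes at its own root, the remainder carries all the information the syndrome needs while having degree below 4 in this toy field, which is the sense in which a shorter remainder means less computation.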
Pub Date: 2025-07-10; DOI: 10.1109/TVLSI.2025.3585732
Chun-Chi Chen;Chao-Lieh Chen;Kai-Hsiang Chang
This brief presents an all-digital CMOS time-to-digital converter (TDC) with an integrated smart temperature sensor (STS), effectively reducing circuit complexity and cost. Unlike previous designs employing a single coupling unit, the proposed TDC adopts a two-coupling-unit structure, simplifying the overall architecture while enabling pulse-shrinking time measurement and offset-error cancellation within a single cyclic delay line. The built-in cancellation enhances linearity while minimizing overhead. Notably, the integrated STS requires only one additional coupling unit, ensuring a negligible impact on circuit complexity and cost. Fabricated using the TSMC 0.35-$\mu$m CMOS process, the proposed design demonstrates improved cost efficiency compared to prior works. Experimental results validate the successful measurement of time and temperature, highlighting the advantages of reduced complexity and cost savings.
Title: All-Digital CMOS Pulse-Shrinking Time-to-Digital Converter With Built-in Offset-Error Cancellation and Smart Temperature Sensor. Journal: IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 33, no. 9, pp. 2597-2601.
Pub Date: 2025-07-08; DOI: 10.1109/TVLSI.2025.3585043
Akul Malhotra;Sumeet Kumar Gupta
Ternary large language models (LLMs), which use ternary precision weights and 8-bit activations, have demonstrated competitive performance while significantly reducing the high computational and memory requirements of full-precision LLMs. The energy efficiency and performance of ternary LLMs can be further improved by deploying them on ternary computing-in-memory (TCiM) accelerators, thereby alleviating the von Neumann bottleneck. However, TCiM accelerators are prone to memory stuck-at faults (SAFs), which degrade model accuracy. This is particularly severe for LLMs due to their low weight sparsity. To boost the SAF tolerance of TCiM accelerators, we propose ReTern, which is based on 1) fault-aware sign transformations (FASTs) and 2) TCiM bitcell reprogramming that exploits the bitcells' natural redundancy. The key idea is to use FAST to minimize computation errors due to SAFs in +1/−1 weights, while the natural bitcell redundancy is exploited to target SAFs in 0 weights (zero-fix). Our experiments on BitNet b1.58 700M and 3B ternary LLMs show that our technique provides significant fault tolerance, notably a ~35% reduction in perplexity on the Wikitext dataset in the presence of faults. These benefits come at the cost of <3%, <7%, and <1% energy, latency, and area overheads, respectively.
Title: ReTern: Exploiting Natural Redundancy and Sign Transformations for Enhanced Fault Tolerance in Compute-in-Memory-Based Ternary LLMs. Journal: IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 33, no. 9, pp. 2518-2527.
Pub Date: 2025-07-04; DOI: 10.1109/TVLSI.2025.3580266
Yerui Guang;Qun Ding;Dongxu Liu
Although multiscroll conservative chaotic systems exhibit rich dynamical characteristics and hold great potential for secure communications, existing designs generally suffer from limited controllability and low hardware implementation efficiency. To address these challenges, this article proposes a novel 4-D multiscroll conservative chaotic system based on a nonlinear feedback structure constructed using the floor function. This approach simplifies the system's logical structure, facilitating efficient hardware modeling while enabling flexible control over the number, amplitude, and spatial distribution of scrolls in 3-D space. The system's high complexity and coexisting behaviors are validated through dynamical analyses, including equilibrium point analysis, Poincaré sections, and Lyapunov exponents (LEs). To deploy the chaotic system efficiently on field-programmable gate array (FPGA) platforms, this article first simplifies the hardware implementation logic of the feedback structure by designing an algorithmic model based on bitwise operations. Precise control of the system's module signals is then achieved through a finite state machine (FSM) design. The resource comparison analysis indicates that the proposed model achieves a high throughput of 10.08 Gbps while consuming only 1051 look-up tables (LUTs), with a low energy cost of 0.0264 mW/Mbps. Hardware-software co-simulation and oscilloscope visual output confirm the numerical precision and hardware feasibility of the proposed system. Finally, this system is integrated with the ZUC stream cipher to construct a novel encryption core, enabling asynchronous ciphertext transmission as well as encryption and decryption functions, thereby demonstrating its potential for secure hardware applications.
Title: FPGA-Oriented Design and Efficient Implementation of a Geometrically Tunable Multiscroll Conservative Chaotic System Without Equilibrium Points. Journal: IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 33, no. 9, pp. 2528-2541.
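The abstract states that the floor-function feedback is simplified with bitwise operations but does not give the model itself. As a hedged illustration of why that is cheap in hardware, the sketch below assumes a signed two's-complement fixed-point format with `FRAC_BITS` fractional bits (a hypothetical choice, not from the paper) and shows that floor reduces to an arithmetic right shift or to masking off the fractional bits.

```python
import math

FRAC_BITS = 8  # hypothetical fixed-point format: 8 fractional bits


def to_fixed(x: float) -> int:
    """Quantize a real value to fixed point (rounds to the nearest step)."""
    return int(round(x * (1 << FRAC_BITS)))


def floor_fixed(q: int) -> int:
    """Integer floor of a fixed-point value.

    An arithmetic right shift drops the fractional bits and rounds toward
    negative infinity, which is exactly floor(), also for negative inputs.
    """
    return q >> FRAC_BITS


def floor_fixed_same_format(q: int) -> int:
    """Floor that stays in fixed-point format by clearing the fractional bits."""
    return q & ~((1 << FRAC_BITS) - 1)


# Values chosen to be exactly representable or far from an integer boundary,
# so quantization in to_fixed() does not change the expected floor.
samples = [-2.75, -1.0, -0.5, 0.0, 0.25, 3.99]
floors = [floor_fixed(to_fixed(x)) for x in samples]
```

Neither variant needs a comparator or a divider, only wiring and a mask, which is consistent with the abstract's claim that a floor-based feedback term maps well onto FPGA logic.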
Pub Date: 2025-07-03; DOI: 10.1109/TVLSI.2025.3579484
Sebastian Haas;Christopher Dunkel;Friedrich Pauls;Mattis Hasler;Yogesh Verma;Nilanjana Das;Michael Raitza
Digital devices such as sensors, cell phones, and home servers are deeply embedded in our world to make our daily lives easier. Since we heavily rely on these systems, it is crucial to guarantee their correct functionality and to ensure security and privacy properties. As systems become increasingly complex, security becomes difficult to maintain because it requires a thorough understanding of all functionalities in hardware and software. Complexity may lead to vulnerabilities that malicious components can exploit. These components can compromise security features provided by the processing cores and the operating system (OS), jeopardizing the overall trustworthiness of the system. In this article, we provide a secure-by-default hardware/OS co-design to build a substrate for trustworthy computing in digital devices. The design is based on a tiled architecture that can integrate untrusted hardware components. Instead of relying on isolation mechanisms of potentially malicious components, isolation is achieved by dedicated and independent hardware components called trusted communication units (TCUs). By keeping the attack surface small and isolating all components by default, malicious hardware and software are restricted in access permissions and, hence, cannot easily break the system's security. We implemented a TCU-based multiprocessor architecture in a silicon research chip, called Masur23, and ran transfer workloads and selected portions of the microkernel-based OS M³. Our measurements demonstrate the feasibility of such a hardware/OS co-design for trustworthy computing. Relative to the entire chip implementation, the security features incur minimal latency, area, and power consumption overhead.
Title: A Secure-by-Design Hardware/Operating System as a Substrate for Trustworthy Computing. Journal: IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 33, no. 10, pp. 2862-2872.
Pub Date: 2025-07-02; DOI: 10.1109/TVLSI.2024.3477963
Muhao Li;Houren Ji;Xiaosi Tan;Chuan Zhang
In this brief, a stochastic belief propagation (BP)-based iterative detection and decoding (IDD) scheme for multiple-input multiple-output (MIMO) systems is proposed. We modify the BP detection algorithm to make it more suitable for stochastic computation and to enable soft messages to be passed between the detector and decoder as stochastic sequences. Through IDD, the number of iterations and the quantization precision required by the detector decrease. By sharing the stochastic number generator, the hardware complexity of both the detector and decoder can be reduced. Hardware architectural optimizations and the corresponding implementation are also given: a $64\times 32$, 4-QAM MIMO system with (128, 64) polar codes is implemented in an area of $1.283~\text{mm}^{2}$. Compared with other detectors, the hardware efficiency is improved by 7.8 times.
Title: Stochastic Belief Propagation-Based Iterative Detection and Decoding for MIMO Systems. Journal: IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 33, no. 8, pp. 2324-2328.
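For readers unfamiliar with stochastic computation, the sketch below illustrates the basic representation the abstract relies on: a probability is encoded as the fraction of ones in a bitstream, and multiplying two independent streams needs only a bitwise AND. This is a generic unipolar example, not the paper's BP detector; the names `sng`, `sc_multiply`, and `decode`, the stream length, and the seed are assumptions chosen for illustration.

```python
import random
from typing import List


def sng(p: float, length: int, rng: random.Random) -> List[int]:
    """Stochastic number generator: emit 1 with probability p at each step."""
    return [1 if rng.random() < p else 0 for _ in range(length)]


def sc_multiply(a_bits: List[int], b_bits: List[int]) -> List[int]:
    """Unipolar stochastic multiplication: AND of two uncorrelated streams."""
    return [x & y for x, y in zip(a_bits, b_bits)]


def decode(bits: List[int]) -> float:
    """Recover the encoded value as the fraction of ones in the stream."""
    return sum(bits) / len(bits)


rng = random.Random(0)   # fixed seed so the run is repeatable
N = 4096                 # stream length; longer streams give lower variance

a = sng(0.5, N, rng)
b = sng(0.25, N, rng)
product = decode(sc_multiply(a, b))  # estimates 0.5 * 0.25
```

The estimate is approximate and its error shrinks roughly with the square root of the stream length, which is why the required precision and iteration count matter for hardware cost. The AND-based product is only valid when the two streams are uncorrelated, so any scheme that shares a generator across units has to preserve that property.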
Pub Date: 2025-06-30; DOI: 10.1109/TVLSI.2025.3579662
Title: IEEE Transactions on Very Large Scale Integration (VLSI) Systems Publication Information. Journal: IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 33, no. 7, pp. C2-C2. Open-access PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11059982
Pub Date: 2025-06-30; DOI: 10.1109/TVLSI.2025.3579664
Title: IEEE Transactions on Very Large Scale Integration (VLSI) Systems Society Information. Journal: IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 33, no. 7, pp. C3-C3. Open-access PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11059983
Pub Date: 2025-06-27; DOI: 10.1109/TVLSI.2025.3581296
Ming-Yi Lin;Wei-Kuan Chiang;Chin-Hung Wang
The increasing density of static random access memory (SRAM) in modern system-on-chip (SoC) architectures has intensified the need for efficient built-in self-test (BIST) solutions to ensure fault detection and repair. This article presents an optimized register transfer level (RTL)-BIST intellectual property core (IP core) that integrates a novel March mSR+ algorithm, providing a low-power, high-fault-coverage approach to embedded memory testing. Developed using high-level synthesis (HLS), the proposed framework enhances test efficiency while minimizing hardware complexity. Experimental results on field-programmable gate array (FPGA) implementations demonstrate that the March mSR+ algorithm achieves an 88.89% fault coverage while reducing power consumption compared with conventional March-based testing methods. These findings validate the effectiveness of the RTL-BIST framework in improving memory reliability for artificial intelligence (AI), high-performance computing (HPC), and safety-critical applications.
Title: Enhancing Memory BIST With an Optimized RTL-BIST IP Core: A Low-Power, High-Fault-Coverage Approach. Journal: IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 33, no. 9, pp. 2556-2569.
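The March mSR+ algorithm is not described in the abstract, so the sketch below substitutes the classic March C- test to show what a March-based memory test does: it sweeps the address space in a fixed order, reading an expected value and writing its complement at each cell. The `FaultyMemory` model with injected stuck-at cells and the `run_march` helper are hypothetical and exist only for this illustration.

```python
from typing import Dict, List, Optional, Tuple

Element = Tuple[str, List[Tuple[str, int]]]

# March C-: {any(w0); up(r0,w1); up(r1,w0); down(r0,w1); down(r1,w0); any(r0)}
# Elements marked "any" in the literature are run in ascending order here.
MARCH_C_MINUS: List[Element] = [
    ("up", [("w", 0)]),
    ("up", [("r", 0), ("w", 1)]),
    ("up", [("r", 1), ("w", 0)]),
    ("down", [("r", 0), ("w", 1)]),
    ("down", [("r", 1), ("w", 0)]),
    ("up", [("r", 0)]),
]


class FaultyMemory:
    """One-bit-per-cell memory model with optional stuck-at faults."""

    def __init__(self, size: int, stuck_at: Optional[Dict[int, int]] = None):
        self.cells = [0] * size
        self.stuck_at = stuck_at or {}  # address -> value the cell is stuck at

    def write(self, addr: int, value: int) -> None:
        # A stuck cell ignores the written value.
        self.cells[addr] = self.stuck_at.get(addr, value)

    def read(self, addr: int) -> int:
        return self.stuck_at.get(addr, self.cells[addr])


def run_march(memory: FaultyMemory, size: int,
              elements: List[Element] = MARCH_C_MINUS) -> List[int]:
    """Run a March test and return the sorted addresses whose reads mismatched."""
    failing = set()
    for direction, ops in elements:
        addrs = range(size) if direction == "up" else range(size - 1, -1, -1)
        for addr in addrs:
            for op, value in ops:
                if op == "w":
                    memory.write(addr, value)
                elif memory.read(addr) != value:
                    failing.add(addr)
    return sorted(failing)


good_result = run_march(FaultyMemory(8), 8)
bad_result = run_march(FaultyMemory(8, stuck_at={3: 1, 6: 0}), 8)
```

A fault-free memory produces no mismatches, while the two injected stuck-at cells are both reported. Fault coverage figures such as the one quoted in the abstract come from counting how many modeled fault types a given element sequence can expose in this way.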