Pub Date : 2026-02-25DOI: 10.1109/TVLSI.2026.3660040
{"title":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems Society Information","authors":"","doi":"10.1109/TVLSI.2026.3660040","DOIUrl":"https://doi.org/10.1109/TVLSI.2026.3660040","url":null,"abstract":"","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"34 3","pages":"C3-C3"},"PeriodicalIF":3.1,"publicationDate":"2026-02-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11411924","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147280885","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2026-01-23DOI: 10.1109/TVLSI.2026.3654164
Yi Guo;Xuetao Li;Xin Luo;Heming Sun;Haroon Waris;Weiqiang Liu
The Booth algorithm is widely used for efficient signed multiplication because it reduces the number of partial products. A higher-radix Booth multiplier generates fewer partial products, but it also increases the hardware complexity of the generator, diminishing the advantage of needing fewer accumulators. Previous optimizations of generators and accumulators were designed for application-specific integrated circuits (ASICs), and their performance gains do not translate comparably to field-programmable gate arrays (FPGAs) because of architectural differences. This article proposes FPGA-friendly approximate Booth multipliers that combine approximate hybrid-radix partial product generation with resource-efficient accumulation techniques. First, to improve generation efficiency, a look-up table (LUT)-reused exact radix-8 generator is introduced, which uses logical partitioning to integrate two types of partial products into a single LUT. In addition, approximate adjacent-compensation radix-8 and radix-16 generators are developed based on the Booth encoding bit-repetition principle. Next, to speed up partial product accumulation, an overlap-parallel accumulation scheme and various accumulators are proposed, reducing compression steps and enhancing resource utilization. Finally, performance-configurable hybrid radix-8/-16 approximate Booth multipliers are designed to meet the needs of different error-resilient applications. The most hardware-efficient configuration of the proposed 16-bit multiplier reduces power–delay product (PDP) and LUT consumption by 38.31% and 35.66%, respectively, compared with the exact multiplier. Furthermore, the proposed designs offer a better balance between accuracy and hardware complexity than existing approximate multipliers. The practicality of these multipliers is demonstrated in both joint photographic experts group (JPEG) image compression and finite impulse response (FIR) filtering applications.
An open-source library of the proposed multipliers is available at https://github.com/YnuGuoLab/FPGA_Signed_Approx_Mul to support further research.
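The trade-off described in this abstract (a higher radix gives fewer partial products but a wider digit set for the generator) can be illustrated with a minimal software sketch of exact radix-2^k Booth recoding. This is not the approximate, LUT-based design of the article; the function names `booth_digits` and `booth_multiply` are hypothetical and chosen only for illustration.

```python
def booth_digits(y: int, n_bits: int, k: int) -> list[int]:
    """Exact radix-2**k Booth digits of a signed n_bits-wide integer, LSB digit first."""
    assert -(1 << (n_bits - 1)) <= y < (1 << (n_bits - 1))
    n_digits = -(-n_bits // k)  # ceil(n_bits / k): one partial product per digit

    def bit(i: int) -> int:
        # Bit i of y in two's complement; Python's >> sign-extends negative ints.
        return 0 if i < 0 else (y >> i) & 1

    digits = []
    for d in range(n_digits):
        base = d * k
        value = bit(base - 1)                      # overlap bit from the previous group
        for j in range(k - 1):
            value += bit(base + j) << j            # low bits carry positive weight
        value -= bit(base + k - 1) << (k - 1)      # top bit of the group is negative
        digits.append(value)
    return digits


def booth_multiply(x: int, y: int, n_bits: int, k: int) -> int:
    """Sum the partial products digit * x, each shifted by k bits per digit position."""
    return sum((d * x) << (k * i) for i, d in enumerate(booth_digits(y, n_bits, k)))
```

For a 16-bit multiplier this recoding yields 8 digits at radix 4 (k = 2), 6 at radix 8 (k = 3), and 4 at radix 16 (k = 4). At radix 8 the digits range over -4 to 4, so the generator needs the odd multiple 3x, which is one source of the extra generator complexity the abstract mentions.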
{"title":"FPGA-Based Low-Power Signed Approximate Multipliers for Diverse Error-Resilient Applications","authors":"Yi Guo;Xuetao Li;Xin Luo;Heming Sun;Haroon Waris;Weiqiang Liu","doi":"10.1109/TVLSI.2026.3654164","DOIUrl":"https://doi.org/10.1109/TVLSI.2026.3654164","url":null,"abstract":"The Booth algorithm is widely used for efficient signed multiplication due to its ability to reduce partial products. A higher radix Booth multiplier generates fewer partial products, while it also increases hardware complexity in the generator, diminishing the advantage of fewer accumulators. Previous optimizations of generators and accumulators were designed for application-specific integrated circuits (ASICs), but their performance gains cannot be comparably translated to field-programmable gate arrays (FPGAs) due to differences in architecture. This article proposes FPGA-friendly approximate Booth multipliers that combine approximate hybrid-radix partial product generation with resource-efficient accumulation techniques. Initially, to improve generation efficiency, an look-up table (LUT)-reused exact radix-8 generator is introduced through logical partitioning to integrate two types of partial products into a single LUT. In addition, approximate adjacent-compensation radix-8 and radix-16 generators are developed based on the Booth encoding bit-repetition principle. Later, to speed up partial product accumulation, an overlap-parallel accumulation scheme and various accumulators are proposed, reducing compression steps and enhancing resource utilization. Last, performance-configurable hybrid radix-8/-16 approximate Booth multipliers are designed to meet the needs of different error-resilient applications. The most hardware-efficient configuration of the proposed 16-bit multiplier reduces power–delay product (PDP) and LUT consumption by 38.31% and 35.66%, respectively, compared with the exact multiplier. 
Furthermore, the proposed designs offer a better balance between accuracy and hardware complexity than existing approximate multipliers. The practicality of these multipliers is demonstrated in both joint photographic experts group (JPEG) image compression and finite impulse response (FIR) filtering applications. An open-source library of the proposed multipliers is available at <uri>https://github.com/YnuGuoLab/FPGA_Signed_Approx_Mul</uri> to support further research.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"34 3","pages":"1029-1042"},"PeriodicalIF":3.1,"publicationDate":"2026-01-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147280886","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2026-01-22DOI: 10.1109/TVLSI.2026.3653075
{"title":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems Society Information","authors":"","doi":"10.1109/TVLSI.2026.3653075","DOIUrl":"https://doi.org/10.1109/TVLSI.2026.3653075","url":null,"abstract":"","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"34 2","pages":"C3-C3"},"PeriodicalIF":3.1,"publicationDate":"2026-01-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11361321","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146015923","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2026-01-15DOI: 10.1109/TVLSI.2026.3651779
Congwei Chen;Jinwei Pu;Jianxiong Zhang;Jiaying Liao;Ruidian Zhan;Fei Yu;Yun Chen;Shuting Cai
The number theoretic transform (NTT) is essential for accelerating polynomial multiplication in lattice-based cryptography. However, it is vulnerable to soft-analytical side-channel attacks (SASCAs). Although the local masking countermeasure provides theoretical resistance against such attacks, implementing it directly in a Radix-4 NTT architecture increases the number of modular multiplications more than fourfold, resulting in substantial hardware overhead. To address this challenge, we propose the modular multiplication parallel mask sharing (MMPMS) scheme, which optimizes the modular multiplication parallelism of the Radix-4 butterfly units and shares random twiddle factors, thereby balancing hardware overhead and security. We then construct a complete local masking NTT/INTT algorithm and implement it efficiently on an Artix-7 field-programmable gate array (FPGA). Experimental results show that, compared with the state-of-the-art local masking NTT, our scheme reduces the equivalent area and area–time product (ATP) overhead by more than 8.24 times and 6.74 times, respectively. In addition, a nonspecific t-test analysis indicates no significant side-channel leakage.
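As background for the abstract above, the following minimal sketch shows how an NTT turns polynomial multiplication into pointwise multiplication. It is an unmasked, quadratic-time toy with assumed parameters (modulus 17, length 4, root of unity 4) and computes a cyclic product; it does not model the Radix-4 butterflies, the local masking, or the MMPMS scheme of the article. It needs Python 3.8 or later for modular inverses via `pow`.

```python
Q = 17      # toy prime modulus (assumption for illustration)
N = 4       # transform length
OMEGA = 4   # primitive 4th root of unity mod 17: 4**2 = 16 = -1, 4**4 = 1


def ntt(a, omega=OMEGA, q=Q):
    """Direct O(n^2) forward transform: A[i] = sum_j a[j] * omega**(i*j) mod q."""
    n = len(a)
    return [sum(a[j] * pow(omega, i * j, q) for j in range(n)) % q for i in range(n)]


def intt(A, omega=OMEGA, q=Q):
    """Inverse transform: same sum with omega**-1, then scale by n**-1 mod q."""
    n_inv = pow(len(A), -1, q)
    omega_inv = pow(omega, -1, q)
    return [(n_inv * x) % q for x in ntt(A, omega_inv, q)]


def cyclic_mul(a, b, q=Q):
    """Multiply two length-N polynomials modulo (x**N - 1) and q via the NTT."""
    return intt([(x * y) % q for x, y in zip(ntt(a), ntt(b))])
```

Lattice-based schemes typically use a negacyclic product modulo x^N + 1 and fast butterfly networks; the cyclic, direct form is used here only to keep the sketch short.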
{"title":"A Low-Cost Local Masking Radix-4 NTT Against Soft-Analytical Side-Channel Attacks","authors":"Congwei Chen;Jinwei Pu;Jianxiong Zhang;Jiaying Liao;Ruidian Zhan;Fei Yu;Yun Chen;Shuting Cai","doi":"10.1109/TVLSI.2026.3651779","DOIUrl":"https://doi.org/10.1109/TVLSI.2026.3651779","url":null,"abstract":"The number theoretic transform (NTT) is essential for accelerating polynomial multiplication in lattice-based cryptography. However, it is vulnerable to soft-analytical side-channel attacks (SASCAs). Although local masking countermeasure provides theoretical resistance against such attacks, its direct implementation in Radix-4 NTT architecture leads to more than a 4 times increase in modular multiplications, resulting in substantial hardware overhead. To address this challenge, we propose the modular multiplication parallel mask sharing (MMPMS) scheme, which optimizes the modular multiplication parallelism of the Radix-4 butterfly units and shares random twiddle factors, thereby achieving a balance between hardware overhead and security. Then, we construct a complete local masking NTT/INTT algorithm and efficiently implement it on the Artix-7 field-programmable gate array (FPGA). Experimental results show that compared with the state-of-the-art local masking NTT, our scheme reduces the equivalent area and ATP overhead by more than 8.24 times and 6.74 times, respectively. 
In addition, a nonspecific <italic>t</italic>-test analysis indicates no significant side-channel leakage.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"34 3","pages":"1062-1066"},"PeriodicalIF":3.1,"publicationDate":"2026-01-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147280902","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2026-01-13DOI: 10.1109/TVLSI.2026.3651307
Linjun Jiang;Yitong Zhou;He Zhang;Wang Kang
Analog computing-in-memory (ACIM) has garnered widespread attention due to its high energy efficiency. However, it incurs large power and hardware costs when handling sophisticated nonlinear functions such as the softmax, which requires costly exponentiation and division. Existing digital-domain approaches often rely on dedicated modules to carry out these operations, leading to large area and high power consumption. To address these issues, we propose a self-calibrating analog circuitry for a softmax-scaled function with ACIM. By exploiting transistor subthreshold properties, the work eliminates expensive digital operations and maps exponentiation and division to successive analog circuits. A self-calibration module further mitigates partial mismatch-induced deviations by dynamically tuning bias voltages, improving overall fitting accuracy and system robustness. The proposed softmax-enabled ACIM achieves an energy efficiency of 55.06–60.08 TOPS/W and an area efficiency of 684.15 GOPS/mm² at 4-bit precision. Compared with state-of-the-art ACIMs with softmax implementations, the proposed work shows higher energy efficiency and area efficiency.
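For reference, the two operations this abstract identifies as costly, exponentiation and division, are both visible in a plain software softmax. The sketch below is the standard numerically stable formulation, not the article's analog softmax-scaled circuit.

```python
import math


def softmax(xs):
    """Standard softmax: one exponentiation per input, then one division per output."""
    m = max(xs)                               # subtract the max to avoid overflow
    exps = [math.exp(x - m) for x in xs]      # exponentiation step
    total = sum(exps)
    return [e / total for e in exps]          # division (normalization) step
```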
{"title":"Self-Calibrating Analog Circuitry for Softmax-Scaled Function With Analog Computing-In-Memory","authors":"Linjun Jiang;Yitong Zhou;He Zhang;Wang Kang","doi":"10.1109/TVLSI.2026.3651307","DOIUrl":"https://doi.org/10.1109/TVLSI.2026.3651307","url":null,"abstract":"Analog computing-in-memory (ACIM) has garnered widespread attention due to its advantage of high energy efficiency. However, it faces large power and hardware costs to handle sophisticated nonlinear functions, such as the softmax, due to costly exponentiation and division. Existing digital-domain approaches often rely on dedicated modules to carry out these operations, leading to a cost expensive area and high-power consumption. To address the issues, we propose a self-calibrating analog circuitry for a softmax-scaled function with ACIM. By exploiting transistor subthreshold properties, the work eliminates expensive digital operations while mapping exponentiation and division to successive analog circuits. A self-calibration module further mitigates partial mismatch-induced deviations by dynamically tuning bias voltages, improving overall fitting accuracy and system robustness. The proposed softmax-enabled ACIM work achieves energy efficiency of 55.06–60.08 TOPS/W and 684.15 GOPS/mm<sup>2</sup> at 4-bit precision. 
In comparison with the state-of-the-art ACIMs with softmax implications, our proposed work shows higher energy efficiency and area efficiency.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"34 3","pages":"1067-1071"},"PeriodicalIF":3.1,"publicationDate":"2026-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147280502","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-12-29DOI: 10.1109/TVLSI.2025.3646357
Byeong Yong Kong
In this article, a fault-tolerant architecture (FTA) is presented for the multiuser detector in interleave division multiple access (IDMA). The detector is inherently prone to soft errors, as its chip area is predominantly occupied by memories, which are easily exposed to high-energy particles and hostile interference in harsh environments. One of the most widespread ways to protect memories is to encode their entries with error-correcting codes (ECCs). However, naïvely encoding all bits in an entry is likely to be costly and unnecessary. Accordingly, to identify performance-critical bits and determine the priority of protection, we extensively scrutinize how vulnerable the respective bits in the memories of the detector are to soft errors. Based on this analysis, an efficient FTA is developed that selectively encodes only a subset of the bits, in order of the identified vulnerability. Furthermore, the proposed FTA implements the state-of-the-art multiuser detection (MUD) scheme called on-the-fly despreading (OD) and showcases a new feature named purification, which repeatedly replaces erroneous entries with corrected ones to keep them error-free. The complicated memory accesses needed to perform OD and purification concurrently are enabled by remodeling both the datapath and the control path of the baseline OD architecture (ODA). Implementation results demonstrate that, unlike prior art that fails to sustain near-optimal performance and becomes impractical even for a very low probability of soft error, the proposed FTA can operate robustly in a wide range of harsh conditions without incurring much overhead.
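The idea of encoding only the most vulnerable bits of a memory entry can be sketched with a textbook Hamming(7,4) single-error-correcting code. Everything specific here is an assumption for illustration: the 8-bit entry width, the choice to protect the upper nibble, and the helper names `protect_word` and `recover_word`. The article's actual code and bit selection are not given in the abstract.

```python
def hamming74_encode(nibble: int) -> list[int]:
    """Encode 4 data bits into a 7-bit Hamming codeword (positions 1..7)."""
    d = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8                                   # index 0 unused
    code[3], code[5], code[6], code[7] = d           # data bit positions
    code[1] = code[3] ^ code[5] ^ code[7]            # parity over positions 1,3,5,7
    code[2] = code[3] ^ code[6] ^ code[7]            # parity over positions 2,3,6,7
    code[4] = code[5] ^ code[6] ^ code[7]            # parity over positions 4,5,6,7
    return code[1:]


def hamming74_decode(bits: list[int]) -> int:
    """Correct at most one flipped bit and return the 4 data bits."""
    code = [0] + list(bits)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos                          # nonzero syndrome = error position
    if syndrome:
        code[syndrome] ^= 1
    d = [code[3], code[5], code[6], code[7]]
    return sum(b << i for i, b in enumerate(d))


def protect_word(word: int) -> tuple[list[int], int]:
    """Hypothetical selective protection: encode the upper nibble, store the lower raw."""
    return hamming74_encode((word >> 4) & 0xF), word & 0xF


def recover_word(codeword: list[int], low_nibble: int) -> int:
    """Rebuild the 8-bit entry from the corrected upper nibble and the raw lower nibble."""
    return (hamming74_decode(codeword) << 4) | low_nibble
```

Under these assumptions an 8-bit entry costs 11 stored bits instead of the 14 needed to encode both nibbles, at the price of leaving the low-order bits exposed.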
{"title":"Fault-Tolerant IDMA Multiuser Detector Based on Fault Injection Analysis of Internal Memories","authors":"Byeong Yong Kong","doi":"10.1109/TVLSI.2025.3646357","DOIUrl":"https://doi.org/10.1109/TVLSI.2025.3646357","url":null,"abstract":"In this article, a fault-tolerant architecture (FTA) is presented for the multiuser detector in interleave division multiple access (IDMA). The detector is inherently prone to soft errors, as its chip area is predominantly occupied by memories, which are easily exposed to high-energy particles and hostile interferences in harsh environments. One of the most widespread ways to protect memories is to encode their entries with error-correcting codes (ECCs). However, naïvely encoding all bits in an entry is likely to be costly and unnecessary. Accordingly, to sort out performance-critical bits and determine the priority of protection, we extensively scrutinize how vulnerable respective bits in the memories of the detector are too soft errors. Based on the analysis, in addition, an efficient FTA that selectively encodes only a subset of the bits in order of the identified vulnerability is developed. Furthermore, the proposed FTA implements the state-of-the-art multiuser detection (MUD) scheme called on-the-fly despreading (OD) and showcases a new feature named purification, which repeatedly replaces erroneous entries with corrected ones to keep them error-free. Complicated memory accesses to concurrently perform the OD as well as the purification are enabled by remodeling both the datapath and the control path of the baseline OD architecture (ODA). 
Implementation results demonstrate that, unlike the prior arts that fail to sustain near-optimal performances and become impractical even for a very low probability of soft error, the proposed FTA may operate robustly in a wide range of harsh conditions without incurring much overhead.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"34 3","pages":"1004-1016"},"PeriodicalIF":3.1,"publicationDate":"2025-12-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147280587","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-12-29DOI: 10.1109/TVLSI.2025.3646232
Byungsoo Kim;Seung Ho Shin;Youngki Moon;Eugene Jeong;Sungho Kang
The massive computational requirements of large language models (LLMs) have increased the need for high-bandwidth memory (HBM), which involves high-volume data transfers. The high cell capacity of HBM results in extended test and repair times, leading to increased manufacturing costs. To reduce test time, a built-in self-repair (BISR) circuit, integrated into the HBM base die to detect and repair faults, tests multiple banks in parallel. Conventional BISR approaches adopt content-addressable memory (CAM) for fault classification to reduce repair time. However, a dedicated CAM on each bank leads to substantial area overhead associated with its comparison logic. To address these issues, this article proposes a novel BISR architecture that decouples fault classification from fault storage. A low-area linked CAM is shared across banks for fault classification, while small first-in first-out (FIFO) memories allocated to each bank store the classified fault information, which substantially reduces the overall area overhead. Furthermore, the proposed architecture reorders the repair solution search sequence toward the most promising candidates by swapping fault entries during test idle periods, thereby significantly reducing repair time. Experimental results demonstrate that the proposed BISR architecture achieves low area overhead and fast repair time for high-density HBM.
{"title":"A Low Area Built-In Self-Repair Using Hybrid Fault Address Memory for HBM","authors":"Byungsoo Kim;Seung Ho Shin;Youngki Moon;Eugene Jeong;Sungho Kang","doi":"10.1109/TVLSI.2025.3646232","DOIUrl":"https://doi.org/10.1109/TVLSI.2025.3646232","url":null,"abstract":"The massive computational requirements of large language model (LLMs) have increased the need for high-bandwidth memory (HBM), which involves high-volume data transfers. The high cell capacity of HBM results in extended test and repair times, leading to increased manufacturing costs. To reduce test time, a built- in self-repair (BISR) circuit, integrated into the HBM base die to detect and repair faults, tests multiple banks in parallel. Conventional BISR approaches adopt content-addressable memory (CAM) for fault classification to reduce repair time. However, dedicated CAM on each bank leads to substantial area overhead associated with its comparison logic. To address these issues, a novel BISR architecture that decouples fault classification and storage is proposed in this article. By introducing a linked CAM design with low area and sharing it across banks for fault classification, while small-area first-in first-out (FIFO) memories allocated to each bank store the classified fault information, the proposed architecture substantially reduces overall area overhead. Furthermore, the proposed architecture reorders the repair solution search sequence toward the most promising candidates by swapping fault entries during test idle periods, thereby significantly reducing repair time. 
Experimental results demonstrate that the proposed BISR architecture achieves low area overhead and fast repair time for high-density HBM.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"34 3","pages":"991-1003"},"PeriodicalIF":3.1,"publicationDate":"2025-12-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147280531","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-12-29DOI: 10.1109/TVLSI.2025.3641351
{"title":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems Society Information","authors":"","doi":"10.1109/TVLSI.2025.3641351","DOIUrl":"https://doi.org/10.1109/TVLSI.2025.3641351","url":null,"abstract":"","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"34 1","pages":"C3-C3"},"PeriodicalIF":3.1,"publicationDate":"2025-12-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11318116","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145847796","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-12-29DOI: 10.1109/TVLSI.2025.3646936
Bin Qiang;Yiming Wei;Yongliang Zhou;Xiulong Wu;Chunyu Peng
Compute-in-memory (CIM) is increasingly recognized as an effective hardware accelerator for convolutional neural networks (CNNs). This work proposes a hybrid-domain CIM design with three contributions: 1) a multibit compute unit (MBCU) structure that realizes the multiplication of a 2-bit input and a 4-bit weight through transistor-size-weighted capacitor discharge on the bitline; 2) a hybrid-domain quantization scheme (HDQS) of “time-domain + voltage-domain,” which combines the high energy efficiency of time-domain quantization with the low delay of voltage-domain quantization, and enhances quantization accuracy through the combined effect of the process tracking module and the reference signal module; and 3) the circuit design, layout, and simulation verification of the hybrid-domain static random access memory (SRAM) CIM macro in 28-nm CMOS technology. Results show that the circuit supports 8-bit multiply–accumulate (MAC) operation, and that full-precision quantization in the hybrid-domain form achieves an optimal energy efficiency of 249.7 TOPS/W per bit at 0.7 V and an area efficiency of 4.29 TOPS/mm² per bit. Furthermore, integrating the circuit with the VGG-16 network yields an inference accuracy of 90.52% on the CIFAR-10 dataset.
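One common way to build an 8-bit MAC from a unit that multiplies a 2-bit input by a 4-bit weight is bit slicing with shift-and-add. The sketch below assumes unsigned operands and that recombination scheme; the abstract does not say how the macro composes its 8-bit result, so treat this purely as an arithmetic illustration. The name `mac_8bit_via_slices` is hypothetical.

```python
def mac_8bit_via_slices(inputs, weights):
    """Unsigned 8-bit MAC composed from 2-bit-input x 4-bit-weight unit products."""
    total = 0
    for x, w in zip(inputs, weights):
        for i in range(4):                      # four 2-bit slices of the input
            x_slice = (x >> (2 * i)) & 0b11
            for j in range(2):                  # two 4-bit slices of the weight
                w_slice = (w >> (4 * j)) & 0xF
                # Each unit product is weighted by the positions of its two slices.
                total += (x_slice * w_slice) << (2 * i + 4 * j)
    return total
```

Each 8-bit by 8-bit product decomposes into eight unit products, so the result matches a direct sum of `x * w` over all pairs.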
{"title":"A Capacitor Discharge-Based SRAM CIM Macro Based on Hybrid-Domain for Convolutional Neural Networks","authors":"Bin Qiang;Yiming Wei;Yongliang Zhou;Xiulong Wu;Chunyu Peng","doi":"10.1109/TVLSI.2025.3646936","DOIUrl":"https://doi.org/10.1109/TVLSI.2025.3646936","url":null,"abstract":"Compute-in-memory (CIM) is increasingly recognized as an effective hardware accelerator for convolutional neural networks (CNNs). This work proposes a hybrid-domain CIM design using: 1) a multibit compute unit (MBCU) structure that realizes the multiplication operation of 2-bit input and 4-bit weight through the transistor-size-weighted capacitor discharge on the bitline; 2) a hybrid-domain quantization scheme (HDQS) of “time-domain + voltage-domain,” which integrates the high energy efficiency of time-domain quantization with the low-delay advantages of the voltage-domain quantization, and enhances the quantization accuracy through the combined effect of the process tracking module and the reference signal module; 3) the CIM circuit design, layout drawing and simulation verification of hybrid-domain static random access memory (SRAM) were realized by 28-nm CMOS technology, results show that the circuit supports 8-bit multiply–accumulate (MAC) operation, and full-precision quantization in the hybrid-domain form can achieve the optimal energy efficiency of 249.7 TOPS/W per bit at 0.7 V, and area efficiency of 4.29 TOPS/mm<sup>2</sup> per bit. 
Furthermore, the integration of the circuits with the VGG-16 network has been demonstrated to yield an inference accuracy of 90.52% in the CIFAR-10 dataset.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"34 3","pages":"1043-1047"},"PeriodicalIF":3.1,"publicationDate":"2025-12-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147280893","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-12-23DOI: 10.1109/TVLSI.2025.3642611
Lowry P.-T. Wang;Charles H.-P. Wen
Single-event upsets (SEUs) pose a critical reliability threat in advanced automotive and space electronics. Existing SEU-tolerant latch designs, such as those based on C-elements and unique modules, often fail to meet the stringent space radiation standard (linear energy transfer (LET) $= 60~\text{MeV} \cdot \text{cm}^{2}/\text{mg}$) at advanced fin field-effect transistor (FinFET) technology nodes, whereas triple modular redundancy (TMR) achieves sufficient tolerance but incurs significant overhead. To address these limitations, this brief introduces DN-FF, a novel detection-node flip-flop architecture that leverages the reduced node spacing of modern processes to achieve complete SEU immunity. By incorporating strategically placed detection nodes (DNs) and a dedicated detection circuit (DC), DN-FF achieves robust radiation hardness while significantly reducing physical area, delay, and power consumption compared to traditional TMR-based solutions. The experimental results demonstrate that DN-FF reduces area by 8.2%, delay by 18.4%, and power by 15.9%, delivering a 37% improvement in the overall area–delay–power quality (ADPQ) metric. These advantages make DN-FF a compact, high-performance, and reliable solution for demanding automotive and aerospace applications.
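The TMR baseline that this abstract compares against stores three copies of each value and takes a bitwise majority vote. A minimal behavioral sketch, which models only the voter and says nothing about the DN-FF circuit itself:

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise majority of three redundant copies: each output bit follows at least two inputs."""
    return (a & b) | (a & c) | (b & c)
```

A single upset in any one copy is masked, for example `majority_vote(0b1011, 0b1111, 0b1011)` evaluates to `0b1011`. The cost is three storage elements plus the voter, which is the overhead the abstract refers to.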
{"title":"DN-FF: A SEU-Tolerant Flip-Flop Design for Advanced Technology Nodes","authors":"Lowry P.-T. Wang;Charles H.-P. Wen","doi":"10.1109/TVLSI.2025.3642611","DOIUrl":"https://doi.org/10.1109/TVLSI.2025.3642611","url":null,"abstract":"Single-event upsets (SEUs) pose a critical reliability threat in advanced automotive and space electronics. While existing SEU-tolerant latch designs, such as those based on C-elements and unique modules, often fail to meet the stringent space radiation standards (linear energy transfer (LET) <inline-formula> <tex-math>$= 60~text {MeV} cdot text {cm}^{2}$ </tex-math></inline-formula>/mg) at advanced fin field-effect transistor (FinFET) technology nodes, triple modular redundancy (TMR) achieves sufficient tolerance but incurs significant overhead. To address these limitations, this brief introduces DN-FF, a novel detection-node flip-flop (DN-FF) architecture that leverages reduced node spacing in modern processes for complete SEU immunity with significantly reduced overhead compared to TMR. Incorporating strategically placed detection nodes (DNs) and a dedicated detection circuit (DC), DN-FF achieves robust radiation-hardness while significantly reducing physical area, delay, and power consumption compared to traditional TMR-based solutions. The experimental results demonstrate that DN-FF reduces area by 8.2%, delay by 18.4%, and power by 15.9%, delivering a 37% improvement in the overall area–delay–power quality (ADPQ) metric. 
These advantages make DN-FF a compact, high-performance, and reliable solution for demanding automotive and aerospace applications.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"34 3","pages":"1048-1052"},"PeriodicalIF":3.1,"publicationDate":"2025-12-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147280522","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}