Detect and Replace: Efficient Soft Error Protection of FPGA-Based CNN Accelerators
Authors: Zhen Gao; Yanmao Qi; Jinchang Shi; Qiang Liu; Guangjun Ge; Yu Wang; Pedro Reviriego
Journal: IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 33, no. 1, pp. 66-74 (impact factor 3.1; JCR Q2, Computer Science, Hardware & Architecture)
DOI: 10.1109/TVLSI.2024.3443834
Published: 2024-08-26
URL: https://ieeexplore.ieee.org/document/10648661/
Citations: 0
Abstract
Convolutional neural networks (CNNs) are widely used in computer vision and natural language processing, and field-programmable gate arrays (FPGAs) are a popular platform for accelerating them. However, FPGAs are prone to soft errors, so the reliability of FPGA-based CNN accelerators becomes a key problem in safety-critical applications. The convolution module, built on a processing element (PE) array, is the most complex part of the accelerator and is therefore the key to efficient protection. Coding-based schemes have been proposed to protect the convolution module efficiently: the processing of the PE array is modeled as parallel matrix-vector multiplications (MVMs), and every wrong output is concurrently detected and corrected. However, these schemes cannot deal with errors in the configuration memory that affect many intermediate results. In this article, a protection scheme based on faulty PE detection and replacement (DR) is proposed to deal with such configuration memory errors. The DR scheme is implemented on a CNN accelerator based on the Xilinx Zynq 7000 SoC, and fault injection (FI) experiments are performed to evaluate its performance. The results show that it effectively mitigates the effect of soft errors in the configuration memory, with an overhead of about 1.3 times the complexity and 1.4 times the power consumption of the unprotected PE array. Compared with the advanced checksum-of-checksum (CoC) scheme, the DR scheme decreases power consumption by up to 30%.
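The abstract does not spell out how a faulty PE is detected or replaced, so the following is only a minimal sketch of the general idea, not the authors' design. It assumes a simulated array in which PE i computes row i of y = W x, a fault-free checker that holds a plain and a weighted column checksum of W (a generic checksum technique, not the CoC scheme), at most one faulty PE at a time, and a persistent configuration-memory upset modeled as a fixed offset on that PE's output. All names (`PEArray`, `inject_fault`, `locate_faulty_row`, `detect_and_replace`) are hypothetical.

```python
# Illustrative sketch only: generic checksum-based detection on a simulated
# PE array, followed by remapping the affected row to a spare PE.
# This is NOT the implementation described in the paper.

def dot(row, x):
    """Integer dot product, standing in for one PE's multiply-accumulate."""
    return sum(w * v for w, v in zip(row, x))


class PEArray:
    """Hypothetical model: logical row i of y = W x runs on physical PE mapping[i]."""

    def __init__(self, num_pes, num_spares=1):
        self.mapping = list(range(num_pes))  # logical row -> physical PE
        self.spares = list(range(num_pes, num_pes + num_spares))
        self.faults = {}                     # physical PE -> persistent output offset

    def inject_fault(self, physical_pe, offset):
        """Assumption: a configuration upset adds a fixed offset to every output."""
        self.faults[physical_pe] = offset

    def mvm(self, W, x):
        return [dot(row, x) + self.faults.get(self.mapping[i], 0)
                for i, row in enumerate(W)]


def locate_faulty_row(W, x, y):
    """Return the index of a single wrong output, or None if the checksum matches."""
    n, m = len(W), len(x)
    plain = [sum(W[i][j] for i in range(n)) for j in range(m)]
    weighted = [sum((i + 1) * W[i][j] for i in range(n)) for j in range(m)]
    d_plain = sum(y) - dot(plain, x)
    d_weighted = sum((i + 1) * y[i] for i in range(n)) - dot(weighted, x)
    if d_plain == 0:
        return None
    # For a single error e in row k: d_plain = e and d_weighted = (k + 1) * e.
    return d_weighted // d_plain - 1


def detect_and_replace(array, W, x):
    """Run one MVM; if a faulty row is found, move it to a spare PE and recompute."""
    y = array.mvm(W, x)
    bad = locate_faulty_row(W, x, y)
    if bad is not None and array.spares:
        array.mapping[bad] = array.spares.pop(0)  # retire the faulty PE
        y = array.mvm(W, x)                       # recompute with the spare
    return y, bad


W = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
x = [1, 0, -1]
array = PEArray(num_pes=3, num_spares=1)
array.inject_fault(physical_pe=1, offset=5)
y, bad = detect_and_replace(array, W, x)
print(y, bad, array.mapping)
```

In this toy run the checker locates row 1, remaps it to the spare (physical PE 3), and the recomputed output matches the fault-free result. Replacing the PE rather than correcting each output matters here because the fault is persistent: per-output correction would have to be repeated on every subsequent MVM, whereas remapping removes the faulty PE from the data path.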
About the journal:
The IEEE Transactions on VLSI Systems is published as a monthly journal under the co-sponsorship of the IEEE Circuits and Systems Society, the IEEE Computer Society, and the IEEE Solid-State Circuits Society.
Design and realization of microelectronic systems using VLSI/ULSI technologies require close collaboration among scientists and engineers in the fields of systems architecture, logic and circuit design, chip and wafer fabrication, packaging, testing, and systems applications. Generation of specifications, design, and verification must be performed at all abstraction levels, including the system, register-transfer, logic, circuit, transistor, and process levels.
To address this critical area through a common forum, the IEEE Transactions on VLSI Systems was founded. The editorial board, consisting of international experts, invites original papers that emphasize the novel systems integration aspects of microelectronic systems, including interactions among systems design and partitioning, logic and memory design, digital and analog circuit design, layout synthesis, CAD tools, chip and wafer fabrication, testing and packaging, and systems-level qualification. Thus, the coverage of these Transactions focuses on VLSI/ULSI microelectronic systems integration.