Towards spatial fault resilience in array processors

2012 IEEE 30th VLSI Test Symposium (VTS). Pub Date: 2012-04-23. DOI: 10.1109/VTS.2012.6231068

S. Sindia, V. Agrawal


Citations: 2

Abstract

Computing with large-die graphics processors, which need huge arrays of identical structures, abounds with challenges in the late CMOS era because of spatial non-idealities arising from chip-to-chip and within-chip variation of MOSFET threshold voltage. In this paper, we propose a machine-learning-based software framework for in-situ prediction and correction of computations corrupted by transistor threshold-voltage variation. Through semi-supervised training of a fully connected cascade feed-forward neural network (FCCFF-NN), the NN makes an accurate prediction of the underlying hardware and creates a spatial map of faulty processing elements (PEs). The faulty elements identified by the NN are avoided in future computation. Further, any transient faults occurring over and above these spatial faults are tracked, and are corrected if the number of PEs involved in a particle strike is above a preset threshold. For experimental validation, we consider a 256 × 256 array of PEs. Each PE comprises a multiply-accumulate (MAC) block with three 8-bit registers (two for inputs and one for storing the computed result). One thousand instances of this processor array are created, and the PEs in each instance are randomly perturbed with threshold-voltage variation. Common image-processing operations such as low-pass filtering and edge enhancement are performed on each of these 1000 instances. A fraction of the resulting images (about 10%) is used to train the NN on the spatial non-idealities. After this training, the NN accurately predicts the spatial extremities in 95% of the remaining 90% of cases. The proposed NN-based error tolerance yields higher-quality images in which degradation is no longer visually perceptible.
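The map-then-avoid idea in the abstract can be illustrated with a small simulation. The sketch below is not the authors' method. It replaces the FCCFF-NN with a plain comparison against a fault-free reference, and it assumes a hypothetical fault model in which a faulty PE flips the most significant bit of its 8-bit result. The array size (16 × 16 rather than 256 × 256), the 5% fault rate, and all function names are illustrative assumptions.

```python
import random

ROWS = COLS = 16  # the paper uses 256 x 256; kept small here for speed


def box_filter_pixel(img, r, c):
    """3x3 mean (low-pass) filter at (r, c), clamping coordinates at the border."""
    total = 0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            rr = min(max(r + dr, 0), len(img) - 1)
            cc = min(max(c + dc, 0), len(img[0]) - 1)
            total += img[rr][cc]
    return total // 9  # mean of nine 8-bit values always fits in 8 bits


def run_array(img, faulty, avoid=None):
    """Filter the image with one PE per pixel.

    Assumed fault model (hypothetical): a faulty PE flips the MSB of its result.
    Pixels whose PE is flagged in `avoid` are treated as rerouted to a healthy
    PE, so their result is not corrupted.
    """
    out = []
    for r in range(len(img)):
        row = []
        for c in range(len(img[0])):
            value = box_filter_pixel(img, r, c)
            rerouted = avoid is not None and avoid[r][c]
            if faulty[r][c] and not rerouted:
                value ^= 0x80
            row.append(value)
        out.append(row)
    return out


def detect_faulty(observed, golden):
    """Build a spatial fault map by comparing against a fault-free reference.

    This stands in for the neural network, which the abstract does not specify
    in enough detail to reproduce.
    """
    return [[o != g for o, g in zip(orow, grow)]
            for orow, grow in zip(observed, golden)]


rng = random.Random(0)
image = [[rng.randrange(256) for _ in range(COLS)] for _ in range(ROWS)]
faulty = [[rng.random() < 0.05 for _ in range(COLS)] for _ in range(ROWS)]
no_faults = [[False] * COLS for _ in range(ROWS)]

golden = run_array(image, no_faults)       # ideal hardware
observed = run_array(image, faulty)        # perturbed hardware
fault_map = detect_faulty(observed, golden)
repaired = run_array(image, faulty, avoid=fault_map)
```

Because flipping the top bit always changes the value, the comparison recovers the fault map exactly in this toy setting, and rerouting the flagged pixels restores the reference image. A learned predictor is needed in practice because a fault-free reference is not available on deployed hardware.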


