Killi: Runtime Fault Classification to Deploy Low Voltage Caches without MBIST

Shrikanth Ganapathy, J. Kalamatianos, Bradford M. Beckmann, Steven E. Raasch, Lukasz G. Szafaryn
{"title":"Killi: Runtime Fault Classification to Deploy Low Voltage Caches without MBIST","authors":"Shrikanth Ganapathy, J. Kalamatianos, Bradford M. Beckmann, Steven E. Raasch, Lukasz G. Szafaryn","doi":"10.1109/HPCA.2019.00046","DOIUrl":null,"url":null,"abstract":"Supply voltage (VDD) scaling is one of the most effective mechanisms to reduce energy consumption in highperformance microprocessors. However, VDD scaling is challenging for SRAM-based on-chip memories such as caches due to persistent failures at low voltage (LV). Previously designed LV-enabling mechanisms require additional Memory Built-in Self-Test (MBIST) steps, employed either offline or online to identify persistent failures for every LV operating mode. However, these additional MBIST steps are time consuming, resulting in extended boot time or delayed power state transitions. Furthermore, most prior techniques combine MBIST-based solutions with customized Error Correction Codes (ECC), which suffer from non-trivial area or performance overheads. In this paper, we highlight the practical challenges for deploying LV techniques and propose a new low-cost error protection scheme, called Killi, which leverages conventional ECC and parity to enable LV operation. Foremost, the failing lines are discovered dynamically at runtime using both parity and ECC, negating the need for extra MBIST testing. Killi then provides on demand error protection by decoupling cheap error detection from expensive error correction. Killi provides error detection capability to all lines using parity but employs Single Error Correction, Double Error Detection (SECDED) ECC for a subset of the lines with a single LV fault. All lines with more than one fault are disabled. We evaluate this completely hardware enclosed solution on a GPU write-through L2 cache and show that the Vmin (minimum reliable VDD) can be reduced to 62.5% of nominal VDD when operating at 1GHz with only a maximum of 0.8% performance degradation. As a result, an 8CU GPU with Killi can reduce the power consumption of the L2 cache by 59.3% compared to the baseline L2 cache running at nominal VDD. In addition, Killi reduces the error protection area overhead by 50% compared to SECDED ECC. Keywords—cache, energy-efficiency, GPU, low voltage,","PeriodicalId":102050,"journal":{"name":"2019 IEEE International Symposium on High Performance Computer Architecture (HPCA)","volume":"71 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"8","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE International Symposium on High Performance Computer Architecture (HPCA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/HPCA.2019.00046","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 8

Abstract

Supply voltage (VDD) scaling is one of the most effective mechanisms to reduce energy consumption in highperformance microprocessors. However, VDD scaling is challenging for SRAM-based on-chip memories such as caches due to persistent failures at low voltage (LV). Previously designed LV-enabling mechanisms require additional Memory Built-in Self-Test (MBIST) steps, employed either offline or online to identify persistent failures for every LV operating mode. However, these additional MBIST steps are time consuming, resulting in extended boot time or delayed power state transitions. Furthermore, most prior techniques combine MBIST-based solutions with customized Error Correction Codes (ECC), which suffer from non-trivial area or performance overheads. In this paper, we highlight the practical challenges for deploying LV techniques and propose a new low-cost error protection scheme, called Killi, which leverages conventional ECC and parity to enable LV operation. Foremost, the failing lines are discovered dynamically at runtime using both parity and ECC, negating the need for extra MBIST testing. Killi then provides on demand error protection by decoupling cheap error detection from expensive error correction. Killi provides error detection capability to all lines using parity but employs Single Error Correction, Double Error Detection (SECDED) ECC for a subset of the lines with a single LV fault. All lines with more than one fault are disabled. We evaluate this completely hardware enclosed solution on a GPU write-through L2 cache and show that the Vmin (minimum reliable VDD) can be reduced to 62.5% of nominal VDD when operating at 1GHz with only a maximum of 0.8% performance degradation. As a result, an 8CU GPU with Killi can reduce the power consumption of the L2 cache by 59.3% compared to the baseline L2 cache running at nominal VDD. In addition, Killi reduces the error protection area overhead by 50% compared to SECDED ECC. Keywords—cache, energy-efficiency, GPU, low voltage,
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Killi:运行时故障分类,部署无MBIST的低压缓存
在高性能微处理器中,电源电压(VDD)缩放是降低能耗的最有效机制之一。然而,对于基于sram的片上存储器(如缓存)来说,由于低电压(LV)下的持续故障,VDD扩展是具有挑战性的。以前设计的LV启用机制需要额外的内存内置自检(MBIST)步骤,可以离线或在线使用,以识别每个LV工作模式的持续故障。然而,这些额外的MBIST步骤非常耗时,导致引导时间延长或电源状态转换延迟。此外,大多数先前的技术将基于mbist的解决方案与定制的纠错码(ECC)结合在一起,这会带来不小的面积或性能开销。在本文中,我们强调了部署低压技术的实际挑战,并提出了一种新的低成本错误保护方案,称为Killi,它利用传统的ECC和奇偶性来实现低压操作。最重要的是,在运行时使用奇偶校验和ECC动态地发现故障行,从而不需要额外的MBIST测试。然后,Killi通过将廉价的错误检测与昂贵的错误纠正分离,提供按需错误保护。Killi为所有使用奇偶校验的线路提供错误检测能力,但对具有单个低压故障的线路子集采用单错误校正,双错误检测(SECDED) ECC。所有有一个以上故障的线路都被禁用。我们在GPU write-through L2缓存上评估了这种完全硬件封闭的解决方案,并表明当工作在1GHz时,Vmin(最小可靠VDD)可以降低到名义VDD的62.5%,而性能下降最多仅为0.8%。因此,与在标称VDD下运行的基准L2缓存相比,带有Killi的8CU GPU可以将L2缓存的功耗降低59.3%。此外,与SECDED ECC相比,Killi减少了50%的错误保护区域开销。关键词:缓存,能效,GPU,低电压,
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Machine Learning at Facebook: Understanding Inference at the Edge Understanding the Future of Energy Efficiency in Multi-Module GPUs POWERT Channels: A Novel Class of Covert CommunicationExploiting Power Management Vulnerabilities The Accelerator Wall: Limits of Chip Specialization Featherlight Reuse-Distance Measurement
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1