ZEBRA：利用不同比特模式实现神经网络加速的零比特稳健累积计算内存方法

2024 29th Asia and South Pacific Design Automation Conference (ASP-DAC) Pub Date : 2024-01-22 DOI:10.1109/ASP-DAC58780.2024.10473851

Yiming Chen, Guodong Yin, Hongtao Zhong, Ming-En Lee, Huazhong Yang, Sumitha George, Vijaykrishnan Narayanan, Xueqing Li

{"title":"ZEBRA：利用不同比特模式实现神经网络加速的零比特稳健累积计算内存方法","authors":"Yiming Chen, Guodong Yin, Hongtao Zhong, Ming-En Lee, Huazhong Yang, Sumitha George, Vijaykrishnan Narayanan, Xueqing Li","doi":"10.1109/ASP-DAC58780.2024.10473851","DOIUrl":null,"url":null,"abstract":"Deploying a lightweight quantized model in compute-in-memory (CIM) might result in significant accuracy degradation due to reduced signal-noise rate (SNR). To address this issue, this paper presents ZEBRA, a zero-bit robust-accumulation CIM approach, which utilizes bitwise zero patterns to compress computation with ultra-high resilience against noise due to circuit non-idealities, etc. First, ZEBRA provides a cross-level design that successfully exploits value-adaptive zero-bit patterns to improve the performance in robust 8-bit quantization dramatically. Second, ZEBRA presents a multi-level local computing unit circuit design to implement the bitwise sparsity pattern, which boosts the area/energy efficiency by 2x-4x compared with existing CIM works. Experiments demonstrate that ZEBRA can achieve <1.0% accuracy loss in CIFAR10/100 with typical noise, while conventional CIM works suffer from > 10% accuracy loss. Such robustness leads to much more stable accuracy for high-parallelism inference on large models in practice.","PeriodicalId":518586,"journal":{"name":"2024 29th Asia and South Pacific Design Automation Conference (ASP-DAC)","volume":"68 3","pages":"153-158"},"PeriodicalIF":0.0000,"publicationDate":"2024-01-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"ZEBRA: A Zero-Bit Robust-Accumulation Compute-In-Memory Approach for Neural Network Acceleration Utilizing Different Bitwise Patterns\",\"authors\":\"Yiming Chen, Guodong Yin, Hongtao Zhong, Ming-En Lee, Huazhong Yang, Sumitha George, Vijaykrishnan Narayanan, Xueqing Li\",\"doi\":\"10.1109/ASP-DAC58780.2024.10473851\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Deploying a lightweight quantized model in compute-in-memory (CIM) might result in significant accuracy degradation due to reduced signal-noise rate (SNR). To address this issue, this paper presents ZEBRA, a zero-bit robust-accumulation CIM approach, which utilizes bitwise zero patterns to compress computation with ultra-high resilience against noise due to circuit non-idealities, etc. First, ZEBRA provides a cross-level design that successfully exploits value-adaptive zero-bit patterns to improve the performance in robust 8-bit quantization dramatically. Second, ZEBRA presents a multi-level local computing unit circuit design to implement the bitwise sparsity pattern, which boosts the area/energy efficiency by 2x-4x compared with existing CIM works. Experiments demonstrate that ZEBRA can achieve <1.0% accuracy loss in CIFAR10/100 with typical noise, while conventional CIM works suffer from > 10% accuracy loss. Such robustness leads to much more stable accuracy for high-parallelism inference on large models in practice.\",\"PeriodicalId\":518586,\"journal\":{\"name\":\"2024 29th Asia and South Pacific Design Automation Conference (ASP-DAC)\",\"volume\":\"68 3\",\"pages\":\"153-158\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-01-22\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2024 29th Asia and South Pacific Design Automation Conference (ASP-DAC)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ASP-DAC58780.2024.10473851\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2024 29th Asia and South Pacific Design Automation Conference (ASP-DAC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ASP-DAC58780.2024.10473851","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

摘要

在内存计算（CIM）中部署轻量级量化模型可能会因信噪比（SNR）降低而导致精度显著下降。为解决这一问题，本文提出了零位稳健累积 CIM 方法 ZEBRA，该方法利用顺位零模式压缩计算，具有超高的抗噪声能力，可抵御电路非理想性等造成的噪声。首先，ZEBRA 提供了一种跨级设计，成功地利用了值自适应零位模式，显著提高了稳健 8 位量化的性能。其次，ZEBRA 提出了一种多级本地计算单元电路设计来实现位向稀疏性模式，与现有的 CIM 作品相比，其面积/能效提高了 2 倍至 4 倍。实验证明，ZEBRA 可以实现 10% 的精度损失。这种鲁棒性为大型模型的高并行性推理带来了更稳定的精度。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

ZEBRA: A Zero-Bit Robust-Accumulation Compute-In-Memory Approach for Neural Network Acceleration Utilizing Different Bitwise Patterns

Deploying a lightweight quantized model in compute-in-memory (CIM) might result in significant accuracy degradation due to reduced signal-noise rate (SNR). To address this issue, this paper presents ZEBRA, a zero-bit robust-accumulation CIM approach, which utilizes bitwise zero patterns to compress computation with ultra-high resilience against noise due to circuit non-idealities, etc. First, ZEBRA provides a cross-level design that successfully exploits value-adaptive zero-bit patterns to improve the performance in robust 8-bit quantization dramatically. Second, ZEBRA presents a multi-level local computing unit circuit design to implement the bitwise sparsity pattern, which boosts the area/energy efficiency by 2x-4x compared with existing CIM works. Experiments demonstrate that ZEBRA can achieve <1.0% accuracy loss in CIFAR10/100 with typical noise, while conventional CIM works suffer from > 10% accuracy loss. Such robustness leads to much more stable accuracy for high-parallelism inference on large models in practice.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2024 29th Asia and South Pacific Design Automation Conference (ASP-DAC)

自引率

0.00%

发文量