LRMP: Layer Replication with Mixed Precision for spatial in-memory DNN accelerators.

Journal: Frontiers in Artificial Intelligence (IF 3, JCR Q2, Computer Science, Artificial Intelligence). Pub Date: 2024-10-04. eCollection Date: 2024-01-01. DOI: 10.3389/frai.2024.1268317
Abinand Nallathambi, Christin David Bose, Wilfried Haensch, Anand Raghunathan
Volume 7, article 1268317. Full-text PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11486753/pdf/

Abstract

In-memory computing (IMC) with non-volatile memories (NVMs) has emerged as a promising approach to address the rapidly growing computational demands of Deep Neural Networks (DNNs). Mapping DNN layers spatially onto NVM-based IMC accelerators achieves high degrees of parallelism. However, this approach faces two challenges: a highly non-uniform distribution of layer processing times and high area requirements. We propose LRMP, a method that jointly applies layer replication and mixed-precision quantization to improve the performance of DNNs mapped to area-constrained IMC accelerators. LRMP combines reinforcement learning and mixed-integer linear programming to search the replication-quantization design space, using a model that is closely informed by the target hardware architecture. Across five DNN benchmarks, LRMP achieves 2.6-9.3× latency and 8-18× throughput improvements with minimal (<1%) degradation in accuracy.
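
To make the replication idea in the abstract concrete, the following is a minimal toy sketch, not the authors' method. It swaps the reinforcement learning and mixed-integer linear programming search for a simple greedy heuristic, and it assumes an idealized pipeline in which a layer with r copies processes inputs r times faster. The names `Layer`, `greedy_replicate`, and `pipeline_interval`, the tile counts, and the latencies are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    name: str
    latency: float  # time for one copy to process one input (arbitrary units)
    tiles: int      # area, in tiles, occupied by one copy (hypothetical cost model)


def pipeline_interval(layers, copies):
    """Time between consecutive outputs: set by the slowest pipeline stage."""
    return max(layer.latency / copies[layer.name] for layer in layers)


def greedy_replicate(layers, tile_budget):
    """Add copies of the current bottleneck layer until the next copy no longer fits.

    This greedy rule is an illustrative stand-in, not the RL + MILP search
    described in the abstract.
    """
    copies = {layer.name: 1 for layer in layers}
    used = sum(layer.tiles for layer in layers)
    if used > tile_budget:
        raise ValueError("network does not fit even without replication")
    while True:
        # Idealized assumption: r copies divide the stage time by r.
        bottleneck = max(layers, key=lambda l: l.latency / copies[l.name])
        if used + bottleneck.tiles > tile_budget:
            break
        copies[bottleneck.name] += 1
        used += bottleneck.tiles
    return copies, used


# Hypothetical three-layer network with very uneven per-layer processing times.
layers = [
    Layer("conv1", latency=16.0, tiles=1),
    Layer("conv2", latency=8.0, tiles=2),
    Layer("fc", latency=1.0, tiles=4),
]
baseline = pipeline_interval(layers, {layer.name: 1 for layer in layers})
copies, used = greedy_replicate(layers, tile_budget=12)
interval = pipeline_interval(layers, copies)
print(copies, used, baseline / interval)
```

In this toy model, the slow early layers receive the extra copies while the fast final layer is left alone, which is how replication evens out non-uniform layer times. Under the same assumptions, a layer stored at lower precision would occupy fewer tiles, leaving more of the area budget for replication; that interaction is the reason the abstract treats replication and quantization as a joint search.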

Source journal metrics: CiteScore 6.10; self-citation rate 2.50%; articles published: 272; review time: 13 weeks.