A comprehensive exploration of approximate DNN models with a novel floating-point simulation framework

Performance Evaluation, Volume 165, Article 102423 · Pub Date: 2024-05-25 · DOI: 10.1016/j.peva.2024.102423
Impact factor: 1.0 · Zone 4 (Computer Science) · JCR Q4 (Computer Science, Hardware & Architecture)
Myeongjin Kwak, Jeonggeun Kim, Yongtae Kim
https://www.sciencedirect.com/science/article/pii/S0166531624000282
Citations: 0

Abstract

This paper introduces TorchAxf, a framework for fast simulation of diverse approximate deep neural network (DNN) models, including spiking neural networks (SNNs). The proposed framework utilizes various approximate adders and multipliers, supports industry-standard reduced-precision floating-point formats, such as bfloat16, and accommodates user-customized precision representations. Leveraging GPU acceleration on the PyTorch framework, TorchAxf accelerates approximate DNN training and inference. In addition, it allows seamless integration of arbitrary approximate arithmetic algorithms with C/C++ behavioral models to emulate approximate DNN hardware accelerators.
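To make the idea of a reduced-precision or user-customized floating-point format concrete, here is a minimal sketch in plain Python. It is not TorchAxf code and does not reflect its API: the function names `truncate_mantissa` and `to_bfloat16` are hypothetical, and truncation (rather than round-to-nearest-even, which hardware often uses) is an assumption chosen for simplicity. The sketch relies on the fact that bfloat16 keeps the sign bit, the 8 exponent bits, and the top 7 fraction bits of an IEEE 754 float32.

```python
import struct


def truncate_mantissa(x: float, mantissa_bits: int) -> float:
    """Convert x to float32, then zero all but the top `mantissa_bits`
    of its 23 fraction bits (truncation toward zero).

    Hypothetical helper for illustration; sign and exponent are untouched.
    """
    if not 0 <= mantissa_bits <= 23:
        raise ValueError("mantissa_bits must be in [0, 23]")
    # Reinterpret the float32 encoding of x as an unsigned 32-bit integer.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Build a mask that clears the low (23 - mantissa_bits) fraction bits.
    drop = 23 - mantissa_bits
    mask = (0xFFFFFFFF << drop) & 0xFFFFFFFF
    # Reinterpret the masked integer as a float32 again.
    return struct.unpack("<f", struct.pack("<I", bits & mask))[0]


def to_bfloat16(x: float) -> float:
    """Emulate bfloat16 by keeping 7 fraction bits (assumes truncation)."""
    return truncate_mantissa(x, 7)


if __name__ == "__main__":
    print(to_bfloat16(3.14159))          # 3.140625
    print(truncate_mantissa(1.75, 1))    # 1.5
```

A custom format with fewer fraction bits is obtained by passing a smaller `mantissa_bits`; narrowing the exponent range would need extra handling that this sketch omits.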

We use the proposed TorchAxf framework to assess twelve popular DNN models under approximate multiply-and-accumulate (MAC) operations. Through comprehensive experiments, we determine the degree of floating-point arithmetic approximation that these DNN models tolerate without significant accuracy loss and identify the optimal reduced-precision format for each DNN model. Additionally, we demonstrate that approximate-aware re-training can rectify errors and enhance pre-trained DNN models under reduced-precision formats. Furthermore, TorchAxf, operating on GPU, reduces simulation time for complex DNN models using approximate arithmetic by up to 131.38× compared to the baseline optimized CPU implementation. Finally, we compare the proposed framework with state-of-the-art frameworks to highlight its superiority.
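The kind of experiment described above, measuring how much error an approximate MAC introduces at a given precision, can be sketched as follows. This is an illustration under stated assumptions, not the paper's method: mantissa truncation stands in for the approximate adders and multipliers, and the names `quantize`, `approx_mac`, and `relative_error` are hypothetical.

```python
import struct


def quantize(x: float, mantissa_bits: int) -> float:
    """Convert to float32, then keep only the top `mantissa_bits`
    of the 23 fraction bits (hypothetical stand-in for approximation)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    mask = (0xFFFFFFFF << (23 - mantissa_bits)) & 0xFFFFFFFF
    return struct.unpack("<f", struct.pack("<I", bits & mask))[0]


def approx_mac(weights, activations, mantissa_bits):
    """Dot product in which operands, each product, and each partial sum
    are truncated to `mantissa_bits` fraction bits."""
    acc = 0.0
    for w, a in zip(weights, activations):
        product = quantize(w, mantissa_bits) * quantize(a, mantissa_bits)
        product = quantize(product, mantissa_bits)
        acc = quantize(acc + product, mantissa_bits)
    return acc


def relative_error(weights, activations, mantissa_bits):
    """Relative error against a double-precision reference.
    Assumes the exact dot product is nonzero."""
    exact = sum(w * a for w, a in zip(weights, activations))
    approx = approx_mac(weights, activations, mantissa_bits)
    return abs(approx - exact) / abs(exact)


if __name__ == "__main__":
    w = [0.1, 0.2, 0.3]
    a = [0.4, 0.5, 0.6]
    # Sweep the fraction width to see how the MAC error grows as bits are removed.
    for m in (23, 10, 7, 4):
        print(m, relative_error(w, a, m))
```

Sweeping `mantissa_bits` in this way mirrors, at toy scale, the search for the lowest precision a model tolerates; a real study would measure task accuracy of a full network rather than the error of a single dot product.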

Source journal
Performance Evaluation (Engineering & Technology; Computer Science: Theory & Methods)
CiteScore: 3.10
Self-citation rate: 0.00%
Articles published: 20
Review time: 24 days
Journal description: Performance Evaluation functions as a leading journal in the area of modeling, measurement, and evaluation of performance aspects of computing and communication systems. As such, it aims to present a balanced and complete view of the entire Performance Evaluation profession. Hence, the journal is interested in papers that focus on one or more of the following dimensions:
- Define new performance evaluation tools, including measurement and monitoring tools as well as modeling and analytic techniques
- Provide new insights into the performance of computing and communication systems
- Introduce new application areas where performance evaluation tools can play an important role and creative new uses for performance evaluation tools.
More specifically, common application areas of interest include the performance of:
- Resource allocation and control methods and algorithms (e.g. routing and flow control in networks, bandwidth allocation, processor scheduling, memory management)
- System architecture, design and implementation
- Cognitive radio
- VANETs
- Social networks and media
- Energy efficient ICT
- Energy harvesting
- Data centers
- Data centric networks
- System reliability
- System tuning and capacity planning
- Wireless and sensor networks
- Autonomic and self-organizing systems
- Embedded systems
- Network science