{"title":"A comprehensive exploration of approximate DNN models with a novel floating-point simulation framework","authors":"Myeongjin Kwak, Jeonggeun Kim, Yongtae Kim","doi":"10.1016/j.peva.2024.102423","DOIUrl":null,"url":null,"abstract":"<div><p>This paper introduces <em>TorchAxf</em><span><sup>1</sup></span>, a framework for fast simulation of diverse approximate deep neural network (DNN) models, including spiking neural networks (SNNs). The proposed framework utilizes various approximate adders and multipliers, supports industrial standard reduced precision floating-point formats, such as <span>bfloat16</span>, and accommodates user-customized precision representations. Leveraging GPU acceleration on the PyTorch framework, <em>TorchAxf</em> accelerates approximate DNN training and inference. In addition, it allows seamless integration of arbitrary approximate arithmetic algorithms with C/C++ behavioral models to emulate approximate DNN hardware accelerators.</p><p>We utilize the proposed <em>TorchAxf</em> framework to assess twelve popular DNN models under approximate multiply-and-accumulate (MAC) operations. Through comprehensive experiments, we determine the suitable degree of floating-point arithmetic approximation for these DNN models without significant accuracy loss and offer the optimal reduced precision formats for each DNN model. Additionally, we demonstrate that approximate-aware re-training can rectify errors and enhance pre-trained DNN models under reduced precision formats. Furthermore, <em>TorchAxf</em>, operating on GPU, remarkably reduces simulation time for complex DNN models using approximate arithmetic by up to 131.38<span><math><mo>×</mo></math></span> compared to the baseline optimized CPU implementation. Finally, we compare the proposed framework with state-of-the-art frameworks to highlight its superiority.</p></div>","PeriodicalId":19964,"journal":{"name":"Performance Evaluation","volume":"165 ","pages":"Article 102423"},"PeriodicalIF":1.0000,"publicationDate":"2024-05-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Performance Evaluation","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0166531624000282","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}
引用次数: 0
Abstract
This paper introduces TorchAxf1, a framework for fast simulation of diverse approximate deep neural network (DNN) models, including spiking neural networks (SNNs). The proposed framework utilizes various approximate adders and multipliers, supports industrial standard reduced precision floating-point formats, such as bfloat16, and accommodates user-customized precision representations. Leveraging GPU acceleration on the PyTorch framework, TorchAxf accelerates approximate DNN training and inference. In addition, it allows seamless integration of arbitrary approximate arithmetic algorithms with C/C++ behavioral models to emulate approximate DNN hardware accelerators.
We utilize the proposed TorchAxf framework to assess twelve popular DNN models under approximate multiply-and-accumulate (MAC) operations. Through comprehensive experiments, we determine the suitable degree of floating-point arithmetic approximation for these DNN models without significant accuracy loss and offer the optimal reduced precision formats for each DNN model. Additionally, we demonstrate that approximate-aware re-training can rectify errors and enhance pre-trained DNN models under reduced precision formats. Furthermore, TorchAxf, operating on GPU, remarkably reduces simulation time for complex DNN models using approximate arithmetic by up to 131.38 compared to the baseline optimized CPU implementation. Finally, we compare the proposed framework with state-of-the-art frameworks to highlight its superiority.
期刊介绍:
Performance Evaluation functions as a leading journal in the area of modeling, measurement, and evaluation of performance aspects of computing and communication systems. As such, it aims to present a balanced and complete view of the entire Performance Evaluation profession. Hence, the journal is interested in papers that focus on one or more of the following dimensions:
-Define new performance evaluation tools, including measurement and monitoring tools as well as modeling and analytic techniques
-Provide new insights into the performance of computing and communication systems
-Introduce new application areas where performance evaluation tools can play an important role and creative new uses for performance evaluation tools.
More specifically, common application areas of interest include the performance of:
-Resource allocation and control methods and algorithms (e.g. routing and flow control in networks, bandwidth allocation, processor scheduling, memory management)
-System architecture, design and implementation
-Cognitive radio
-VANETs
-Social networks and media
-Energy efficient ICT
-Energy harvesting
-Data centers
-Data centric networks
-System reliability
-System tuning and capacity planning
-Wireless and sensor networks
-Autonomic and self-organizing systems
-Embedded systems
-Network science