Mohammad Hassani Sadi, Chirag Sudarshan, Norbert Wehn
{"title":"用于 8 位浮点 DNN 训练的新型自适应量化方法","authors":"Mohammad Hassani Sadi, Chirag Sudarshan, Norbert Wehn","doi":"10.1007/s10617-024-09282-2","DOIUrl":null,"url":null,"abstract":"<p>There is a high energy cost associated with training Deep Neural Networks (DNNs). Off-chip memory access contributes a major portion to the overall energy consumption. Reduction in the number of off-chip memory transactions can be achieved by quantizing the data words to low data bit-width (E.g., 8-bit). However, low-bit-width data formats suffer from a limited dynamic range, resulting in reduced accuracy. In this paper, a novel 8-bit Floating Point (FP8) data format quantized DNN training methodology is presented, which adapts to the required dynamic range on-the-fly. Our methodology relies on varying the bias values of FP8 format to fit the dynamic range to the required range of DNN parameters and input feature maps. The range fitting during the training is adaptively performed by an online statistical analysis hardware unit without stalling the computation units or its data accesses. Our approach is compatible with any DNN compute cores without any major modifications to the architecture. We propose to integrate the new FP8 quantization unit in the memory controller. The FP32 data from the compute core are converted to FP8 in the memory controller before writing to the DRAM and converted back after reading the data from DRAM. Our results show that the DRAM access energy is reduced by 3.07<span>\\(\\times \\)</span> while using an 8-bit data format instead of using 32-bit. The accuracy loss of the proposed methodology with 8-bit quantized training is <span>\\(\\approx 1\\%\\)</span> for various networks with image and natural language processing datasets.\n</p>","PeriodicalId":50594,"journal":{"name":"Design Automation for Embedded Systems","volume":"41 1","pages":""},"PeriodicalIF":0.9000,"publicationDate":"2024-02-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Novel adaptive quantization methodology for 8-bit floating-point DNN training\",\"authors\":\"Mohammad Hassani Sadi, Chirag Sudarshan, Norbert Wehn\",\"doi\":\"10.1007/s10617-024-09282-2\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>There is a high energy cost associated with training Deep Neural Networks (DNNs). Off-chip memory access contributes a major portion to the overall energy consumption. Reduction in the number of off-chip memory transactions can be achieved by quantizing the data words to low data bit-width (E.g., 8-bit). However, low-bit-width data formats suffer from a limited dynamic range, resulting in reduced accuracy. In this paper, a novel 8-bit Floating Point (FP8) data format quantized DNN training methodology is presented, which adapts to the required dynamic range on-the-fly. Our methodology relies on varying the bias values of FP8 format to fit the dynamic range to the required range of DNN parameters and input feature maps. The range fitting during the training is adaptively performed by an online statistical analysis hardware unit without stalling the computation units or its data accesses. Our approach is compatible with any DNN compute cores without any major modifications to the architecture. We propose to integrate the new FP8 quantization unit in the memory controller. The FP32 data from the compute core are converted to FP8 in the memory controller before writing to the DRAM and converted back after reading the data from DRAM. Our results show that the DRAM access energy is reduced by 3.07<span>\\\\(\\\\times \\\\)</span> while using an 8-bit data format instead of using 32-bit. The accuracy loss of the proposed methodology with 8-bit quantized training is <span>\\\\(\\\\approx 1\\\\%\\\\)</span> for various networks with image and natural language processing datasets.\\n</p>\",\"PeriodicalId\":50594,\"journal\":{\"name\":\"Design Automation for Embedded Systems\",\"volume\":\"41 1\",\"pages\":\"\"},\"PeriodicalIF\":0.9000,\"publicationDate\":\"2024-02-16\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Design Automation for Embedded Systems\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1007/s10617-024-09282-2\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q4\",\"JCRName\":\"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Design Automation for Embedded Systems","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1007/s10617-024-09282-2","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}
Novel adaptive quantization methodology for 8-bit floating-point DNN training
There is a high energy cost associated with training Deep Neural Networks (DNNs). Off-chip memory access contributes a major portion to the overall energy consumption. Reduction in the number of off-chip memory transactions can be achieved by quantizing the data words to low data bit-width (E.g., 8-bit). However, low-bit-width data formats suffer from a limited dynamic range, resulting in reduced accuracy. In this paper, a novel 8-bit Floating Point (FP8) data format quantized DNN training methodology is presented, which adapts to the required dynamic range on-the-fly. Our methodology relies on varying the bias values of FP8 format to fit the dynamic range to the required range of DNN parameters and input feature maps. The range fitting during the training is adaptively performed by an online statistical analysis hardware unit without stalling the computation units or its data accesses. Our approach is compatible with any DNN compute cores without any major modifications to the architecture. We propose to integrate the new FP8 quantization unit in the memory controller. The FP32 data from the compute core are converted to FP8 in the memory controller before writing to the DRAM and converted back after reading the data from DRAM. Our results show that the DRAM access energy is reduced by 3.07\(\times \) while using an 8-bit data format instead of using 32-bit. The accuracy loss of the proposed methodology with 8-bit quantized training is \(\approx 1\%\) for various networks with image and natural language processing datasets.
期刊介绍:
Embedded (electronic) systems have become the electronic engines of modern consumer and industrial devices, from automobiles to satellites, from washing machines to high-definition TVs, and from cellular phones to complete base stations. These embedded systems encompass a variety of hardware and software components which implement a wide range of functions including digital, analog and RF parts.
Although embedded systems have been designed for decades, the systematic design of such systems with well defined methodologies, automation tools and technologies has gained attention primarily in the last decade. Advances in silicon technology and increasingly demanding applications have significantly expanded the scope and complexity of embedded systems. These systems are only now becoming possible due to advances in methodologies, tools, architectures and design techniques.
Design Automation for Embedded Systems is a multidisciplinary journal which addresses the systematic design of embedded systems, focusing primarily on tools, methodologies and architectures for embedded systems, including HW/SW co-design, simulation and modeling approaches, synthesis techniques, architectures and design exploration, among others.
Design Automation for Embedded Systems offers a forum for scientist and engineers to report on their latest works on algorithms, tools, architectures, case studies and real design examples related to embedded systems hardware and software.
Design Automation for Embedded Systems is an innovative journal which distinguishes itself by welcoming high-quality papers on the methodology, tools, architectures and design of electronic embedded systems, leading to a true multidisciplinary system design journal.