Novel adaptive quantization methodology for 8-bit floating-point DNN training

IF 0.9; CAS Zone 4 (Computer Science); JCR Q4, COMPUTER SCIENCE, HARDWARE & ARCHITECTURE; Design Automation for Embedded Systems; Pub Date: 2024-02-16; DOI: 10.1007/s10617-024-09282-2

Mohammad Hassani Sadi, Chirag Sudarshan, Norbert Wehn

Citations: 0

Abstract

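Training deep neural networks (DNNs) has a high energy cost, and off-chip memory accesses account for a major share of the total energy consumption. The number of off-chip memory transactions can be reduced by quantizing data words to a low bit width (e.g., 8 bits). However, low-bit-width data formats have a limited dynamic range, which reduces accuracy. This paper presents a novel DNN training methodology based on quantization to an 8-bit floating-point (FP8) data format that adapts to the required dynamic range on the fly. The methodology varies the exponent bias of the FP8 format so that its dynamic range fits the range required by the DNN parameters and input feature maps. During training, this range fitting is performed adaptively by an online statistical analysis hardware unit, without stalling the compute units or their data accesses. The approach is compatible with any DNN compute core and requires no major architectural modifications. We propose integrating the new FP8 quantization unit into the memory controller: FP32 data from the compute core are converted to FP8 in the memory controller before being written to DRAM, and converted back to FP32 after being read from DRAM. Our results show that using an 8-bit data format instead of a 32-bit one reduces DRAM access energy by 3.07×. For various networks on image and natural language processing datasets, the accuracy loss of the proposed 8-bit quantized training methodology is approximately 1%.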
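
The abstract does not specify the FP8 bit layout, the statistic the hardware unit tracks, or the rounding mode, so the Python sketch below only illustrates the general idea of a variable exponent bias under stated assumptions: a hypothetical E4M3-like layout (1 sign, 4 exponent, 3 mantissa bits, no Inf/NaN codes), a per-tensor bias chosen from the maximum absolute value, round-to-nearest-even, and saturation on overflow. The names `choose_bias`, `encode_fp8`, `decode_fp8`, `write_path`, and `read_path` are hypothetical; the last two stand in for the conversions the memory controller performs on DRAM writes and reads, and Python floats stand in for FP32 values. It is not the authors' implementation.

```python
import math

# Hypothetical FP8 layout: 1 sign, 4 exponent, 3 mantissa bits (E4M3-like).
# Assumption: every exponent code encodes a finite value (no Inf/NaN codes).
EXP_BITS, MAN_BITS = 4, 3
MAX_EXP_CODE = (1 << EXP_BITS) - 1          # 15
MAN_SCALE = 1 << MAN_BITS                   # 8 mantissa steps per binade
MAX_SIGNIFICAND = 2.0 - 1.0 / MAN_SCALE     # 1.875, all mantissa bits set
DEFAULT_BIAS = 7                            # conventional fixed bias for 4 exponent bits


def choose_bias(values):
    """Largest bias whose largest representable magnitude still covers max |x|.

    Using max |x| as the statistic is an assumption of this sketch.
    """
    max_abs = max((abs(v) for v in values), default=0.0)
    if max_abs == 0.0:
        return DEFAULT_BIAS
    # Require MAX_SIGNIFICAND * 2**(MAX_EXP_CODE - bias) >= max_abs.
    return MAX_EXP_CODE - math.ceil(math.log2(max_abs / MAX_SIGNIFICAND))


def encode_fp8(x, bias):
    """Round x to the nearest FP8 code (ties to even), saturating on overflow."""
    sign = 0x80 if math.copysign(1.0, x) < 0 else 0
    a = abs(x)
    saturated = sign | (MAX_EXP_CODE << MAN_BITS) | (MAN_SCALE - 1)
    if a >= MAX_SIGNIFICAND * 2.0 ** (MAX_EXP_CODE - bias):
        return saturated                    # also catches +/-inf
    min_normal = 2.0 ** (1 - bias)
    if a < min_normal:
        # Subnormal: a ~= min_normal * m / MAN_SCALE. If m rounds up to MAN_SCALE,
        # the carry sets exponent code 1 with m = 0, i.e. exactly min_normal.
        return sign | round(a / min_normal * MAN_SCALE)
    mant, exp = math.frexp(a)               # a = mant * 2**exp, 0.5 <= mant < 1
    e_unbiased = exp - 1                    # a = (2 * mant) * 2**e_unbiased
    m = round((2.0 * mant - 1.0) * MAN_SCALE)
    if m == MAN_SCALE:                      # rounding carried into the exponent
        m, e_unbiased = 0, e_unbiased + 1
    e = e_unbiased + bias
    if e > MAX_EXP_CODE:
        return saturated
    return sign | (e << MAN_BITS) | m


def decode_fp8(code, bias):
    """Map an FP8 code back to a Python float."""
    sign = -1.0 if code & 0x80 else 1.0
    e = (code >> MAN_BITS) & MAX_EXP_CODE
    m = code & (MAN_SCALE - 1)
    if e == 0:                              # zero or subnormal
        return sign * 2.0 ** (1 - bias) * m / MAN_SCALE
    return sign * 2.0 ** (e - bias) * (1.0 + m / MAN_SCALE)


def write_path(values):
    """Hypothetical memory-controller write path: FP32 -> (per-tensor bias, FP8 bytes)."""
    bias = choose_bias(values)
    return bias, bytes(encode_fp8(v, bias) for v in values)


def read_path(bias, data):
    """Hypothetical memory-controller read path: FP8 bytes -> floats."""
    return [decode_fp8(c, bias) for c in data]


if __name__ == "__main__":
    grads = [1e-5, -3e-5, 2e-6]              # small, gradient-like magnitudes
    bias, data = write_path(grads)
    print(bias, len(data))                   # 30 3  (1 byte per value vs. 4 for FP32)
    print(read_path(bias, data))
    # The same values with the fixed default bias underflow to zero:
    print([decode_fp8(encode_fp8(g, DEFAULT_BIAS), DEFAULT_BIAS) for g in grads])
    # [0.0, -0.0, 0.0]
```

With the fixed bias of 7, all three small values underflow to zero, whereas the adaptive bias keeps each within the 1/16 relative error bound of a 3-bit mantissa. Storing one byte per value instead of four cuts the data volume by 4×; the abstract reports a 3.07× reduction in DRAM access energy. For simplicity the sketch scans the whole tensor before encoding; the paper's hardware unit performs its analysis online without stalling the compute units, and the abstract does not describe how bias updates are scheduled.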

Source journal

Design Automation for Embedded Systems (Engineering & Technology, Computer Science: Software Engineering)

CiteScore: 2.60
Self-citation rate: 0.00%
Review time: >12 weeks

Journal description: Embedded (electronic) systems have become the electronic engines of modern consumer and industrial devices, from automobiles to satellites, from washing machines to high-definition TVs, and from cellular phones to complete base stations. These embedded systems encompass a variety of hardware and software components that implement a wide range of functions, including digital, analog, and RF parts. Although embedded systems have been designed for decades, their systematic design with well-defined methodologies, automation tools, and technologies has gained attention primarily in the last decade. Advances in silicon technology and increasingly demanding applications have significantly expanded the scope and complexity of embedded systems, and such systems are only now becoming possible thanks to advances in methodologies, tools, architectures, and design techniques. Design Automation for Embedded Systems is a multidisciplinary journal that addresses the systematic design of embedded systems, focusing primarily on tools, methodologies, and architectures, including HW/SW co-design, simulation and modeling approaches, synthesis techniques, and architectures and design exploration, among others. It offers a forum for scientists and engineers to report on their latest work on algorithms, tools, architectures, case studies, and real design examples related to embedded systems hardware and software. Design Automation for Embedded Systems is an innovative journal that distinguishes itself by welcoming high-quality papers on the methodology, tools, architectures, and design of electronic embedded systems, making it a truly multidisciplinary system design journal.