{"title":"Mix-GEMM: Extending RISC-V CPUs for Energy-Efficient Mixed-Precision DNN Inference Using Binary Segmentation","authors":"Jordi Fornt;Enrico Reggiani;Pau Fontova-Musté;Narcís Rodas;Alessandro Pappalardo;Osman Sabri Unsal;Adrián Cristal Kestelman;Josep Altet;Francesc Moll;Jaume Abella","doi":"10.1109/TC.2024.3500369","DOIUrl":null,"url":null,"abstract":"Efficiently computing Deep Neural Networks (DNNs) has become a primary challenge in today's computers, especially on devices targeting mobile or edge applications. Recent progress on Post-Training Quantization (PTQ) and Quantization-Aware Training (QAT) has shown that the key to high energy efficiency lies in executing deep learning models with low- (8- to 5-bit) or ultra-low-precision (4- to 2-bit). Unfortunately, current Central Processing Unit (CPU) architectures and Instruction Set Architectures (ISAs) present severe limitations on the range of data sizes supported to compute DNN kernels. In this work, we present <i>Mix-GEMM</i>, a hardware-software co-designed architecture that enables RISC-V processors to efficiently compute arbitrary mixed-precision DNN kernels, supporting all data size combinations from 8- to 2-bit. By applying <i>binary segmentation</i>, our architecture can scale its throughput by decreasing the data size of the operands, resulting in a flexible approach capable of leveraging state-of-the-art QAT and PTQ to achieve high energy efficiency at a very low cost. Evaluating our <i>Mix-GEMM</i> architecture in a dual-issue in-order RISC-V processor shows that we are able to boost its performance and energy efficiency by up to <inline-formula><tex-math>$44\\times$</tex-math></inline-formula> and <inline-formula><tex-math>$11\\times$</tex-math></inline-formula> with respect to the baseline processor, with an area overhead of only 2%. This allows our extended processor to execute state-of-the-art DNNs with significantly higher performance and energy efficiency than the standard FP32 precision, while retaining almost the same model accuracy.","PeriodicalId":13087,"journal":{"name":"IEEE Transactions on Computers","volume":"74 2","pages":"582-596"},"PeriodicalIF":3.6000,"publicationDate":"2024-11-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Computers","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10761060/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}
Abstract
Efficiently computing Deep Neural Networks (DNNs) has become a primary challenge in today's computers, especially on devices targeting mobile or edge applications. Recent progress on Post-Training Quantization (PTQ) and Quantization-Aware Training (QAT) has shown that the key to high energy efficiency lies in executing deep learning models at low (8- to 5-bit) or ultra-low (4- to 2-bit) precision. Unfortunately, current Central Processing Unit (CPU) architectures and Instruction Set Architectures (ISAs) severely restrict the range of data sizes supported for computing DNN kernels. In this work, we present Mix-GEMM, a hardware-software co-designed architecture that enables RISC-V processors to efficiently compute arbitrary mixed-precision DNN kernels, supporting all data size combinations from 8- to 2-bit. By applying binary segmentation, our architecture scales its throughput as the data size of the operands decreases, resulting in a flexible approach capable of leveraging state-of-the-art QAT and PTQ to achieve high energy efficiency at a very low cost. Evaluating our Mix-GEMM architecture in a dual-issue in-order RISC-V processor shows that we are able to boost its performance and energy efficiency by up to $44\times$ and $11\times$, respectively, with respect to the baseline processor, with an area overhead of only 2%. This allows our extended processor to execute state-of-the-art DNNs with significantly higher performance and energy efficiency than at standard FP32 precision, while retaining almost the same model accuracy.
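To make the binary-segmentation idea referenced in the abstract concrete, the sketch below shows the underlying arithmetic trick in plain C: several narrow operands are packed into segments of a wide machine word so that a single wide multiplication produces a multi-element dot product in one of the result's segments. This is a minimal illustration of the technique, not the paper's hardware design; the 2-bit operand width, the 8-bit segment width `SEG`, and the two-element vectors are assumptions chosen for clarity.

```c
#include <stdint.h>
#include <stdio.h>
#include <assert.h>

/* Binary segmentation sketch (illustrative, not Mix-GEMM's RTL):
 * with A = a0 + a1*2^SEG and B = b1 + b0*2^SEG (note the reversed
 * packing of B), a single multiply gives
 *   A*B = a0*b1 + (a0*b0 + a1*b1)*2^SEG + a1*b0*2^(2*SEG),
 * so the middle segment holds the dot product a0*b0 + a1*b1.
 * SEG must be wide enough that products plus carries never spill
 * into the next segment: 2*2 product bits + 1 carry bit <= 8. */
#define SEG 8

int main(void) {
    uint64_t a0 = 3, a1 = 2;   /* 2-bit activations */
    uint64_t b0 = 1, b1 = 3;   /* 2-bit weights */

    uint64_t A = a0 | (a1 << SEG);   /* pack a in ascending order */
    uint64_t B = b1 | (b0 << SEG);   /* pack b in reversed order  */

    /* One wide multiply, then extract the middle segment. */
    uint64_t dot = (A * B >> SEG) & ((1u << SEG) - 1);

    assert(dot == a0 * b0 + a1 * b1);
    printf("dot = %llu\n", (unsigned long long)dot);  /* prints 9 */
    return 0;
}
```

Because narrower operands allow more elements per machine word, throughput grows as operand precision shrinks, which is the scaling behavior the abstract describes.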
Journal Introduction
The IEEE Transactions on Computers is a monthly publication with a wide distribution to researchers, developers, technical managers, and educators in the computer field. It publishes papers on research in areas of current interest to the readers. These areas include, but are not limited to, the following: a) computer organizations and architectures; b) operating systems, software systems, and communication protocols; c) real-time systems and embedded systems; d) digital devices, computer components, and interconnection networks; e) specification, design, prototyping, and testing methods and tools; f) performance, fault tolerance, reliability, security, and testability; g) case studies and experimental and theoretical evaluations; and h) new and important applications and trends.