Enabling Binary Neural Network Training on the Edge

IF 2.6 | Zone 3, Computer Science | Q2, COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | ACM Transactions on Embedded Computing Systems | Pub Date: 2023-11-09 | DOI: 10.1145/3626100

Erwei Wang, James J. Davis, Daniele Moro, Piotr Zielinski, Jia Jie Lim, Claudionor Coelho, Satrajit Chatterjee, Peter Y. K. Cheung, George A. Constantinides

Citations: 0

Abstract

The ever-growing computational demands of increasingly complex machine learning models frequently necessitate the use of powerful cloud-based infrastructure for their training. Binary neural networks are known to be promising candidates for on-device inference due to their extreme compute and memory savings over higher-precision alternatives. However, their existing training methods require the concurrent storage of high-precision activations for all layers, generally making learning on memory-constrained devices infeasible. In this article, we demonstrate that the backward propagation operations needed for binary neural network training are strongly robust to quantization, thereby making on-the-edge learning with modern models a practical proposition. We introduce a low-cost binary neural network training strategy exhibiting sizable memory footprint reductions while inducing little to no accuracy loss vs Courbariaux & Bengio’s standard approach. These decreases are primarily enabled through the retention of activations exclusively in binary format. Against the latter algorithm, our drop-in replacement sees memory requirement reductions of 3–5×, while reaching similar test accuracy (± 2 pp) in comparable time, across a range of small-scale models trained to classify popular datasets. We also demonstrate from-scratch ImageNet training of binarized ResNet-18, achieving a 3.78× memory reduction. Our work is open-source, and includes the Raspberry Pi-targeted prototype we used to verify our modeled memory decreases and capture the associated energy drops. Such savings will allow for unnecessary cloud offloading to be avoided, reducing latency, increasing energy efficiency, and safeguarding end-user privacy.
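
The storage idea behind "retention of activations exclusively in binary format" can be illustrated with a minimal sketch. This is not the authors' implementation; the function names `pack_signs` and `unpack_signs` and the sample values are hypothetical, and the sketch assumes that a binarized activation is fully described by its sign. It only compares the bytes needed to hold one activation vector as float32 against one bit per value.

```python
import struct


def pack_signs(activations):
    """Store only the sign of each activation: bit 1 for >= 0, bit 0 for < 0."""
    packed = bytearray((len(activations) + 7) // 8)
    for i, a in enumerate(activations):
        if a >= 0:
            packed[i // 8] |= 1 << (i % 8)
    return bytes(packed)


def unpack_signs(packed, n):
    """Recover the n binarized values (+1.0 or -1.0) from the packed bits."""
    return [1.0 if (packed[i // 8] >> (i % 8)) & 1 else -1.0 for i in range(n)]


# Hypothetical activations for one layer (illustrative values only).
acts = [0.7, -1.2, 0.0, 3.4, -0.1, -2.5, 0.9, 1.1, -0.3, 0.2]

full_precision = struct.pack(f"{len(acts)}f", *acts)  # float32: 4 bytes per value
packed = pack_signs(acts)                             # 1 bit per value, rounded up to bytes
restored = unpack_signs(packed, len(acts))

print(len(full_precision), "bytes as float32 vs", len(packed), "bytes packed")
```

The ratio for this single vector is not comparable with the 3–5× figure quoted in the abstract, which refers to the memory requirement of training as a whole rather than to one tensor.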

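Source journal

ACM Transactions on Embedded Computing Systems (Engineering & Technology, Computer Science: Software Engineering)

CiteScore: 3.70

Self-citation rate: 0.00%

Articles published: 138

Review time: 6 months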

Journal description: The design of embedded computing systems, both the software and hardware, increasingly relies on sophisticated algorithms, analytical models, and methodologies. ACM Transactions on Embedded Computing Systems (TECS) aims to present the leading work relating to the analysis, design, behavior, and experience with embedded computing systems.