ULEEN: A Novel Architecture for Ultra Low-Energy Edge Neural Networks

ACM Transactions on Architecture and Code Optimization | IF 2.9 | Zone 3, Computer Science | JCR Q4, Computer Science, Hardware & Architecture | Pub Date: 2023-10-25 | DOI: 10.1145/3629522

Zachary Susskind, Aman Arora, Igor D. S. Miranda, Alan T. L. Bacellar, Luis A. Q. Villon, Rafael F. Katopodis, Leandro S. de Araújo, Diego L. C. Dutra, Priscila M. V. Lima, Felipe M. G. França, Mauricio Breternitz Jr., Lizy K. John


Citations: 0

Abstract



ULEEN: A Novel Architecture for Ultra Low-Energy Edge Neural Networks

"Extreme edge" devices such as smart sensors are a uniquely challenging environment for the deployment of machine learning. The tiny energy budgets of these devices lie beyond what is feasible for conventional deep neural networks, particularly in high-throughput scenarios, requiring us to rethink how we approach edge inference. In this work, we propose ULEEN, a model and FPGA-based accelerator architecture based on weightless neural networks (WNNs). WNNs eliminate energy-intensive arithmetic operations, instead using table lookups to perform computation, which makes them theoretically well-suited for edge inference. However, WNNs have historically suffered from poor accuracy and excessive memory usage. ULEEN incorporates algorithmic improvements and a novel training strategy inspired by binary neural networks (BNNs) to make significant strides in addressing these issues. We compare ULEEN against BNNs in software and hardware using the four MLPerf Tiny datasets and MNIST. Our FPGA implementations of ULEEN accomplish classification at 4.0 to 14.3 million inferences per second, improving area-normalized throughput by an average of 3.6× and steady-state energy efficiency by an average of 7.1× compared to the FPGA-based Xilinx FINN BNN inference platform. While ULEEN is not a universally applicable machine learning model, we demonstrate that it can be an excellent choice for certain applications in energy- and latency-critical edge environments.
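To make the table-lookup idea concrete, here is a minimal sketch of a classic WiSARD-style weightless classifier in Python. It is not ULEEN's model or training strategy, which the abstract does not detail. The class names (`Discriminator`, `WisardClassifier`), the `tuple_size` parameter, and the set-based sparse table representation are all assumptions chosen for illustration.

```python
import random


class Discriminator:
    """Lookup tables ("RAM nodes") for one class. Hypothetical illustrative class."""

    def __init__(self, mapping, tuple_size):
        self.mapping = mapping          # input ordering shared by all classes
        self.tuple_size = tuple_size    # number of input bits addressing each table
        self.num_rams = len(mapping) // tuple_size
        # Each table is stored sparsely: the set of addresses whose bit is 1.
        self.rams = [set() for _ in range(self.num_rams)]

    def _addresses(self, bits):
        """Yield (table index, address) pairs for a binary input vector."""
        for r in range(self.num_rams):
            chunk = self.mapping[r * self.tuple_size:(r + 1) * self.tuple_size]
            addr = 0
            for i in chunk:
                addr = (addr << 1) | bits[i]  # concatenate bits into an address
            yield r, addr

    def train(self, bits):
        """Mark every address touched by this sample as seen."""
        for r, addr in self._addresses(bits):
            self.rams[r].add(addr)

    def score(self, bits):
        """Count the tables that recognise their address (a popcount, no multiplies)."""
        return sum(addr in self.rams[r] for r, addr in self._addresses(bits))


class WisardClassifier:
    """One discriminator per class; prediction is the highest-scoring class."""

    def __init__(self, num_inputs, tuple_size, num_classes, seed=0):
        if num_inputs % tuple_size != 0:
            raise ValueError("num_inputs must be a multiple of tuple_size")
        mapping = list(range(num_inputs))
        random.Random(seed).shuffle(mapping)  # fixed pseudo-random input mapping
        self.discriminators = [
            Discriminator(mapping, tuple_size) for _ in range(num_classes)
        ]

    def train(self, bits, label):
        self.discriminators[label].train(bits)

    def predict(self, bits):
        scores = [d.score(bits) for d in self.discriminators]
        return scores.index(max(scores))  # ties resolve to the lowest class index
```

Inference here consists only of forming addresses from input bits, reading one bit per table, and counting the hits. That absence of multiply-accumulate operations is the property the abstract points to. The sketch also hints at the memory problem the abstract mentions: a dense table needs 2^`tuple_size` entries, so table size grows exponentially with the number of bits per table.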


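Source journal

ACM Transactions on Architecture and Code Optimization (Engineering and Technology, Computer Science: Theory and Methods)

CiteScore: 3.60

Self-citation rate: 6.20%

Articles published: not listed

Review time: 6-12 weeks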

About the journal: ACM Transactions on Architecture and Code Optimization (TACO) focuses on hardware, software, and system research spanning the fields of computer architecture and code optimization. Articles that appear in TACO will either present new techniques and concepts or report on experiences and experiments with actual systems. Insights useful to architects, hardware or software developers, designers, builders, and users will be emphasized.

Latest articles from this journal

Hermes: Efficient Serving of LLM Applications with Probabilistic Demand Modeling

RACER: Avoiding End-to-End Slowdowns in Accelerated Chip Multi-Processors

DLAS: A Conceptual Model for Across-Stack Deep Learning Acceleration

A Survey of General-purpose Polyhedral Compilers

Sectored DRAM: A Practical Energy-Efficient and High-Performance Fine-Grained DRAM Architecture