A high-performance dataflow-centric optimization framework for deep learning inference on the edge

IF 4.1 | Zone 2, Computer Science | Q1, COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Journal of Systems Architecture | Pub Date: 2024-07-01 | Epub Date: 2024-05-20 | DOI: 10.1016/j.sysarc.2024.103180

Runhua Zhang, Hongxu Jiang, Jinkun Geng, Fangzheng Tian, Yuhang Ma, Haojie Wang

Journal of Systems Architecture, Volume 152, Article 103180. https://www.sciencedirect.com/science/article/pii/S1383762124001176

Citations: 0

Abstract

Edge computing has emerged as a popular scenario for model inference. However, inference performance on edge devices (e.g., multi-core DSPs and FPGAs) suffers because highly optimized inference frameworks for these devices are lacking. Previous model inference frameworks have mainly been developed in an operator-centric way, which provides insufficient acceleration for edge-based inference. Moreover, operator-centric frameworks incur significant costs for continuous development and maintenance.

To address these drawbacks of operator-centric frameworks, we design Xenos, which automatically performs dataflow-centric optimization of the computation graph and accelerates inference in two dimensions. Vertically, Xenos introduces an operator linking technique that improves data locality by restructuring the inter-operator dataflow. Horizontally, Xenos introduces a DSP-aware operator split technique that enables higher parallelism across multiple DSP units. Our evaluation demonstrates the effectiveness of vertical and horizontal dataflow optimization, which reduce inference time by 15.0%–84.9% and 17.9%–89.9%, respectively. Xenos also outperforms the widely used TVM by 1.1×–1.9×. Moreover, we extend Xenos to a distributed solution, d-Xenos, which employs multiple edge devices to jointly perform the inference task and achieves a speedup of 3.68×–3.78× over a single device.
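The abstract does not describe how operator linking or the DSP-aware split are implemented, so the following is only a toy sketch of the two general ideas, not Xenos's code. The vertical idea is illustrated as a tile-wise fusion of an operator chain, a common way to restructure inter-operator dataflow for locality; that Xenos links operators this way is an assumption. The horizontal idea is illustrated as a plain balanced split of a matrix-vector product's output rows across cores; whatever makes the Xenos split "DSP-aware" is not modeled. All names (`operator_centric`, `linked`, `split_ranges`, `split_matvec`, the toy operators, the tile size and core count) are hypothetical, and the "cores" run sequentially here.

```python
from typing import Callable, List, Sequence, Tuple

# Toy stand-ins (hypothetical): a "tensor" is a flat list of floats and an
# operator is an element-wise function. Real DNN operators are richer.
Tensor = List[float]
Matrix = List[List[float]]
Op = Callable[[float], float]


def scale(x: float) -> float:
    """Toy element-wise operator (e.g., a folded batch-norm scale)."""
    return 2.0 * x


def relu(x: float) -> float:
    """Toy element-wise activation."""
    return x if x > 0.0 else 0.0


def operator_centric(data: Tensor, ops: Sequence[Op]) -> Tuple[Tensor, int]:
    """Run each operator over the whole tensor before starting the next one.

    Every intermediate is materialized at full size; returns the output and
    the largest intermediate buffer (in elements) held between operators.
    """
    current = list(data)
    peak = 0
    for i, op in enumerate(ops):
        current = [op(v) for v in current]
        if i < len(ops) - 1:  # not the final output, so it is an intermediate
            peak = max(peak, len(current))
    return current, peak


def linked(data: Tensor, ops: Sequence[Op], tile: int) -> Tuple[Tensor, int]:
    """Vertical idea: push each tile through the whole operator chain at once.

    An intermediate only ever occupies one tile-sized buffer, which on a DSP
    could stay in fast local memory instead of round-tripping to DRAM.
    """
    out: Tensor = []
    peak = 0
    for start in range(0, len(data), tile):
        buf = data[start:start + tile]
        for i, op in enumerate(ops):
            buf = [op(v) for v in buf]
            if i < len(ops) - 1:
                peak = max(peak, len(buf))
        out.extend(buf)
    return out, peak


def split_ranges(n: int, parts: int) -> List[range]:
    """Partition n output rows into `parts` contiguous, near-equal ranges."""
    base, extra = divmod(n, parts)
    ranges, start = [], 0
    for p in range(parts):
        size = base + (1 if p < extra else 0)
        ranges.append(range(start, start + size))
        start += size
    return ranges


def matvec_rows(matrix: Matrix, vec: List[float], rows: range) -> Tensor:
    """One core's share of matrix @ vec: only the given output rows."""
    return [sum(a * b for a, b in zip(matrix[r], vec)) for r in rows]


def split_matvec(matrix: Matrix, vec: List[float], cores: int) -> Tensor:
    """Horizontal idea: give each core a disjoint slice of the output rows.

    The slices run one after another here; on real hardware each would be
    dispatched to its own DSP unit. Concatenating them in order reproduces
    the unsplit result.
    """
    out: Tensor = []
    for rows in split_ranges(len(matrix), cores):
        out.extend(matvec_rows(matrix, vec, rows))
    return out


if __name__ == "__main__":
    x = [-3.0, -1.0, 0.5, 2.0, 4.0, -0.5, 1.5, 3.0]
    ref, ref_peak = operator_centric(x, [scale, relu])
    out, peak = linked(x, [scale, relu], tile=2)
    assert out == ref
    print(ref_peak, peak)  # 8 2
    print(split_ranges(5, 2))  # [range(0, 3), range(3, 5)]
    m = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]
    print(split_matvec(m, [2, 1], cores=2))  # [4, 10, 16, 22, 28]
```

Both transformations leave the numerical result unchanged; what they change is where the data lives (one tile-sized intermediate instead of a full-size one) and which core computes which part of the output.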

Source journal

Journal of Systems Architecture (Engineering & Technology: Computer Science, Hardware)

CiteScore: 8.70
Self-citation rate: 15.60%
Articles published: 226
Review time: 46 days

Journal description: The Journal of Systems Architecture: Embedded Software Design (JSA) covers all design and architectural aspects related to embedded systems and software, ranging from the microarchitecture level via the system software level up to the application-specific architecture level. Aspects such as real-time systems, operating systems, FPGA programming, programming languages, communications (limited to analysis and the software stack), mobile systems, and parallel and distributed architectures, as well as additional subjects in the computer and system architecture area, fall within the scope of this journal. Technology will not be a main focus, but its use and relevance to particular designs will be. Case studies are welcome but must contribute more than just a design for a particular piece of software. Design automation of such systems (including methodologies, techniques, and tools for their design) and novel designs of software components fall within the scope of this journal. Novel applications that use embedded systems are also central to this journal. While hardware is not part of this journal's scope, hardware/software co-design methods that consider the interplay between software and hardware components with an emphasis on software are also relevant.