A high-performance dataflow-centric optimization framework for deep learning inference on the edge

Journal of Systems Architecture (IF 3.7; Zone 2, Computer Science; JCR Q1, Computer Science, Hardware & Architecture). Published 2024-05-20. DOI: 10.1016/j.sysarc.2024.103180
Runhua Zhang , Hongxu Jiang , Jinkun Geng , Fangzheng Tian , Yuhang Ma , Haojie Wang
Journal of Systems Architecture, Volume 152, Article 103180.

Abstract

Edge computing has been emerging as a popular scenario for model inference. However, inference on edge devices (e.g., multi-core DSPs and FPGAs) is inefficient due to the lack of highly optimized inference frameworks. Previous model inference frameworks are mainly developed in an operator-centric way, which provides insufficient acceleration for edge-based inference. Moreover, operator-centric frameworks incur significant costs for continuous development and maintenance.

Targeting these drawbacks of operator-centric frameworks, we design Xenos, which automatically performs dataflow-centric optimization of the computation graph and accelerates inference in two dimensions. Vertically, Xenos introduces an operator-linking technique that improves data locality by restructuring the inter-operator dataflow. Horizontally, Xenos introduces a DSP-aware operator-splitting technique that enables higher parallelism across multiple DSP units. Our evaluation demonstrates the effectiveness of vertical and horizontal dataflow optimization, which reduce inference time by 15.0%–84.9% and 17.9%–89.9%, respectively. Xenos also outperforms the widely used TVM by 1.1×–1.9×. Moreover, we extend Xenos to a distributed solution, d-Xenos, which employs multiple edge devices to jointly perform the inference task and achieves a speedup of 3.68×–3.78× over a single device.
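The abstract does not say how operator linking is implemented. As a rough, hypothetical illustration of why restructuring inter-operator dataflow helps locality, the sketch below contrasts operator-at-a-time execution, which materializes a full intermediate buffer after every operator, with a linked schedule that pushes one small tile through the whole operator chain before moving on. This is essentially classic loop fusion with tiling, not Xenos's actual algorithm; the function names, the elementwise-only operator chain, and the `tile` parameter are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence

Op = Callable[[float], float]  # hypothetical: an elementwise operator


def run_operator_centric(data: Sequence[float], ops: List[Op]) -> List[float]:
    """Operator-at-a-time: each operator sweeps the whole tensor and writes a
    full-size intermediate buffer that the next operator must read back."""
    buf = list(data)
    for op in ops:
        buf = [op(x) for x in buf]  # full intermediate materialized per operator
    return buf


def run_linked(data: Sequence[float], ops: List[Op], tile: int) -> List[float]:
    """Hypothetical 'linked' schedule: load one tile, run it through every
    operator, then emit it. Intermediates only ever occupy a tile-sized
    buffer, standing in for fast on-chip memory on a DSP."""
    out: List[float] = []
    for start in range(0, len(data), tile):
        local = list(data[start:start + tile])  # tile brought into "local" memory
        for op in ops:
            local = [op(x) for x in local]      # intermediates never leave the tile
        out.extend(local)
    return out


if __name__ == "__main__":
    ops = [lambda x: x * 2.0, lambda x: x + 1.0, lambda x: max(x, 0.0)]  # scale, bias, ReLU
    x = [-3.0, -1.0, 0.0, 0.5, 2.0]
    print(run_linked(x, ops, tile=2))  # → [0.0, 0.0, 1.0, 2.0, 5.0]
```

Both schedules compute the same result; the difference is that the linked version keeps at most `tile` intermediate values live instead of a whole tensor per operator.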
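The abstract also does not specify which dimensions the DSP-aware operator split partitions, or how hardware details beyond the unit count are taken into account. A minimal sketch of the general idea, assuming a dense layer (y = W x) split along its output rows, is shown below: the output dimension is divided into near-equal contiguous ranges, one per DSP unit, each unit computes only its own rows, and the partial results are concatenated. Threads stand in for DSP cores here (Python threads will not give a real speedup because of the GIL), and `split_ranges`, `matvec_rows`, and `split_matvec` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple


def split_ranges(n: int, parts: int) -> List[Tuple[int, int]]:
    """Split [0, n) into `parts` contiguous ranges whose sizes differ by at most 1."""
    base, extra = divmod(n, parts)
    ranges, start = [], 0
    for i in range(parts):
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def matvec_rows(w: Sequence[Sequence[float]], x: Sequence[float],
                lo: int, hi: int) -> List[float]:
    """Compute output rows [lo, hi) of y = W x; a unit touches only its own rows of W."""
    return [sum(wij * xj for wij, xj in zip(w[i], x)) for i in range(lo, hi)]


def split_matvec(w: Sequence[Sequence[float]], x: Sequence[float],
                 num_units: int) -> List[float]:
    """Hypothetical operator split: one output-row range per unit (num_units >= 1),
    executed concurrently; pool.map preserves order, so concatenation is correct."""
    ranges = split_ranges(len(w), num_units)
    with ThreadPoolExecutor(max_workers=num_units) as pool:
        parts = pool.map(lambda r: matvec_rows(w, x, *r), ranges)
    return [y for part in parts for y in part]


if __name__ == "__main__":
    W = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]
    print(split_ranges(5, 3))                      # → [(0, 2), (2, 4), (4, 5)]
    print(split_matvec(W, [1, 1], num_units=3))    # → [3, 7, 11, 15, 19]
```

Splitting along the output dimension keeps the units independent (no reduction across units is needed), which is one common reason to choose it; a real DSP-aware split would presumably also size each chunk to fit the per-core local memory.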
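The time reductions above can be restated as speedups. Assuming each percentage r is the reduction in inference time relative to the unoptimized baseline, so that the optimized time is (1 - r) times the baseline, the corresponding speedup is:

```latex
S = \frac{T_{\text{before}}}{T_{\text{after}}} = \frac{1}{1 - r}
\qquad
\begin{aligned}
\text{vertical:}   &\quad r \in [0.150,\ 0.849] \;\Rightarrow\; S \approx 1.18\times \text{ to } 6.62\times \\
\text{horizontal:} &\quad r \in [0.179,\ 0.899] \;\Rightarrow\; S \approx 1.22\times \text{ to } 9.90\times
\end{aligned}
```

Under that assumption, the upper ends of both ranges correspond to considerably larger speedups than the 1.1×–1.9× margin over TVM, which is measured against a different (already optimized) baseline.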
