Joint Architecture Design and Workload Partitioning for DNN Inference on Industrial IoT Clusters
Weiwei Fang, Wenyuan Xu, Chongchong Yu, Neal N. Xiong
ACM Transactions on Internet Technology (Q2, Computer Science, Information Systems). Published: 2023-02-23. DOI: https://dl.acm.org/doi/10.1145/3551638
Citations: 0
Abstract
The advent of Deep Neural Networks (DNNs) has empowered numerous computer-vision applications. Due to the high computational intensity of DNN models, as well as the resource-constrained nature of Industrial Internet-of-Things (IIoT) devices, it is generally very challenging to deploy and execute DNNs efficiently in industrial scenarios. Substantial research has focused on model compression or edge-cloud offloading, which trades off accuracy for efficiency or depends on high-quality infrastructure support, respectively. In this article, we present EdgeDI, a framework for executing DNN inference in a partitioned, distributed manner on a cluster of IIoT devices. To improve inference performance, EdgeDI exploits two key optimization knobs: (1) Model compression based on deep architecture design, which transforms the target DNN model into a compact one that reduces the resource requirements on IIoT devices without sacrificing accuracy; (2) Distributed inference based on adaptive workload partitioning, which achieves high parallelism by adaptively balancing the workload distribution among IIoT devices under heterogeneous resource conditions. We have implemented EdgeDI based on PyTorch, and evaluated its performance with the NEU-CLS defect classification task and two typical DNN models (i.e., VGG and ResNet) on a cluster of heterogeneous Raspberry Pi devices. The results indicate that the two proposed optimization approaches significantly outperform existing solutions in their respective domains. When they are well combined, EdgeDI provides scalable DNN inference speedups that are very close to, or even much higher than, the theoretical speedup bounds, while still maintaining the desired accuracy.
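The abstract does not detail how adaptive workload partitioning is computed, but the core idea of sizing each device's share by its relative speed can be sketched in a few lines of Python. All names below (`partition_rows`, `device_speeds`) are hypothetical and only illustrate speed-proportional splitting of a feature map's rows across heterogeneous devices; the paper's actual partitioning scheme may differ.

```python
# Hypothetical sketch of speed-proportional workload partitioning:
# split the rows of an input (or feature map) across devices so that
# faster devices receive proportionally larger slices.

def partition_rows(total_rows: int, device_speeds: list[float]) -> list[tuple[int, int]]:
    """Return one (start, end) row range per device, sized by relative speed."""
    total_speed = sum(device_speeds)
    ranges = []
    start = 0
    for i, speed in enumerate(device_speeds):
        if i == len(device_speeds) - 1:
            end = total_rows  # last device absorbs any rounding remainder
        else:
            end = start + round(total_rows * speed / total_speed)
        ranges.append((start, end))
        start = end
    return ranges

# Example: a 224-row input split across three Raspberry Pis whose measured
# throughputs differ by roughly 2:1:1.
print(partition_rows(224, [2.0, 1.0, 1.0]))  # [(0, 112), (112, 168), (168, 224)]
```

In a real distributed convolution, each slice would also need a few overlapping "halo" rows to cover the kernel's receptive field at slice boundaries; that detail is omitted here for brevity.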
Journal Introduction
ACM Transactions on Internet Technology (TOIT) brings together many computing disciplines, including computer software engineering, computer programming languages, middleware, database management, security, knowledge discovery and data mining, networking and distributed systems, communications, and performance and scalability. TOIT covers the results and roles of the individual disciplines and the relationships among them.