Title: PArtNNer: Platform-agnostic Adaptive Edge-Cloud DNN Partitioning for minimizing End-to-End Latency
Authors: Soumendu Kumar Ghosh, Arnab Raha, Vijay Raghunathan, Anand Raghunathan
Journal: ACM Transactions on Embedded Computing Systems (TECS)
DOI: 10.1145/3630266 (https://doi.org/10.1145/3630266)
Published: 2023-10-27
Citations: 0
Abstract
The last decade has seen the emergence of Deep Neural Networks (DNNs) as the de facto algorithm for various computer vision applications. In intelligent edge devices, sensor data streams acquired by the device are processed by a DNN application running either on the edge device itself or in the cloud. However, ‘edge-only’ and ‘cloud-only’ execution of state-of-the-art DNNs may not meet an application’s latency requirements due to the limited compute, memory, and energy resources of edge devices, the dynamically varying bandwidth of edge-cloud connectivity networks, and temporal variations in the computational load of cloud servers. This work investigates distributed (partitioned) inference across edge devices (mobile/end devices) and cloud servers to minimize end-to-end DNN inference latency. We study the impact of temporally varying operating conditions and of the underlying compute and communication architecture on the decision of whether to run inference solely on the edge, entirely in the cloud, or by partitioning the DNN model execution between the two. Leveraging the insights gained from this study and the wide variation in the capabilities of the edge platforms that run DNN inference, we propose PArtNNer, a platform-agnostic adaptive DNN partitioning algorithm that finds the optimal partitioning point in a DNN to minimize inference latency. PArtNNer can adapt to dynamic variations in communication bandwidth and cloud server load without requiring pre-characterization of the underlying platforms. Experimental results for six image classification and object detection DNNs on a set of five commercial off-the-shelf compute platforms and three communication standards show that PArtNNer improves end-to-end inference latency by 10.2× and 3.2× on average (and by up to 21.1× and 6.7×) compared to executing the DNN entirely on the edge device or entirely on a cloud server, respectively. Compared to pre-characterization-based partitioning approaches, PArtNNer converges to the optimal partitioning point 17.6× faster.
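To make the partitioning idea concrete, the following is a minimal sketch of the underlying cost model (not PArtNNer's actual adaptive algorithm, which the abstract notes works without pre-characterization): for a candidate split after layer k, end-to-end latency is the edge compute time for layers 1..k, plus the time to transmit layer k's activations over the current link, plus the cloud compute time for the remaining layers. All per-layer timings, activation sizes, and bandwidth values below are illustrative assumptions.

```python
def best_partition(edge_ms, cloud_ms, act_bytes, bandwidth_bps, input_bytes):
    """Pick the split index minimizing end-to-end latency.

    edge_ms[i]   -- illustrative time (ms) to run layer i+1 on the edge device
    cloud_ms[i]  -- illustrative time (ms) to run layer i+1 on the cloud server
    act_bytes[i] -- size (bytes) of layer i+1's output activations
    Split k = 0 is cloud-only (raw input is uploaded); k = n is edge-only.
    Returns (best_split, best_latency_ms).
    """
    n = len(edge_ms)
    best_k, best_lat = 0, float("inf")
    for k in range(n + 1):
        if k == n:
            transfer_ms = 0.0  # edge-only: nothing crosses the network
        else:
            # Upload the raw input (k == 0) or layer k's activations.
            sent = input_bytes if k == 0 else act_bytes[k - 1]
            transfer_ms = sent * 8 / bandwidth_bps * 1000
        lat = sum(edge_ms[:k]) + transfer_ms + sum(cloud_ms[k:])
        if lat < best_lat:
            best_k, best_lat = k, lat
    return best_k, best_lat
```

Because bandwidth and cloud load vary over time, the minimizing split shifts at runtime; this is why an adaptive scheme must re-evaluate the split point online rather than rely on a one-time offline characterization.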
About the journal:
The design of embedded computing systems, both the software and hardware, increasingly relies on sophisticated algorithms, analytical models, and methodologies. ACM Transactions on Embedded Computing Systems (TECS) aims to present the leading work relating to the analysis, design, behavior, and experience with embedded computing systems.