BPS: Batching, Pipelining, Surgeon of Continuous Deep Inference on Collaborative Edge Intelligence

IF 5.3 · CAS Tier 2 (Computer Science) · Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS · IEEE Transactions on Cloud Computing · Pub Date: 2024-03-10 · DOI: 10.1109/TCC.2024.3399616
Xueyu Hou;Yongjie Guan;Nakjung Choi;Tao Han
{"title":"BPS:批处理、流水线作业、协作边缘智能连续深度推理的外科医生","authors":"Xueyu Hou;Yongjie Guan;Nakjung Choi;Tao Han","doi":"10.1109/TCC.2024.3399616","DOIUrl":null,"url":null,"abstract":"Users on edge generate deep inference requests continuously over time. Mobile/edge devices located near users can undertake the computation of inference locally for users, e.g., the embedded edge device on an autonomous vehicle. Due to limited computing resources on one mobile/edge device, it may be challenging to process the inference requests from users with high throughput. An attractive solution is to (partially) offload the computation to a remote device in the network. In this paper, we examine the existing inference execution solutions across local and remote devices and propose an adaptive scheduler, a BPS scheduler, for continuous deep inference on collaborative edge intelligence. By leveraging data parallel, neurosurgeon, reinforcement learning techniques, BPS can boost the overall inference performance by up to \n<inline-formula><tex-math>$8.2 \\times$</tex-math></inline-formula>\n over the baseline schedulers. A lightweight compressor, FF, specialized in compressing intermediate output data for neurosurgeon, is proposed and integrated into the BPS scheduler. FF exploits the operating character of convolutional layers and utilizes efficient approximation algorithms. Compared to existing compression methods, FF achieves up to 86.9% lower accuracy loss and up to 83.6% lower latency overhead.","PeriodicalId":13202,"journal":{"name":"IEEE Transactions on Cloud Computing","volume":null,"pages":null},"PeriodicalIF":5.3000,"publicationDate":"2024-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"BPS: Batching, Pipelining, Surgeon of Continuous Deep Inference on Collaborative Edge Intelligence\",\"authors\":\"Xueyu Hou;Yongjie Guan;Nakjung Choi;Tao Han\",\"doi\":\"10.1109/TCC.2024.3399616\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Users on edge generate deep inference requests continuously over time. Mobile/edge devices located near users can undertake the computation of inference locally for users, e.g., the embedded edge device on an autonomous vehicle. Due to limited computing resources on one mobile/edge device, it may be challenging to process the inference requests from users with high throughput. An attractive solution is to (partially) offload the computation to a remote device in the network. In this paper, we examine the existing inference execution solutions across local and remote devices and propose an adaptive scheduler, a BPS scheduler, for continuous deep inference on collaborative edge intelligence. By leveraging data parallel, neurosurgeon, reinforcement learning techniques, BPS can boost the overall inference performance by up to \\n<inline-formula><tex-math>$8.2 \\\\times$</tex-math></inline-formula>\\n over the baseline schedulers. A lightweight compressor, FF, specialized in compressing intermediate output data for neurosurgeon, is proposed and integrated into the BPS scheduler. FF exploits the operating character of convolutional layers and utilizes efficient approximation algorithms. 
Compared to existing compression methods, FF achieves up to 86.9% lower accuracy loss and up to 83.6% lower latency overhead.\",\"PeriodicalId\":13202,\"journal\":{\"name\":\"IEEE Transactions on Cloud Computing\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":5.3000,\"publicationDate\":\"2024-03-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Cloud Computing\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10528796/\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Cloud Computing","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10528796/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Citations: 0

Abstract

Users at the edge generate deep inference requests continuously over time. Mobile/edge devices located near users, e.g., the embedded edge device on an autonomous vehicle, can perform the inference computation locally for those users. Because a single mobile/edge device has limited computing resources, processing users' inference requests at high throughput can be challenging. An attractive solution is to (partially) offload the computation to a remote device in the network. In this paper, we examine the existing inference execution solutions across local and remote devices and propose an adaptive scheduler, the BPS scheduler, for continuous deep inference on collaborative edge intelligence. By leveraging data parallelism, neurosurgeon-style model partitioning, and reinforcement learning techniques, BPS boosts overall inference performance by up to 8.2× over the baseline schedulers. A lightweight compressor, FF, specialized in compressing the intermediate output data produced by neurosurgeon partitioning, is proposed and integrated into the BPS scheduler. FF exploits the operating characteristics of convolutional layers and uses efficient approximation algorithms. Compared to existing compression methods, FF achieves up to 86.9% lower accuracy loss and up to 83.6% lower latency overhead.
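The "surgeon" in BPS refers to neurosurgeon-style model partitioning: the DNN is split at a layer boundary so that the prefix runs on the local device and the suffix runs on the remote device, with the intermediate feature map sent over the network. As a minimal sketch of the general technique (not the BPS scheduler itself), the Python below picks a split point from per-layer latency profiles and the link bandwidth; the LayerProfile type and all numbers are hypothetical.

# Minimal sketch of neurosurgeon-style split-point selection, assuming
# per-layer latencies have already been profiled on both devices.
# Illustration of the general technique, not the BPS implementation.
from dataclasses import dataclass

@dataclass
class LayerProfile:          # hypothetical profiling record
    name: str
    local_ms: float          # per-layer latency on the edge device
    remote_ms: float         # per-layer latency on the remote device
    out_bytes: int           # size of this layer's output tensor

def best_split(layers: list[LayerProfile], input_bytes: int,
               bandwidth_bps: float) -> int:
    """Return k so that layers[:k] run locally and layers[k:] remotely,
    minimizing estimated end-to-end latency. k == 0 is full offload;
    k == len(layers) is fully local execution."""
    best_k, best_ms = 0, float("inf")
    for k in range(len(layers) + 1):
        local = sum(l.local_ms for l in layers[:k])
        remote = sum(l.remote_ms for l in layers[k:])
        # Bytes crossing the network: the raw input for full offload,
        # the k-th layer's output otherwise; nothing if fully local.
        sent = input_bytes if k == 0 else layers[k - 1].out_bytes
        transfer = 0.0 if k == len(layers) else sent * 8 / bandwidth_bps * 1e3
        total = local + transfer + remote
        if total < best_ms:
            best_k, best_ms = k, total
    return best_k

# Hypothetical three-layer profile and a 50 Mbps uplink:
profiles = [
    LayerProfile("conv1", local_ms=4.0, remote_ms=0.5, out_bytes=802_816),
    LayerProfile("conv2", local_ms=6.0, remote_ms=0.7, out_bytes=401_408),
    LayerProfile("fc",    local_ms=1.0, remote_ms=0.1, out_bytes=4_000),
]
print(best_split(profiles, input_bytes=150_528, bandwidth_bps=50e6))

With these placeholder numbers the fully local option wins; a slower device or a faster link shifts the minimum toward offloading, which is exactly the trade-off an adaptive scheduler like BPS must track at runtime.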
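The abstract describes FF only as exploiting the operating characteristics of convolutional layers, so the following is merely an illustrative sketch built on one such well-known characteristic, not the paper's FF algorithm: post-ReLU feature maps contain many exact zeros and tolerate coarse quantization, so a sparse index plus 8-bit values can stand in for the dense float32 tensor. All encoding choices here are assumptions.

# Hedged sketch of a sparse + 8-bit compressor for post-ReLU feature
# maps, as an illustration only; this is NOT the paper's FF algorithm.
import numpy as np

def compress(fmap: np.ndarray) -> dict:
    """Encode a float32 feature map as nonzero indices + 8-bit values."""
    flat = fmap.ravel()
    idx = np.flatnonzero(flat)                 # positions of nonzero activations
    vals = flat[idx]
    scale = float(vals.max()) / 255.0 if idx.size else 1.0
    q = np.clip(np.round(vals / scale), 0, 255).astype(np.uint8)
    return {"shape": fmap.shape, "idx": idx.astype(np.uint32),
            "q": q, "scale": scale}

def decompress(blob: dict) -> np.ndarray:
    """Rebuild an approximate dense feature map from the encoding."""
    flat = np.zeros(int(np.prod(blob["shape"])), dtype=np.float32)
    flat[blob["idx"]] = blob["q"].astype(np.float32) * blob["scale"]
    return flat.reshape(blob["shape"])

Under this hypothetical encoding, a feature map whose nonzero fraction is s costs roughly 5s bytes per element (4 for the index, 1 for the quantized value) versus 4 bytes per element dense, so it pays off whenever activations are sparse enough. A real compressor must also keep the accuracy loss from approximation in check, which is the axis on which the paper compares FF to existing methods.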
Source journal
IEEE Transactions on Cloud Computing (Computer Science: Software)
CiteScore: 9.40
Self-citation rate: 6.20%
Articles published per year: 167
Journal description: The IEEE Transactions on Cloud Computing (TCC) is dedicated to the multidisciplinary field of cloud computing. It publishes articles that present innovative research ideas, application results, and case studies in cloud computing, focusing on key technical issues related to theory, algorithms, systems, applications, and performance.
Latest articles in this journal
WorkloadDiff: Conditional Denoising Diffusion Probabilistic Models for Cloud Workload Prediction
A Lightweight Privacy-Preserving Ciphertext Retrieval Scheme Based on Edge Computing
Generative Adversarial Privacy for Multimedia Analytics Across the IoT-Edge Continuum
Corrections to “DNN Surgery: Accelerating DNN Inference on the Edge through Layer Partitioning”
FedPAW: Federated Learning With Personalized Aggregation Weights for Urban Vehicle Speed Prediction