Cooperative Inference with Interleaved Operator Partitioning for CNNs

Zhibang Liu, Chaonong Xu, Zhizhuo Liu, Lekai Huang, Jiachen Wei, Chao Li
{"title":"Cooperative Inference with Interleaved Operator Partitioning for CNNs","authors":"Zhibang Liu, Chaonong Xu, Zhizhuo Liu, Lekai Huang, Jiachen Wei, Chao Li","doi":"arxiv-2409.07693","DOIUrl":null,"url":null,"abstract":"Deploying deep learning models on Internet of Things (IoT) devices often\nfaces challenges due to limited memory resources and computing capabilities.\nCooperative inference is an important method for addressing this issue,\nrequiring the partitioning and distributive deployment of an intelligent model.\nTo perform horizontal partitions, existing cooperative inference methods take\neither the output channel of operators or the height and width of feature maps\nas the partition dimensions. In this manner, since the activation of operators\nis distributed, they have to be concatenated together before being fed to the\nnext operator, which incurs the delay for cooperative inference. In this paper,\nwe propose the Interleaved Operator Partitioning (IOP) strategy for CNN models.\nBy partitioning an operator based on the output channel dimension and its\nsuccessive operator based on the input channel dimension, activation\nconcatenation becomes unnecessary, thereby reducing the number of communication\nconnections, which consequently reduces cooperative inference de-lay. Based on\nIOP, we further present a model segmentation algorithm for minimizing\ncooperative inference time, which greedily selects operators for IOP pairing\nbased on the inference delay benefit harvested. 
Experimental results\ndemonstrate that compared with the state-of-the-art partition approaches used\nin CoEdge, the IOP strategy achieves 6.39% ~ 16.83% faster acceleration and\nreduces peak memory footprint by 21.22% ~ 49.98% for three classical image\nclassification models.","PeriodicalId":501422,"journal":{"name":"arXiv - CS - Distributed, Parallel, and Cluster Computing","volume":"45 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Distributed, Parallel, and Cluster Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.07693","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Deploying deep learning models on Internet of Things (IoT) devices is often challenging due to limited memory and computing capability. Cooperative inference is an important way to address this issue; it requires partitioning a model and deploying the partitions across devices. To perform horizontal partitioning, existing cooperative inference methods take either the output channels of operators or the height and width of feature maps as the partition dimensions. Since the activations of an operator are then distributed across devices, they must be concatenated before being fed to the next operator, which adds delay to cooperative inference. In this paper, we propose the Interleaved Operator Partitioning (IOP) strategy for CNN models. By partitioning an operator along the output-channel dimension and its successive operator along the input-channel dimension, activation concatenation becomes unnecessary, reducing the number of communication connections and consequently the cooperative inference delay. Based on IOP, we further present a model segmentation algorithm for minimizing cooperative inference time, which greedily selects operators for IOP pairing based on the inference delay benefit harvested. Experimental results demonstrate that, compared with the state-of-the-art partitioning approach used in CoEdge, the IOP strategy achieves 6.39%–16.83% faster inference and reduces peak memory footprint by 21.22%–49.98% on three classical image classification models.
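The core idea behind IOP can be illustrated with a minimal sketch. Below, two consecutive 1x1 convolutions are modelled as plain matrices: the first operator is split along its output channels and its successor along the matching input channels, so each device produces a partial output that is summed, and no activation concatenation is needed. All names, shapes, and the two-device setup are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
C_in, C_mid, C_out, n_dev = 8, 6, 4, 2

x = rng.standard_normal(C_in)             # input activation
W1 = rng.standard_normal((C_mid, C_in))   # first operator
W2 = rng.standard_normal((C_out, C_mid))  # successive operator

# Reference: sequential inference on a single device.
ref = W2 @ (W1 @ x)

# IOP: partition W1 along its OUTPUT channels and W2 along its INPUT
# channels using the same channel groups, so each device holds one
# shard of each and works on its own channel slice end to end.
groups = np.array_split(np.arange(C_mid), n_dev)
partials = []
for g in groups:
    a_g = W1[g, :] @ x               # device-local activation shard
    partials.append(W2[:, g] @ a_g)  # device-local partial output

# A single sum reduction replaces the concatenate-then-multiply step,
# which is what removes the extra communication connections.
out = np.sum(partials, axis=0)

assert np.allclose(out, ref)
```

The equivalence holds by linearity: slicing W2's input channels to match W1's output-channel shards decomposes the matrix product into per-device partial sums, so only one reduction message per device is required instead of gathering and concatenating every activation shard.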