Cooperative Inference with Interleaved Operator Partitioning for CNNs
Zhibang Liu, Chaonong Xu, Zhizhuo Liu, Lekai Huang, Jiachen Wei, Chao Li
arXiv:2409.07693 (arXiv - CS - Distributed, Parallel, and Cluster Computing, 2024-09-12)
Deploying deep learning models on Internet of Things (IoT) devices is often
challenging due to limited memory and computing resources.
Cooperative inference is an important method for addressing this issue,
requiring the partitioning and distributed deployment of an intelligent model.
To perform horizontal partitioning, existing cooperative inference methods take
either the output channel of operators or the height and width of feature maps
as the partition dimensions. Since the activations of an operator are then
distributed across devices, they must be concatenated before being fed to the
next operator, which adds delay to cooperative inference. In this paper,
we propose the Interleaved Operator Partitioning (IOP) strategy for CNN models.
By partitioning an operator along the output channel dimension and its
successor operator along the input channel dimension, activation
concatenation becomes unnecessary. This reduces the number of communication
connections and, consequently, the cooperative inference delay. Based on
IOP, we further present a model segmentation algorithm that minimizes
cooperative inference time by greedily selecting operators for IOP pairing
according to the inference delay benefit obtained. Experimental results
demonstrate that compared with the state-of-the-art partition approaches used
in CoEdge, the IOP strategy accelerates inference by 6.39% to 16.83% and
reduces peak memory footprint by 21.22% to 49.98% for three classical image
classification models.
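The core interleaved-partitioning idea can be illustrated with a minimal NumPy sketch. This is a toy model only, not the authors' implementation: two successive 1x1-convolution-style operators are reduced to channel matmuls, the first is split along its output channels and the second along its input channels, so each device computes a local partial result and the only communication is one element-wise sum (no activation concatenation).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for two successive 1x1 conv operators (channels-only matmuls).
# W1: (C_out1, C_in1), W2: (C_out2, C_in2) with C_in2 == C_out1.
x  = rng.standard_normal((8,))         # input activation, C_in1 = 8
W1 = rng.standard_normal((6, 8))       # operator A: 8 -> 6 channels
W2 = rng.standard_normal((4, 6))       # operator B: 6 -> 4 channels

# Monolithic (single-device) reference.
ref = W2 @ (W1 @ x)

# Interleaved split across two devices:
# operator A is partitioned along its OUTPUT channels, and operator B
# along its INPUT channels, using the same split point.
split = 3
A0, A1 = W1[:split], W1[split:]        # device 0 / device 1 slices of A
B0, B1 = W2[:, :split], W2[:, split:]  # matching input-channel slices of B

# Each device feeds its slice of A's output directly into its slice of B,
# so A's activations never need to be concatenated across devices.
partial0 = B0 @ (A0 @ x)               # computed on device 0
partial1 = B1 @ (A1 @ x)               # computed on device 1

# The only cross-device communication is one element-wise sum.
out = partial0 + partial1
assert np.allclose(out, ref)
```

This works because the composite map factors as a sum over the shared channel index: `W2 @ W1 == B0 @ A0 + B1 @ A1`, so partitioning A by output channel and B by the matching input channel keeps each term device-local.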