{"title":"边缘分布式系统的低内存高性能CNN推理","authors":"Erqian Tang, T. Stefanov","doi":"10.1145/3492323.3495629","DOIUrl":null,"url":null,"abstract":"Nowadays, some applications need CNN inference on resource-constrained edge devices that may have very limited memory and computation capacity to fit a large CNN model. In such application scenarios, to deploy a large CNN model and perform inference on a single edge device is not feasible. A possible solution approach is to deploy a large CNN model on a (fully) distributed system at the edge and take advantage of all available edge devices to cooperatively perform the CNN inference. We have observed that existing methodologies, utilizing different partitioning strategies to deploy a CNN model and perform inference at the edge on a distributed system, have several disadvantages. Therefore, in this paper, we propose a novel partitioning strategy, called Vertical Partitioning Strategy, together with a novel methodology needed to utilize our partitioning strategy efficiently, for CNN model inference on a distributed system at the edge. We compare our experimental results on the YOLOv2 CNN model with results obtained by the existing three methodologies and show the advantages of our methodologies in terms of memory requirement per edge device and overall system performance. Moreover, our experimental results on other representative CNN models show that our novel methodology utilizing our novel partitioning strategy is able to deliver CNN inference with very reduced memory requirement per edge device and improved overall system performance at the same time.","PeriodicalId":440884,"journal":{"name":"Proceedings of the 14th IEEE/ACM International Conference on Utility and Cloud Computing Companion","volume":"20 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-12-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":"{\"title\":\"Low-memory and high-performance CNN inference on distributed systems at the edge\",\"authors\":\"Erqian Tang, T. Stefanov\",\"doi\":\"10.1145/3492323.3495629\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Nowadays, some applications need CNN inference on resource-constrained edge devices that may have very limited memory and computation capacity to fit a large CNN model. In such application scenarios, to deploy a large CNN model and perform inference on a single edge device is not feasible. A possible solution approach is to deploy a large CNN model on a (fully) distributed system at the edge and take advantage of all available edge devices to cooperatively perform the CNN inference. We have observed that existing methodologies, utilizing different partitioning strategies to deploy a CNN model and perform inference at the edge on a distributed system, have several disadvantages. Therefore, in this paper, we propose a novel partitioning strategy, called Vertical Partitioning Strategy, together with a novel methodology needed to utilize our partitioning strategy efficiently, for CNN model inference on a distributed system at the edge. We compare our experimental results on the YOLOv2 CNN model with results obtained by the existing three methodologies and show the advantages of our methodologies in terms of memory requirement per edge device and overall system performance. 
Moreover, our experimental results on other representative CNN models show that our novel methodology utilizing our novel partitioning strategy is able to deliver CNN inference with very reduced memory requirement per edge device and improved overall system performance at the same time.\",\"PeriodicalId\":440884,\"journal\":{\"name\":\"Proceedings of the 14th IEEE/ACM International Conference on Utility and Cloud Computing Companion\",\"volume\":\"20 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-12-06\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"5\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 14th IEEE/ACM International Conference on Utility and Cloud Computing Companion\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3492323.3495629\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 14th IEEE/ACM International Conference on Utility and Cloud Computing Companion","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3492323.3495629","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Low-memory and high-performance CNN inference on distributed systems at the edge
Some applications require CNN inference on resource-constrained edge devices whose memory and computation capacity are too limited to fit a large CNN model. In such scenarios, deploying a large CNN model and performing inference on a single edge device is not feasible. A possible solution is to deploy the large CNN model on a (fully) distributed system at the edge and exploit all available edge devices to perform the CNN inference cooperatively. We observe that existing methodologies, which use different partitioning strategies to deploy a CNN model and perform inference on a distributed system at the edge, have several disadvantages. Therefore, in this paper, we propose a novel partitioning strategy, called the Vertical Partitioning Strategy, together with a novel methodology that utilizes this strategy efficiently, for CNN model inference on a distributed system at the edge. We compare our experimental results on the YOLOv2 CNN model with results obtained by three existing methodologies and show the advantages of our methodology in terms of memory requirement per edge device and overall system performance. Moreover, our experimental results on other representative CNN models show that our novel methodology, utilizing our novel partitioning strategy, delivers CNN inference with a significantly reduced memory requirement per edge device and, at the same time, improved overall system performance.
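The abstract does not spell out the mechanics of the Vertical Partitioning Strategy, so the following is a minimal sketch, not the authors' implementation. It illustrates one plausible reading of per-layer vertical partitioning: each convolutional layer's filter bank is split along the output-channel axis across edge devices, so every device stores only a fraction of the layer's weights and computes only a slice of the output feature map. The function names (conv1x1, vertical_partition, distributed_layer) are hypothetical, and a 1x1 convolution stands in for a full convolution to keep the example short.

    # Hypothetical illustration of vertical (output-channel) partitioning;
    # devices are simulated sequentially rather than communicating over a network.
    import numpy as np

    def conv1x1(x, w):
        # x: input feature map (C_in, H, W); w: 1x1 filter bank (C_out, C_in).
        c_in, h, wd = x.shape
        return (w @ x.reshape(c_in, -1)).reshape(w.shape[0], h, wd)

    def vertical_partition(w, num_devices):
        # Split the filter bank along the output-channel axis; each chunk is
        # the only part of this layer's weights one device must keep in memory.
        return np.array_split(w, num_devices, axis=0)

    def distributed_layer(x, w, num_devices):
        # Each "device" convolves the full input with its own filter slice;
        # concatenating the slices reproduces the unpartitioned layer's output.
        shards = vertical_partition(w, num_devices)
        partial_outputs = [conv1x1(x, shard) for shard in shards]
        return np.concatenate(partial_outputs, axis=0)

    rng = np.random.default_rng(0)
    x = rng.standard_normal((16, 8, 8))   # input feature map (C_in, H, W)
    w = rng.standard_normal((32, 16))     # 1x1 filter bank (C_out, C_in)

    full = conv1x1(x, w)
    split = distributed_layer(x, w, num_devices=4)
    assert np.allclose(full, split)  # partitioned inference matches the monolith
    print("per-device weight memory: 1/4 of the full layer")

Under this (assumed) scheme, the per-device weight memory for a layer shrinks roughly in proportion to the number of cooperating devices, which is consistent with the reduced memory requirement per edge device reported in the abstract; the trade-off, not modeled here, is that each device needs the full input feature map of the layer, so inter-device communication is required between layers.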