EdgeCI: Distributed Workload Assignment and Model Partitioning for CNN Inference on Edge Clusters
Yanming Chen, Tong Luo, Weiwei Fang, Neal N. Xiong
ACM Transactions on Internet Technology, Vol. 24, No. 1. Published 2024-04-02. DOI: 10.1145/3656041
Citations: 0
Abstract
Deep learning technology has grown significantly in new application scenarios such as smart cities and driverless vehicles, but its deployment consumes substantial resources. It is usually difficult to execute inference tasks solely on resource-constrained Intelligent Internet-of-Things (IoT) devices while meeting strict service-delay requirements. CNN-based inference tasks are therefore usually offloaded to edge servers or the cloud; however, this may lead to unstable performance and privacy leaks. To address these challenges, this paper designs a low-latency distributed inference framework, EdgeCI, which assigns inference tasks to locally idle, connected, and resource-constrained IoT device clusters. EdgeCI exploits two key optimization knobs: (1) an Auction-based Workload Assignment Scheme (AWAS), which achieves workload balance by assigning each workload partition to the best-matching IoT device; and (2) a Fused-Layer parallelization strategy based on non-recursive Dynamic Programming (DPFL), which further minimizes inference time. We have implemented EdgeCI on top of PyTorch and evaluated its performance with the VGG-16 and ResNet-34 image recognition models. The experimental results show that the proposed AWAS and DPFL each outperform typical state-of-the-art solutions; when combined, EdgeCI improves inference speed by 34.72% to 43.52% and outperforms the state-of-the-art approaches on the tested platform.
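To make the second knob more concrete, the sketch below shows how a non-recursive (iterative) dynamic program can choose where to split a chain of CNN layers into fused blocks so as to minimize a modeled inference time. This is only an illustration of the general technique, not the paper's DPFL algorithm: the function `partition_layers` and the cost model `block_cost` are hypothetical placeholders, and the real scheme would use profiled per-device costs and the paper's own fusion constraints.

```python
# Illustrative sketch only: an iterative (non-recursive) dynamic program that
# splits a chain of CNN layers into fused blocks to minimize a modeled
# inference time. Names and the cost model are assumptions for illustration.

def partition_layers(num_layers, block_cost):
    """
    num_layers: number of layers in the CNN chain.
    block_cost(i, j): modeled time to run layers i..j (inclusive) as one fused
                      block -- an assumed interface, not the paper's cost model.
    Returns (minimal modeled total time, list of (start, end) fused blocks).
    """
    INF = float("inf")
    # dp[k] = minimal modeled time to execute the first k layers; dp[0] = 0.
    dp = [0.0] + [INF] * num_layers
    prev = [0] * (num_layers + 1)  # prev[k]: start index of the last block ending at layer k-1

    for k in range(1, num_layers + 1):      # filled iteratively, no recursion
        for i in range(k):                  # last fused block covers layers i..k-1
            cost = dp[i] + block_cost(i, k - 1)
            if cost < dp[k]:
                dp[k] = cost
                prev[k] = i

    # Reconstruct the chosen fused blocks by walking prev[] backwards.
    blocks, k = [], num_layers
    while k > 0:
        blocks.append((prev[k], k - 1))
        k = prev[k]
    return dp[num_layers], list(reversed(blocks))


if __name__ == "__main__":
    # Toy cost model: fusing more layers amortizes a fixed per-block overhead
    # but increases compute cost; a real system would profile target devices.
    def toy_cost(i, j):
        return 1.0 + 0.4 * (j - i + 1) ** 1.2

    total_time, blocks = partition_layers(num_layers=8, block_cost=toy_cost)
    print(total_time, blocks)
```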
Journal Description
ACM Transactions on Internet Technology (TOIT) brings together many computing disciplines, including computer software engineering, computer programming languages, middleware, database management, security, knowledge discovery and data mining, networking and distributed systems, communications, and performance and scalability. TOIT covers the results and roles of the individual disciplines and the relationships among them.