Lyapunov-Assisted Decentralized Dynamic Offloading Strategy Based on Deep Reinforcement Learning

IF 8.9; Zone 1, Computer Science; JCR Q1, COMPUTER SCIENCE, INFORMATION SYSTEMS; IEEE Internet of Things Journal; Pub Date: 2024-11-15; DOI: 10.1109/JIOT.2024.3498839

Jingjing Wang, Hui Zhang, Xu Han, Jiaxiang Zhao, Jiangzhou Wang

IEEE Internet of Things Journal, vol. 12, no. 7, pp. 8368-8380. https://ieeexplore.ieee.org/document/10753491/

Citations: 0

Abstract

To enhance the edge offloading capabilities of massive numbers of resource-constrained Internet of Things (IoT) devices, a novel task offloading algorithm, reduced target deep deterministic policy gradient (RT-DDPG), is proposed. It generates near-optimal offloading decisions on both the user side and the edge server side, especially in mobile edge computing (MEC) and multiuser multiple-input multiple-output (MIMO) scenarios. In RT-DDPG, combining Lyapunov optimization with an improved deep deterministic policy gradient (DDPG) both reduces the Q-value estimation bias of the neural network and enforces long-term queue stability, which reduces buffering delay. Moreover, because the algorithm's agent runs independently on each device, every device can adaptively formulate its own decentralized computation offloading strategy from the environmental information it observes. Simulation results show that RT-DDPG can learn the optimal dynamic offloading strategy in a continuous action space. Compared with traditional reinforcement learning and other greedy-strategy algorithms, RT-DDPG reduces users' long-term average computing cost by 50%.
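The abstract does not give RT-DDPG's equations. The following Python sketch illustrates, under standard textbook assumptions, the two mechanisms it names: a Lyapunov task queue with a drift-plus-penalty objective, and a bias-reduced critic target. Everything here is an assumption rather than the authors' formulation: the names `DeviceQueue`, `drift_plus_penalty`, `reward`, `clipped_double_q_target`, and `best_offload_fraction`, the trade-off weight `v`, and the linear rate/energy model are hypothetical. The minimum over two target critics is the TD3 clipped double-Q technique, used here only as one well-known way to reduce Q-value overestimation; the paper's "reduced target" mechanism may differ.

```python
import math
from dataclasses import dataclass


@dataclass
class DeviceQueue:
    """Task buffer of one IoT device, measured in bits (illustrative)."""
    backlog: float = 0.0

    def update(self, arrivals: float, served: float) -> float:
        # Standard Lyapunov queue dynamics: Q(t+1) = max(Q(t) - b(t), 0) + a(t)
        self.backlog = max(self.backlog - served, 0.0) + arrivals
        return self.backlog


def drift_plus_penalty(q: float, arrivals: float, served: float,
                       cost: float, v: float) -> float:
    # Per-slot bound term V*cost(t) + Q(t)*(a(t) - b(t)); a larger V
    # weights the cost (penalty) more heavily relative to queue growth (drift).
    return v * cost + q * (arrivals - served)


def reward(q: float, arrivals: float, served: float,
           cost: float, v: float) -> float:
    # Hypothetical reward shaping: a DRL agent that maximizes this
    # reward minimizes the drift-plus-penalty term.
    return -drift_plus_penalty(q, arrivals, served, cost, v)


def clipped_double_q_target(r: float, q1_next: float, q2_next: float,
                            gamma: float = 0.99, done: bool = False) -> float:
    # TD3-style target: bootstrapping from the smaller of two target-critic
    # estimates counteracts the overestimation bias of a single critic.
    bootstrap = 0.0 if done else gamma * min(q1_next, q2_next)
    return r + bootstrap


def best_offload_fraction(q: float, arrivals: float, v: float,
                          local_rate: float, offload_rate: float,
                          local_energy: float, tx_energy: float,
                          steps: int = 101) -> float:
    # Grid search over the offloading fraction x in [0, 1]; it stands in for
    # the continuous action a trained actor network would output directly.
    # Hypothetical linear model: service rate and energy interpolate between
    # fully local (x = 0) and fully offloaded (x = 1).
    best_x, best_val = 0.0, math.inf
    for i in range(steps):
        x = i / (steps - 1)
        served = (1 - x) * local_rate + x * offload_rate
        cost = (1 - x) * local_energy + x * tx_energy
        val = drift_plus_penalty(q, arrivals, served, cost, v)
        if val < best_val:
            best_x, best_val = x, val
    return best_x


# Usage with made-up numbers: empty queue vs. large backlog.
print(best_offload_fraction(0.0, 0.0, 1.0, 10.0, 2.0, 5.0, 1.0))    # 1.0
print(best_offload_fraction(100.0, 0.0, 1.0, 10.0, 2.0, 5.0, 1.0))  # 0.0
```

With an empty queue the drift term vanishes and the device simply picks the lower-energy option (full offloading in this toy setting); with a large backlog the Q(t) term dominates and it favors the option that drains the queue faster. Because each call uses only that device's own queue and parameters, running one such decision maker per device mirrors the decentralized, device-side agent placement described in the abstract.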

Source journal

IEEE Internet of Things Journal (Computer Science-Information Systems)

CiteScore: 17.60
Self-citation rate: 13.20%
Articles published: 1982

Journal description: The IEEE Internet of Things (IoT) Journal publishes articles and review articles covering various aspects of IoT, including IoT system architecture, IoT enabling technologies, IoT communication and networking protocols such as network coding, and IoT services and applications. Topics encompass IoT's impact on sensor technologies, big data management, and future Internet design for applications such as smart cities and smart homes. Fields of interest include IoT architecture, such as things-centric, data-centric, and service-oriented IoT architectures; IoT enabling technologies and systematic integration, such as sensor technologies, big sensor data management, and future Internet design for IoT; IoT services, applications, and test-beds, such as IoT service middleware, IoT application programming interfaces (APIs), IoT application design, and IoT trials and experiments; and IoT standardization activities and technology development in standard development organizations (SDOs) such as IEEE, IETF, ITU, 3GPP, and ETSI.