Goal-Oriented Reinforcement Learning in THz-Enabled UAV-Aided Network Using Supervised Learning

IEEE Open Journal of the Communications Society | IF 6.3 | JCR Q1 (ENGINEERING, ELECTRICAL & ELECTRONIC) | Pub Date: 2024-08-12 | DOI: 10.1109/OJCOMS.2024.3442709

Atefeh Termehchi; Tingnan Bao; Aisha Syed; William Sean Kennedy; Melike Erol-Kantarci


Citations: 0

Abstract



Deep reinforcement learning (DRL) has been a key machine learning technique in many 5G and 6G applications. DRL agents learn optimal (or sub-optimal) policies by interacting with the environment. However, this process often involves numerous uninformative and repetitive message transmissions between the DRL agent and its environment. In this paper, we address the problem of reducing interactions between the DRL agent and the environment, called goal-oriented DRL. Meanwhile, Terahertz (THz) bands and unmanned aerial vehicles (UAVs) are considered two of the main enablers of 6G. Therefore, we investigate the goal-oriented DRL problem in a THz-enabled UAV-aided network. We formulate it as an optimization problem with the goals of i) reducing interactions between the UAV (DRL agent) and IoT devices (environment), ii) maximizing the number of served IoT devices, and iii) ensuring fairness. The constraints include the movement characteristics of IoT devices, the maximum speed limitation of the UAV, the QoS requirements of the served IoT devices, and the limited uplink coverage of the THz-enabled UAV. This problem is a mixed-integer nonlinear programming problem and is NP-hard. To address it, we employ the decoupling optimization method and an approach inspired by the self-triggered method from control engineering. Specifically, the problem is divided into two sub-problems, and we then propose using supervised learning as a teacher for DRL to reduce the interactions. Our simulation results show that the goal-oriented DRL approach outperforms conventional methods by reducing interactions while maintaining good performance in terms of the number of served IoT devices and fairness.
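The self-triggered idea mentioned in the abstract can be illustrated with a toy sketch. This is not the authors' algorithm: the abstract does not describe the model, so everything below (the 2-D random-walk device, the circular coverage area, the names `steps_until_next_poll` and `simulate`, and all numeric values) is an assumption made for illustration. In particular, a closed-form worst-case bound stands in for the supervised-learning "teacher" that the paper proposes. The sketch only shows why deciding the next interaction time from the current state can cut the number of agent-environment messages compared with polling at every step.

```python
import math
import random


def steps_until_next_poll(distance_to_center, coverage_radius,
                          max_device_speed, dt=1.0):
    """Hypothetical self-triggered rule.

    Returns the number of whole time steps for which a device moving at
    most `max_device_speed` cannot leave a circular coverage area, so the
    agent does not need to query it before then. Always at least 1.
    """
    margin = coverage_radius - distance_to_center
    if margin <= 0:
        # Device is at or beyond the edge: check again at the next step.
        return 1
    return max(1, math.floor(margin / (max_device_speed * dt)))


def simulate(num_steps=200, coverage_radius=50.0,
             max_device_speed=2.0, seed=0):
    """Compare per-step polling with self-triggered polling (toy model)."""
    rng = random.Random(seed)
    x, y = 0.0, 0.0                 # device starts at the coverage center
    polls_periodic = num_steps      # baseline: one interaction per step
    polls_self_triggered = 0
    next_poll = 0                   # time step of the next interaction

    for t in range(num_steps):
        # Assumed mobility model: random walk with bounded speed.
        angle = rng.uniform(0.0, 2.0 * math.pi)
        speed = rng.uniform(0.0, max_device_speed)
        x += speed * math.cos(angle)
        y += speed * math.sin(angle)

        if t == next_poll:
            # Interact once, then schedule the next interaction.
            polls_self_triggered += 1
            distance = math.hypot(x, y)
            next_poll = t + steps_until_next_poll(
                distance, coverage_radius, max_device_speed)

    return polls_periodic, polls_self_triggered


if __name__ == "__main__":
    periodic, self_triggered = simulate()
    print(f"per-step polls: {periodic}, self-triggered polls: {self_triggered}")
```

In a learned variant, the hand-written bound would be replaced by a predictor trained on past trajectories, which could be less conservative than a worst-case speed bound but would no longer come with the same guarantee.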


Source journal

IEEE Open Journal of the Communications Society Multiple-

CiteScore: 13.70

Self-citation rate: 3.80%

Articles published: not listed

Review time: 10 weeks

About the journal: The IEEE Open Journal of the Communications Society (OJ-COMS) is an open access, all-electronic journal that publishes original high-quality manuscripts on advances in the state of the art of telecommunications systems and networks. The papers in IEEE OJ-COMS are included in Scopus. Submissions reporting new theoretical findings (including novel methods, concepts, and studies) and practical contributions (including experiments and development of prototypes) are welcome. Additionally, survey and tutorial articles are considered. IEEE OJ-COMS received its debut impact factor of 7.9 according to the Journal Citation Reports (JCR) 2023. The journal covers science, technology, applications, and standards for information organization, collection, and transfer using electronic, optical, and wireless channels and networks. Some specific areas covered include: systems and network architecture, control and management; protocols, software, and middleware; quality of service, reliability, and security; modulation, detection, coding, and signaling; switching and routing; mobile and portable communications; terminals and other end-user devices; networks for content distribution and distributed computing; communications-based distributed resources control.