Collaborative Ground-Space Communications via Evolutionary Multi-Objective Deep Reinforcement Learning

IEEE Journal on Selected Areas in Communications: a publication of the IEEE Communications Society. Pub Date: 2024-09-12. DOI: 10.1109/JSAC.2024.3459029

Jiahui Li; Geng Sun; Qingqing Wu; Dusit Niyato; Jiawen Kang; Abbas Jamalipour; Victor C. M. Leung

Volume 42, Issue 12, Pages 3395-3411. Publication type: Journal Article. Full text: https://ieeexplore.ieee.org/document/10679228/

Citations: 0

Abstract




Low Earth Orbit (LEO) satellites have emerged as crucial enablers of direct connections with remote terrestrial terminals. However, energy limitations and insufficient antenna capabilities at the terminals often hamper these connections, resulting in inefficient communications and frequent ping-pong handovers. This paper proposes a Distributed Collaborative Beamforming (DCB)-based uplink communication paradigm for enabling direct ground-space communications. Specifically, DCB treats the terminals that cannot establish efficient direct connections with the LEO satellites as distributed antennas, forming a virtual antenna array that increases both the achievable rate and the duration of the terminal-to-satellite uplink. Such systems need multiple trade-off policies that jointly balance the terminal-satellite uplink achievable rate, the energy consumption of the terminals, and the satellite switching frequency, so that they can accommodate changing scenario requirements. We therefore formulate a long-term multi-objective optimization problem that optimizes these three goals simultaneously. To keep the approach applicable to terminal clusters of different scales, we reformulate this problem as a universal Multi-Objective Markov Decision Process (MOMDP) with a reduced action space. We then propose an Evolutionary Multi-Objective Deep Reinforcement Learning (EMODRL) algorithm that obtains multiple policies and masks low-value actions to speed up training. Simulation results show that DCB enables terminals that cannot individually reach the uplink achievable rate threshold to achieve efficient direct uplink transmission. Moreover, the proposed algorithm outperforms various baselines and, compared with the rate-greedy method, reduces handover frequency by 30% at a similar uplink achievable rate, which indicates that the proposed method is an effective solution for enabling direct ground-space communications.
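The abstract states that low-value actions are masked to speed up training but does not say how. As an illustration only, the sketch below shows one common way to mask actions in a policy: masked logits are set to negative infinity before the softmax, so those actions receive zero probability. The function name `masked_softmax`, the example logits, and the mask itself are hypothetical and are not taken from the paper; how EMODRL decides which actions count as low-value is not described here.

```python
import numpy as np


def masked_softmax(logits, mask):
    """Turn policy logits into probabilities, giving masked actions probability 0.

    `mask` is a hypothetical boolean array: True keeps an action, False masks it.
    This is a generic action-masking sketch, not the paper's EMODRL procedure.
    """
    logits = np.asarray(logits, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    if logits.shape != mask.shape:
        raise ValueError("logits and mask must have the same shape")
    if not mask.any():
        raise ValueError("at least one action must remain unmasked")
    # Masked actions get a logit of -inf, so exp() maps them to exactly 0.
    masked = np.where(mask, logits, -np.inf)
    # Subtract the largest kept logit for numerical stability.
    exp = np.exp(masked - masked.max())
    return exp / exp.sum()


# Hypothetical example: three candidate actions, the second one masked out.
probs = masked_softmax([1.0, 2.0, 3.0], [True, False, True])
print(probs)
```

Because masked actions can never be sampled, the agent spends no exploration steps on them, which is the usual motivation for masking in large discrete action spaces.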
