Multi-Agent Double Deep Q-Learning for Fairness in Multiple-Access Underlay Cognitive Radio Networks

Zain Ali, Zouheir Rezki, Hamid Sadjadpour
{"title":"Multi-Agent Double Deep Q-Learning for Fairness in Multiple-Access Underlay Cognitive Radio Networks","authors":"Zain Ali;Zouheir Rezki;Hamid Sadjadpour","doi":"10.1109/TMLCN.2024.3391216","DOIUrl":null,"url":null,"abstract":"Underlay Cognitive Radio (CR) systems were introduced to resolve the issue of spectrum scarcity in wireless communication. In CR systems, an unlicensed Secondary Transmitter (ST) shares the channel with a licensed Primary Transmitter (PT). Spectral efficiency of the CR systems can be further increased if multiple STs share the same channel. In underlay CR systems, the STs are required to keep interference at a low level to avoid outage at the primary system. The restriction on interference in underlay CR prevents some STs from transmitting while other STs may achieve high data rates, thus making the underlay CR network unfair. In this work, we consider the problem of achieving fairness in the rates of the STs. The considered optimization problem is non-convex in nature. The conventional iteration-based optimizers are time-consuming and may not converge when the considered problem is non-convex. To deal with the problem, we propose a deep-Q reinforcement learning (DQ-RL) framework that employs two separate deep neural networks for the computation and estimation of the Q-values which provides a fast solution and is robust to channel dynamic. The proposed technique achieves near optimal values of fairness while offering primary outage probability of less than 4%. Further, increasing the number of STs results in a linear increase in the computational complexity of the proposed framework. A comparison of several variants of the proposed scheme with the optimal solution is also presented. Finally, we present a novel cumulative reward framework and discuss how the combined-reward approach improves the performance of the communication system.","PeriodicalId":100641,"journal":{"name":"IEEE Transactions on Machine Learning in Communications and Networking","volume":"2 ","pages":"580-595"},"PeriodicalIF":0.0000,"publicationDate":"2024-04-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10504881","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Machine Learning in Communications and Networking","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10504881/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Underlay Cognitive Radio (CR) systems were introduced to resolve the issue of spectrum scarcity in wireless communication. In CR systems, an unlicensed Secondary Transmitter (ST) shares the channel with a licensed Primary Transmitter (PT). The spectral efficiency of CR systems can be further increased if multiple STs share the same channel. In underlay CR systems, the STs are required to keep interference at a low level to avoid an outage at the primary system. This restriction on interference prevents some STs from transmitting while other STs may achieve high data rates, making the underlay CR network unfair. In this work, we consider the problem of achieving fairness in the rates of the STs. The considered optimization problem is non-convex in nature. Conventional iteration-based optimizers are time-consuming and may not converge when the problem is non-convex. To deal with this, we propose a deep-Q reinforcement learning (DQ-RL) framework that employs two separate deep neural networks for the computation and estimation of the Q-values, which provides a fast solution and is robust to channel dynamics. The proposed technique achieves near-optimal fairness while keeping the primary outage probability below 4%. Further, increasing the number of STs results in only a linear increase in the computational complexity of the proposed framework. A comparison of several variants of the proposed scheme with the optimal solution is also presented. Finally, we present a novel cumulative reward framework and discuss how the combined-reward approach improves the performance of the communication system.
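
Since the abstract only outlines the approach, two short sketches may help make the mechanics concrete. First, a candidate per-step reward for the STs: the paper optimizes fairness in the ST rates under an interference constraint, and Jain's fairness index is a standard measure of rate fairness. The abstract does not specify the reward actually used, so the weighting and penalty below are illustrative assumptions, not the paper's design.

```python
import numpy as np

def jains_fairness(rates):
    """Jain's index: 1/N when one ST takes all the rate, 1 when rates are equal."""
    rates = np.asarray(rates, dtype=float)
    total = rates.sum()
    if total == 0.0:
        return 0.0
    return total**2 / (len(rates) * np.sum(rates**2))

def st_reward(rates, interference, interference_threshold):
    """Hypothetical reward: fairness-weighted sum rate, with a penalty when the
    aggregate interference at the primary receiver exceeds the underlay limit.
    The penalty value and the weighting are assumptions for illustration only."""
    if interference > interference_threshold:
        return -1.0  # primary outage event: penalize
    return jains_fairness(rates) * float(np.sum(rates))
```

Second, "two separate deep neural networks for the computation and estimation of the Q-values" is the double deep Q-learning pattern: an online network selects the greedy next action while a periodically synchronized target network evaluates it, which curbs the overestimation bias of vanilla DQN. The sketch below assumes each ST's action is a discrete transmit-power level, which the abstract does not confirm.

```python
import torch
import torch.nn as nn

class QNet(nn.Module):
    """Small MLP mapping a channel-state observation to one Q-value per power level."""
    def __init__(self, obs_dim, n_power_levels, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_power_levels),
        )

    def forward(self, obs):
        return self.net(obs)

def double_dqn_loss(online, target, batch, gamma=0.99):
    """Double-DQN TD loss: the online net picks the next action,
    the target net scores it (action selection decoupled from evaluation)."""
    obs, act, rew, next_obs, done = batch
    q = online(obs).gather(1, act.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        next_act = online(next_obs).argmax(dim=1, keepdim=True)
        next_q = target(next_obs).gather(1, next_act).squeeze(1)
        td_target = rew + gamma * (1.0 - done) * next_q
    return nn.functional.mse_loss(q, td_target)
```

In a multi-agent deployment, each ST would typically run its own pair of networks and refresh the target weights periodically, e.g. `target.load_state_dict(online.state_dict())` every few hundred updates; the synchronization schedule here is likewise an assumption.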