Eavesdropping Game Based on Multi-Agent Deep Reinforcement Learning

Delin Guo, Lan Tang, Luxi Yang, Ying-Chang Liang
{"title":"基于多智能体深度强化学习的窃听博弈","authors":"Delin Guo, Lan Tang, Luxi Yang, Ying-Chang Liang","doi":"10.1109/spawc51304.2022.9833927","DOIUrl":null,"url":null,"abstract":"This paper considers an adversarial scenario between a legitimate eavesdropper and a suspicious communication pair. All three nodes are equipped with multiple antennas. The eavesdropper, which operates in a full-duplex model, aims to wiretap the dubious communication pair via proactive jamming. On the other hand, the suspicious transmitter, which can send artificial noise (AN) to disturb the wiretap channel, aims to guarantee secrecy. More specifically, the eavesdropper adjusts jamming power to enhance the wiretap rate, while the suspicious transmitter jointly adapts the transmit power and noise power against the eavesdropping. Considering the partial observation and complicated interactions between the eavesdropper and the suspicious pair in unknown system dynamics, we model the problem as an imperfect-information stochastic game. To approach the Nash equilibrium solution of the eavesdropping game, we develop a multi-agent reinforcement learning (MARL) algorithm, termed neural fictitious self-play with soft actor-critic (NFSP-SAC), by combining the fictitious self-play (FSP) with a deep reinforcement learning algorithm, SAC. The introduction of SAC enables FSP to handle the problems with continuous and high dimension observation and action space. The simulation results demonstrate that the power allocation policies learned by our method empirically converge to a Nash equilibrium, while the compared reinforcement learning algorithms suffer from severe fluctuations during the learning process.","PeriodicalId":423807,"journal":{"name":"2022 IEEE 23rd International Workshop on Signal Processing Advances in Wireless Communication (SPAWC)","volume":"133 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-07-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Eavesdropping Game Based on Multi-Agent Deep Reinforcement Learning\",\"authors\":\"Delin Guo, Lan Tang, Luxi Yang, Ying-Chang Liang\",\"doi\":\"10.1109/spawc51304.2022.9833927\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This paper considers an adversarial scenario between a legitimate eavesdropper and a suspicious communication pair. All three nodes are equipped with multiple antennas. The eavesdropper, which operates in a full-duplex model, aims to wiretap the dubious communication pair via proactive jamming. On the other hand, the suspicious transmitter, which can send artificial noise (AN) to disturb the wiretap channel, aims to guarantee secrecy. More specifically, the eavesdropper adjusts jamming power to enhance the wiretap rate, while the suspicious transmitter jointly adapts the transmit power and noise power against the eavesdropping. Considering the partial observation and complicated interactions between the eavesdropper and the suspicious pair in unknown system dynamics, we model the problem as an imperfect-information stochastic game. To approach the Nash equilibrium solution of the eavesdropping game, we develop a multi-agent reinforcement learning (MARL) algorithm, termed neural fictitious self-play with soft actor-critic (NFSP-SAC), by combining the fictitious self-play (FSP) with a deep reinforcement learning algorithm, SAC. The introduction of SAC enables FSP to handle the problems with continuous and high dimension observation and action space. 
The simulation results demonstrate that the power allocation policies learned by our method empirically converge to a Nash equilibrium, while the compared reinforcement learning algorithms suffer from severe fluctuations during the learning process.\",\"PeriodicalId\":423807,\"journal\":{\"name\":\"2022 IEEE 23rd International Workshop on Signal Processing Advances in Wireless Communication (SPAWC)\",\"volume\":\"133 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-07-04\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 IEEE 23rd International Workshop on Signal Processing Advances in Wireless Communication (SPAWC)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/spawc51304.2022.9833927\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE 23rd International Workshop on Signal Processing Advances in Wireless Communication (SPAWC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/spawc51304.2022.9833927","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0

Abstract

This paper considers an adversarial scenario between a legitimate eavesdropper and a suspicious communication pair. All three nodes are equipped with multiple antennas. The eavesdropper, which operates in full-duplex mode, aims to wiretap the suspicious communication pair via proactive jamming. On the other hand, the suspicious transmitter, which can send artificial noise (AN) to disturb the wiretap channel, aims to guarantee secrecy. More specifically, the eavesdropper adjusts its jamming power to enhance the wiretap rate, while the suspicious transmitter jointly adapts its transmit power and noise power against the eavesdropping. Considering the partial observation and complicated interactions between the eavesdropper and the suspicious pair under unknown system dynamics, we model the problem as an imperfect-information stochastic game. To approach the Nash equilibrium solution of the eavesdropping game, we develop a multi-agent reinforcement learning (MARL) algorithm, termed neural fictitious self-play with soft actor-critic (NFSP-SAC), by combining fictitious self-play (FSP) with a deep reinforcement learning algorithm, SAC. The introduction of SAC enables FSP to handle problems with continuous, high-dimensional observation and action spaces. The simulation results demonstrate that the power allocation policies learned by our method empirically converge to a Nash equilibrium, while the compared reinforcement learning algorithms suffer from severe fluctuations during the learning process.
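To make the interaction concrete, below is a minimal single-antenna Python sketch of the kind of per-step payoff such a power-control game could use. The paper's model is multi-antenna and its exact rate expressions are not reproduced here; the channel gains, residual self-interference factor, and reward definitions in this sketch are illustrative assumptions only.

```python
import numpy as np

# Minimal scalar-channel sketch of one step of the eavesdropping game.
# All gains, the self-interference factor, and the reward definitions
# are assumptions for illustration, not the paper's model.

def step(p_tx, p_an, p_jam,
         g_tr=1.0,    # transmitter -> suspicious receiver gain (assumed)
         g_te=0.8,    # transmitter -> eavesdropper gain (assumed)
         g_er=0.5,    # eavesdropper jamming -> suspicious receiver gain (assumed)
         g_self=0.1,  # residual self-interference at the full-duplex eavesdropper (assumed)
         noise=1e-2):
    # Rate of the suspicious link, degraded by the eavesdropper's proactive jamming.
    sinr_rx = g_tr * p_tx / (g_er * p_jam + noise)
    rate_rx = np.log2(1.0 + sinr_rx)

    # Wiretap rate at the eavesdropper, degraded by the transmitter's
    # artificial noise (AN) and by residual self-interference from jamming.
    sinr_ev = g_te * p_tx / (g_te * p_an + g_self * p_jam + noise)
    rate_ev = np.log2(1.0 + sinr_ev)

    # Illustrative adversarial payoffs: the eavesdropper wants a high
    # wiretap rate; the suspicious pair wants throughput with secrecy.
    reward_eavesdropper = rate_ev
    reward_transmitter = rate_rx - rate_ev
    return reward_transmitter, reward_eavesdropper

# Example: the transmitter splits 10 W between data and AN; the eavesdropper jams with 5 W.
print(step(p_tx=7.0, p_an=3.0, p_jam=5.0))
```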
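The NFSP-SAC idea can also be summarized with a short skeleton: each agent keeps an SAC best-response policy and a supervised average policy, and mixes them through an anticipatory parameter, as in standard NFSP. The class name, method names, and buffer sizes below are hypothetical and do not reproduce the authors' implementation.

```python
import random
from collections import deque

class NFSPAgentSketch:
    """Illustrative skeleton of NFSP action selection with an SAC
    best-response policy. Names and details are hypothetical."""

    def __init__(self, sac_policy, avg_policy, eta=0.1):
        self.sac_policy = sac_policy      # best-response policy trained by SAC (RL)
        self.avg_policy = avg_policy      # average policy trained by supervised learning
        self.eta = eta                    # anticipatory parameter
        self.rl_buffer = deque(maxlen=100_000)    # transitions for SAC updates
        self.sl_buffer = deque(maxlen=1_000_000)  # (obs, action) pairs for the average policy

    def act(self, obs):
        if random.random() < self.eta:
            # Best-response mode: continuous power action sampled from SAC,
            # also recorded so the average policy can imitate it over time.
            action = self.sac_policy.sample(obs)
            self.sl_buffer.append((obs, action))
        else:
            # Average-policy mode: play the (approximate) time-average strategy,
            # which is the component that converges toward a Nash equilibrium in NFSP.
            action = self.avg_policy.sample(obs)
        return action

    def observe(self, obs, action, reward, next_obs, done):
        # Every transition feeds the SAC best-response learner.
        self.rl_buffer.append((obs, action, reward, next_obs, done))
```

Playing mostly from the slowly changing average policy, rather than the greedy best response, is what allows the learned power-allocation strategies to settle near an equilibrium instead of oscillating.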