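The pupil outdoes the master: Imperfect demonstration-assisted trust region jamming policy optimization against frequency-hopping spread spectrum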

Computer Communications, Volume 229, Article 107993 · Published 2024-11-10 · DOI: 10.1016/j.comcom.2024.107993
Impact factor 4.5 · CAS Zone 3 (Computer Science) · JCR Q1 (COMPUTER SCIENCE, INFORMATION SYSTEMS)
ScienceDirect: https://www.sciencedirect.com/science/article/pii/S0140366424003402
Ning Rao, Hua Xu, Zisen Qi, Dan Wang, Yue Zhang, Xiang Peng, Lei Jiang
Citations: 0

Abstract

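Jamming decision-making is a pivotal component of modern electromagnetic warfare, and in recent years deep reinforcement learning has been widely applied to make wireless communication jamming decisions more autonomous and intelligent. However, existing research relies heavily on manually designed, task-customized jamming reward functions, which consume significant human and computational resources. To avoid designing such task-customized reward functions, we propose a jamming policy optimization method that learns from imperfect demonstrations to address the complex, high-dimensional jamming resource allocation problem against frequency-hopping spread spectrum (FHSS) communication systems. To this end, a policy network is designed to determine the jamming scheme for each jamming node in sequence, which makes it possible to construct the state transitions of the Markov decision process. Building on a dual-trust-region concept, we then design a policy improvement phase and a policy adversarial imitation phase. In the policy improvement phase, trust region policy optimization (TRPO) is used to refine the policy, while the policy adversarial imitation phase uses adversarial training to guide policy exploration with information embedded in the demonstrations. Extensive simulation results indicate that the proposed method can approach the optimal jamming performance obtained with customized reward functions, even under rough binary reward settings, and can significantly surpass the performance of the demonstrations.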
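The abstract states that the policy network determines the jamming scheme for each jamming node in sequence. A minimal Python sketch of that kind of sequential (autoregressive) factorization follows, assuming each node simply picks one of `C` channels. The scoring function `node_logits`, its state encoding and its duplicate-channel penalty are hypothetical placeholders for the paper's network, not its actual architecture.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def node_logits(state, prev_choices, num_channels):
    """Hypothetical stand-in for the policy network's per-node output head.

    Scores every channel for the current jamming node, conditioned on the
    observed state and on the channels already assigned to earlier nodes.
    """
    logits = []
    for ch in range(num_channels):
        score = state[ch]          # assumed: estimated hop likelihood on channel ch
        if ch in prev_choices:
            score -= 2.0           # discourage two nodes jamming the same channel
        logits.append(score)
    return logits


def sample_joint_action(state, num_nodes, num_channels, rng):
    """Assign one channel per jamming node, one node at a time.

    Returns the joint action and its log-probability, using
    log pi(a | s) = sum_k log pi(a_k | s, a_1..a_{k-1}).
    """
    choices, log_prob = [], 0.0
    for _ in range(num_nodes):
        probs = softmax(node_logits(state, choices, num_channels))
        ch = rng.choices(range(num_channels), weights=probs)[0]
        log_prob += math.log(probs[ch])
        choices.append(ch)
    return choices, log_prob


def joint_prob(state, choices, num_channels):
    """Probability of a complete joint action under the same factorization."""
    p, prev = 1.0, []
    for ch in choices:
        probs = softmax(node_logits(state, prev, num_channels))
        p *= probs[ch]
        prev.append(ch)
    return p
```

With `K` nodes and `C` channels, this factorization never has to output a `C**K`-way distribution; it emits `K` distributions of size `C`, and the joint log-probability needed by a policy-gradient method such as TRPO is just the sum of the per-node terms.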
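How a sparse binary reward and imperfect demonstrations can be combined under a trust-region constraint can be illustrated on a single-state toy problem. The sketch below is not the authors' algorithm. It assumes a GAIL-style logistic discriminator whose output is turned into an imitation bonus `-log(1 - D(a))`, and for brevity it folds that bonus and the binary reward into one update, whereas the paper describes separate policy improvement and adversarial imitation phases. It also replaces full TRPO (natural gradient via conjugate gradient) with a plain policy-gradient direction plus a backtracking line search that enforces a KL bound. All function names, the weight `lam`, the KL limit and the toy rewards are illustrative assumptions.

```python
import math


def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def kl(p, q):
    """KL(p || q) between two categorical distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def train_discriminator(demo_dist, policy_dist, steps=200, lr=0.5):
    """Fit D(a) = sigmoid(w[a]) so demo actions score high and policy actions low.

    Gradient ascent on demo_dist[a] * log D(a) + policy_dist[a] * log(1 - D(a)),
    whose optimum is D(a) = demo / (demo + policy).
    """
    w = [0.0] * len(policy_dist)
    for _ in range(steps):
        for a in range(len(w)):
            d = sigmoid(w[a])
            w[a] += lr * (demo_dist[a] * (1.0 - d) - policy_dist[a] * d)
    return [sigmoid(x) for x in w]


def kl_bounded_step(logits, rewards, max_kl=0.01, step=1.0, backtracks=10):
    """One policy-gradient step on softmax logits, halved until KL(old || new) <= max_kl.

    Simplified stand-in for TRPO's constrained update (no natural gradient).
    """
    old = softmax(logits)
    baseline = sum(p * r for p, r in zip(old, rewards))        # E_old[r]
    grad = [p * (r - baseline) for p, r in zip(old, rewards)]  # dE[r]/dlogits
    for i in range(backtracks):
        scale = step * 0.5 ** i
        cand = [x + scale * g for x, g in zip(logits, grad)]
        new = softmax(cand)
        improved = sum(p * r for p, r in zip(new, rewards)) >= baseline
        if improved and kl(old, new) <= max_kl:
            return cand
    return logits  # no acceptable step found: keep the old policy


def train(binary_reward, demo_dist, lam=0.5, iters=300):
    """Alternate discriminator fitting and KL-bounded policy updates."""
    logits = [0.0] * len(binary_reward)
    for _ in range(iters):
        policy = softmax(logits)
        disc = train_discriminator(demo_dist, policy)
        imitation = [-math.log(max(1.0 - d, 1e-8)) for d in disc]
        rewards = [b + lam * r for b, r in zip(binary_reward, imitation)]
        logits = kl_bounded_step(logits, rewards)
    return softmax(logits)


# Toy usage: 4 candidate jamming actions, only action 2 succeeds (binary reward);
# the imperfect demonstrator picks the failing action 1 most often.
binary_reward = [0.0, 0.0, 1.0, 0.0]
demo_dist = [0.1, 0.5, 0.4, 0.0]
final_policy = train(binary_reward, demo_dist)
```

In this toy setting the demonstrator chooses the successful action only 40% of the time, yet the trained policy puts most of its probability on that action: the imitation bonus shapes early exploration, while the binary reward lets the learner move beyond the demonstrations. This only mirrors, in miniature, the abstract's claim that the learner can surpass imperfect demonstrations.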

Source journal: Computer Communications (Engineering Technology / Telecommunications)
CiteScore: 14.10
Self-citation rate: 5.00%
Articles published: 397
Review time: 66 days
About the journal: Computer and Communications networks are key infrastructures of the information society with high socio-economic value, as they contribute to the correct operation of many critical services (from healthcare to finance and transportation). The Internet is the core of today's computer-communication infrastructures. This has transformed the Internet from a robust network for data transfer between computers into a global, content-rich communication and information system where contents are increasingly generated by the users and distributed according to human social relations. Next-generation network technologies, architectures and protocols are therefore required to overcome the limitations of the legacy Internet and add new capabilities and services. The future Internet should be ubiquitous, secure, resilient, and closer to human communication paradigms. Computer Communications is a peer-reviewed international journal that publishes high-quality scientific articles (both theory and practice) and survey papers covering all aspects of future computer communication networks (on all layers, except the physical layer), with special attention to the evolution of the Internet architecture, protocols, services, and applications.
Latest articles in this journal:
Editorial Board
A deep dive into cybersecurity solutions for AI-driven IoT-enabled smart cities in advanced communication networks
The pupil outdoes the master: Imperfect demonstration-assisted trust region jamming policy optimization against frequency-hopping spread spectrum
High-performance BFT consensus for Metaverse through block linking and shortcut loop
Automating 5G network slice management for industrial applications