Intelligent decision-making for a “Three-Variable” frequency-hopping pattern based on OC-CDRL
Ziyu Meng, Shaogang Dai, Zhijin Zhao, Xueyi Ye, Shilian Zheng, Caiyi Lou, Xiaoniu Yang
Physical Communication, vol. 66, Article 102434 (published 2024-07-05). Journal article. Impact factor 2.0; JCR Q3 (Engineering, Electrical & Electronic).
DOI: 10.1016/j.phycom.2024.102434
URL: https://www.sciencedirect.com/science/article/pii/S1874490724001526
Citation count: 0
Abstract
Existing frequency-hopping communication systems do not design their hopping patterns around the electromagnetic interference environment, so their anti-jamming is blind. To address this problem, a “three-variable” frequency-hopping pattern is proposed, in which the frequency, hopping rate, and instantaneous bandwidth of the frequency-hopping signal vary randomly in response to the background electromagnetic interference. The decision-making problem for this pattern is modeled as a Markov decision process (MDP) by constructing a state-action-reward tuple. To alleviate the dimension explosion in decision-making, the frequency varies continuously within a small frequency band selected by a pseudo-random sequence, while the hopping rate and instantaneous bandwidth take discrete values. To solve this MDP efficiently, a combined deep reinforcement learning algorithm based on optimistic exploration and conservative estimation (OC-CDRL) is proposed. It combines features of the TD3 and D3QN algorithms, with states, actions, and rewards designed to handle the continuous and discrete action spaces, respectively. Because the D3QN algorithm tends to fall into local optima, an optimistic exploration strategy (OES) for action selection is proposed to increase the degree of exploration. Moreover, the loss function is improved by conservatively estimating state–action pairs outside the experience replay buffer, which reduces the overestimation of the optimistic action-value function and improves the stability and convergence of the algorithm. Comparative simulations in different electromagnetic interference environments show that the OC-CDRL algorithm avoids most regions of higher interference and has better adaptability and anti-jamming capability.
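The abstract does not give the network architectures, reward definition, or exact loss, so the following is only a minimal sketch of the ideas it names, not the authors' implementation. It assumes a hybrid action made of a continuous frequency offset inside a pseudo-randomly selected sub-band plus a joint discrete index for hopping rate and bandwidth. A UCB-style bonus stands in for the optimistic exploration strategy, and a CQL-style log-sum-exp penalty stands in for the conservative estimation. All names (`SUB_BANDS`, `HOP_RATES`, `BANDWIDTHS`, `compose_action`, `optimistic_choice`, `conservative_loss`) and all numeric values are hypothetical.

```python
import math
import random
from dataclasses import dataclass

# Hypothetical placeholder values; none of these come from the paper.
SUB_BANDS = [(30.0 + 5.0 * i, 35.0 + 5.0 * i) for i in range(8)]  # MHz
HOP_RATES = [500, 1000, 2000]        # hops per second
BANDWIDTHS = [0.025, 0.1, 0.5]       # MHz


@dataclass
class HopAction:
    frequency_mhz: float
    hop_rate: int
    bandwidth_mhz: float


def select_sub_band(prng):
    """Pick a small sub-band from a pseudo-random sequence (a seeded PRNG here)."""
    return SUB_BANDS[prng.randrange(len(SUB_BANDS))]


def compose_action(sub_band, offset, discrete_index):
    """Merge a continuous offset in [-1, 1] with a joint discrete index.

    The offset plays the role of a continuous-action (TD3-like) output and the
    index plays the role of a discrete-action (D3QN-like) output.
    """
    low, high = sub_band
    offset = max(-1.0, min(1.0, offset))
    frequency = low + (offset + 1.0) / 2.0 * (high - low)
    rate_idx, bw_idx = divmod(discrete_index, len(BANDWIDTHS))
    return HopAction(frequency, HOP_RATES[rate_idx], BANDWIDTHS[bw_idx])


def optimistic_choice(q_values, visit_counts, step, bonus=1.0):
    """Assumed UCB-style rule: rarely tried actions receive a larger bonus."""
    scores = [
        q + bonus * math.sqrt(math.log(step + 1) / (n + 1))
        for q, n in zip(q_values, visit_counts)
    ]
    return max(range(len(scores)), key=scores.__getitem__)


def conservative_loss(q_values, taken_index, td_target, alpha=0.5):
    """Squared TD error plus an assumed CQL-style penalty.

    The penalty logsumexp(Q) - Q(taken) pushes down the values of actions
    that were not the one stored in the replay buffer.
    """
    td_error = (q_values[taken_index] - td_target) ** 2
    q_max = max(q_values)
    log_sum_exp = q_max + math.log(sum(math.exp(q - q_max) for q in q_values))
    return td_error + alpha * (log_sum_exp - q_values[taken_index])


if __name__ == "__main__":
    prng = random.Random(7)
    band = select_sub_band(prng)
    n_discrete = len(HOP_RATES) * len(BANDWIDTHS)
    index = optimistic_choice([0.0] * n_discrete, [0] * n_discrete, step=1)
    print(compose_action(band, offset=0.3, discrete_index=index))
```

Encoding the hopping rate and bandwidth as one joint index keeps the discrete head a single value vector, which is one simple way to pair it with a separate continuous output; the paper may structure this differently.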
About the journal:
PHYCOM: Physical Communication is an international and archival journal providing complete coverage of all topics of interest to those involved in all aspects of physical layer communications. Theoretical research contributions presenting new techniques, concepts or analyses, applied contributions reporting on experiences and experiments, and tutorials are published.
Topics of interest include but are not limited to:
Physical layer issues of Wireless Local Area Networks, WiMAX, Wireless Mesh Networks, Sensor and Ad Hoc Networks, PCS Systems; Radio access protocols and algorithms for the physical layer; Spread Spectrum Communications; Channel Modeling; Detection and Estimation; Modulation and Coding; Multiplexing and Carrier Techniques; Broadband Wireless Communications; Wireless Personal Communications; Multi-user Detection; Signal Separation and Interference Rejection; Multimedia Communications over Wireless; DSP Applications to Wireless Systems; Experimental and Prototype Results; Multiple Access Techniques; Space-time Processing; Synchronization Techniques; Error Control Techniques; Cryptography; Software Radios; Tracking; Resource Allocation and Inference Management; Multi-rate and Multi-carrier Communications; Cross-layer Design and Optimization; Propagation and Channel Characterization; OFDM Systems; MIMO Systems; Ultra-Wideband Communications; Cognitive Radio System Architectures; Platforms and Hardware Implementations for the Support of Cognitive Radio Systems; Cognitive Radio Resource Management and Dynamic Spectrum Sharing.