A UAV collaborative defense scheme driven by DDPG algorithm
Zhang Yaozhong, Wu Zhuoran, Xiong Zhenkai, Chen Long
Journal of Systems Engineering and Electronics, vol. 34, no. 5, pp. 1211-1224, 2023-10-01 (journal article)
DOI: 10.23919/JSEE.2023.000128
https://ieeexplore.ieee.org/document/10308761/
Abstract
The deep deterministic policy gradient (DDPG) algorithm is an off-policy actor-critic method that combines the two mainstream families of reinforcement learning: value-based and policy-based methods. With DDPG, agents learn by exploring the environment and can make autonomous decisions in continuous state and action spaces. In this paper, a DDPG-driven cooperative defense scheme for swarms of unmanned aerial vehicles (UAVs) is developed and validated, and it shows promising practical value for defense. We address the sparse-reward problem that reinforcement learning faces in long-horizon tasks by designing a reward function for the UAV swarm, and we optimize the training of the DDPG neural networks to reduce oscillation during learning. Experimental results show that DDPG can guide the UAV swarm to perform the defense task efficiently, meeting the swarm's requirements for decentralization and autonomy and advancing the development of intelligent UAV swarms and their decision-making.
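The core DDPG update rules referred to in the abstract can be sketched in a few lines. This is a generic toy illustration of standard DDPG, not the authors' UAV defense implementation: it assumes a 1-D state and action, replaces the actor and critic neural networks with hypothetical linear models (`LinearActor`, `LinearCritic`), and uses Gaussian exploration noise as a swapped-in substitute for the Ornstein-Uhlenbeck noise of the original DDPG paper. It shows an off-policy replay buffer, the critic's TD target computed with target networks, the actor's deterministic policy gradient, and soft (Polyak) target updates, which are the usual DDPG device for damping oscillation in training; whether the authors' network optimization relies on them is not stated in the abstract. All function names and hyperparameters are illustrative assumptions.

```python
import random
from collections import deque
from dataclasses import dataclass, fields


@dataclass
class LinearActor:
    """Hypothetical deterministic policy mu(s) = w*s + b (stands in for the actor network)."""
    w: float = 0.0
    b: float = 0.0

    def act(self, s: float) -> float:
        return self.w * s + self.b


@dataclass
class LinearCritic:
    """Hypothetical action-value model Q(s, a) = ws*s + wa*a + b (stands in for the critic network)."""
    ws: float = 0.0
    wa: float = 0.0
    b: float = 0.0

    def q(self, s: float, a: float) -> float:
        return self.ws * s + self.wa * a + self.b


def critic_target(r, s_next, done, gamma, actor_t, critic_t):
    """TD target y = r + gamma * (1 - done) * Q'(s', mu'(s')), computed with the target networks."""
    return r + gamma * (1.0 - done) * critic_t.q(s_next, actor_t.act(s_next))


def soft_update(target, online, tau):
    """Polyak averaging: theta' <- tau * theta + (1 - tau) * theta'."""
    for f in fields(target):
        mixed = tau * getattr(online, f.name) + (1.0 - tau) * getattr(target, f.name)
        setattr(target, f.name, mixed)


def explore(actor, s, sigma=0.1, low=-1.0, high=1.0):
    """Deterministic action plus Gaussian noise (swapped in for OU noise), clipped to bounds."""
    a = actor.act(s) + random.gauss(0.0, sigma)
    return max(low, min(high, a))


def ddpg_update(batch, actor, critic, actor_t, critic_t,
                gamma=0.99, lr_actor=1e-3, lr_critic=1e-2, tau=0.005):
    """One DDPG step on a minibatch of (s, a, r, s_next, done) transitions."""
    n = len(batch)

    # Critic: gradient descent on the mean of 0.5 * (Q(s, a) - y)^2.
    g_ws = g_wa = g_b = 0.0
    for s, a, r, s_next, done in batch:
        err = critic.q(s, a) - critic_target(r, s_next, done, gamma, actor_t, critic_t)
        g_ws += err * s
        g_wa += err * a
        g_b += err
    critic.ws -= lr_critic * g_ws / n
    critic.wa -= lr_critic * g_wa / n
    critic.b -= lr_critic * g_b / n

    # Actor: deterministic policy gradient, ascending mean(dQ/da * dmu/dtheta).
    # Linear critic gives dQ/da = wa; linear actor gives dmu/dw = s and dmu/db = 1.
    mean_s = sum(t[0] for t in batch) / n
    actor.w += lr_actor * critic.wa * mean_s
    actor.b += lr_actor * critic.wa

    # Target networks slowly track the online networks.
    soft_update(actor_t, actor, tau)
    soft_update(critic_t, critic, tau)


if __name__ == "__main__":
    random.seed(0)
    actor, critic = LinearActor(), LinearCritic()
    actor_t, critic_t = LinearActor(), LinearCritic()
    buffer = deque(maxlen=10_000)  # off-policy replay memory
    s = random.uniform(-1.0, 1.0)
    for step in range(500):
        a = explore(actor, s)
        r = a  # hypothetical reward that simply favors larger actions
        s_next = random.uniform(-1.0, 1.0)
        buffer.append((s, a, r, s_next, 1.0))  # one-step episodes: done = 1
        s = s_next
        if len(buffer) >= 32:
            ddpg_update(random.sample(list(buffer), 32), actor, critic, actor_t, critic_t)
    print(actor, critic)
```

The `__main__` loop only exercises the pipeline with a hypothetical reward and one-step episodes; it makes no claim about convergence, and the paper's swarm reward function and network optimizations are not reproduced here.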