Web-enhanced unmanned aerial vehicle target search method combining imitation learning and reinforcement learning
Tao Pang, Wenwen Xiao, Yilin Liu, Tao Wang, Jie Liu, Mingke Gao
DOI: 10.1108/ijwis-10-2023-0186 | Published: 2024-04-01 | Journal Article
Citations: 0
Abstract
Purpose
This paper aims to study how an agent can learn from expert demonstration data while incorporating reinforcement learning (RL), enabling the agent to move beyond the limitations of the demonstration data and reducing the dimensionality of its exploration space to speed up training convergence.
Design/methodology/approach
First, a decay weight function is set in the objective function of the agent's training to combine the two types of methods, so that both RL and imitation learning (IL) guide the agent's behavior during policy updates. Second, this study designs a method for coupling the demonstration trajectories with the training experience, so that samples from both sources can be combined during learning, improving data utilization and the agent's learning speed.
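The decay-weight idea can be sketched as follows. This is a minimal illustration, not the paper's actual formulation: the exponential schedule, the `decay_rate` value, and the convex blend of the two losses are all assumptions chosen for clarity.

```python
import math

def combined_loss(rl_loss: float, il_loss: float, step: int,
                  decay_rate: float = 1e-4) -> float:
    """Blend the IL and RL objectives with a weight that decays over training.

    Early in training the weight w is near 1, so the imitation loss dominates
    and the agent follows the expert demonstrations; as training proceeds, w
    decays toward 0 and the RL loss takes over, letting the agent explore
    beyond the demonstration data.
    """
    w = math.exp(-decay_rate * step)  # hypothetical exponential decay schedule
    return w * il_loss + (1.0 - w) * rl_loss
```

At step 0 the blended objective equals the imitation loss; for large step counts it converges to the pure RL loss, which matches the abstract's description of the agent gradually breaking free of the demonstration data.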
Findings
The method outperforms other algorithms in convergence speed and decision stability, avoids training reward estimates from scratch, and overcomes the restrictions imposed by the demonstration data.
Originality/value
The agent can adapt to dynamic scenes through exploration and trial-and-error mechanisms built on the experience of the demonstration trajectories. The demonstration data set used in IL and the experience samples obtained during RL are coupled to improve data utilization efficiency and the agent's generalization ability.
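The coupling of the demonstration data set with the RL experience samples can be sketched as a mixed batch sampler. This is an illustrative assumption about how such coupling is commonly done (the `demo_frac` ratio and the uniform sampling are not specified in the abstract):

```python
import random

def sample_mixed_batch(demo_buffer: list, replay_buffer: list,
                       batch_size: int, demo_frac: float = 0.25) -> list:
    """Draw one training batch mixing expert demonstrations with RL experience.

    A fixed fraction of the batch comes from the demonstration data set (IL),
    the remainder from the replay buffer filled during RL, so each gradient
    step sees samples from both sources.
    """
    n_demo = min(int(batch_size * demo_frac), len(demo_buffer))
    n_rl = batch_size - n_demo
    batch = random.sample(demo_buffer, n_demo) + random.sample(replay_buffer, n_rl)
    random.shuffle(batch)  # avoid ordering artifacts within the batch
    return batch
```

Sampling both sources into every batch is one simple way to realize the "coupling utilization" the abstract describes: the demonstrations keep anchoring the policy while fresh RL experience drives generalization.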
About the journal:
ACS Applied Bio Materials is an interdisciplinary journal publishing original research covering all aspects of biomaterials and biointerfaces including and beyond the traditional biosensing, biomedical and therapeutic applications.
The journal is devoted to reports of new and original experimental and theoretical research of an applied nature that integrates knowledge in the areas of materials, engineering, physics, bioscience, and chemistry into important bio applications. The journal is specifically interested in work that addresses the relationship between structure and function and assesses the stability and degradation of materials under relevant environmental and biological conditions.