Web-enhanced unmanned aerial vehicle target search method combining imitation learning and reinforcement learning

International Journal of Web Information Systems (IF 2.5, JCR Q2, Computer Science, Information Systems) | Pub Date: 2024-04-01 | DOI: 10.1108/ijwis-10-2023-0186
Tao Pang, Wenwen Xiao, Yilin Liu, Tao Wang, Jie Liu, Mingke Gao
Citations: 0

Abstract

Web-enhanced unmanned aerial vehicle target search method combining imitation learning and reinforcement learning
Purpose
This paper studies how an agent can learn from expert demonstration data while also incorporating reinforcement learning (RL). The combination lets the agent go beyond the limitations of the expert demonstration data and reduces the dimensionality of the agent's exploration space, which speeds up training convergence.

Design/methodology/approach
First, a decay weight function is set in the objective function used to train the agent. It combines the two types of methods, so that both RL and imitation learning (IL) guide the agent's behavior when the policy is updated. Second, the study designs a coupled utilization method for the demonstration trajectories and the training experience, so that samples from both sources can be combined during the agent's learning process. This improves both the utilization rate of the data and the agent's learning speed.

Findings
The method is superior to other algorithms in terms of convergence speed and decision stability. It avoids training from scratch for reward values and breaks through the restrictions imposed by the demonstration data.

Originality/value
Building on the experience contained in the demonstration trajectories, the agent can adapt to dynamic scenes through exploration and trial-and-error mechanisms. The demonstration data set used in IL and the experience samples obtained during RL are coupled, which improves data utilization efficiency and the generalization ability of the agent.
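The abstract does not state the form of the decay weight function or how demonstration and training samples are mixed, so the following is only a minimal sketch under stated assumptions: an exponentially decaying weight on the imitation term, and a fixed fraction of demonstration samples per batch. The names `il_weight`, `combined_loss`, `sample_mixed_batch`, `decay_rate` and `demo_fraction` are hypothetical and do not come from the paper.

```python
import math
import random


def il_weight(step, initial=1.0, decay_rate=0.01):
    """Assumed schedule: weight on the imitation term decays exponentially."""
    return initial * math.exp(-decay_rate * step)


def combined_loss(rl_loss, il_loss, step):
    """Assumed objective: RL loss plus a decaying share of the IL loss.

    Early in training the imitation term dominates the guidance; as the
    weight decays, the RL term increasingly drives the policy update.
    """
    return rl_loss + il_weight(step) * il_loss


def sample_mixed_batch(demo_buffer, agent_buffer, batch_size, demo_fraction, rng):
    """Draw one batch that couples demonstration and agent transitions."""
    # Number of demonstration samples, capped by what the buffer holds.
    n_demo = min(len(demo_buffer), int(batch_size * demo_fraction))
    # Fill the rest of the batch from the agent's own experience.
    n_agent = min(len(agent_buffer), batch_size - n_demo)
    batch = rng.sample(demo_buffer, n_demo) + rng.sample(agent_buffer, n_agent)
    rng.shuffle(batch)  # avoid a fixed demo-then-agent ordering
    return batch


# Toy usage with placeholder transitions tagged by their source.
rng = random.Random(0)
demo_buffer = [("demo", i) for i in range(10)]
agent_buffer = [("agent", i) for i in range(50)]
batch = sample_mixed_batch(demo_buffer, agent_buffer,
                           batch_size=8, demo_fraction=0.25, rng=rng)
```

In this sketch the imitation weight starts at 1 and tends to 0, so the agent leans on demonstrations early and on its own reward signal later. A different schedule (linear, step-wise) or an adaptive demonstration fraction would fit the abstract's description equally well.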
Source journal
International Journal of Web Information Systems (COMPUTER SCIENCE, INFORMATION SYSTEMS)
CiteScore: 4.60
Self-citation rate: 0.00%
Articles published: 19
About the journal: The Global Information Infrastructure is a daily reality. Despite the many applications in all domains of our societies (e-business, e-commerce, e-learning, e-science and e-government, for instance) and despite the tremendous advances by engineers and scientists, the seamless development of Web information systems and services remains a major challenge. The journal examines how the current shared vision for the future is one of semantically rich information and service-oriented architecture for global information systems. This vision is at the convergence of progress in technologies such as XML, Web services, RDF and OWL; in multimedia, multimodal and multilingual information retrieval; and in distributed, mobile and ubiquitous computing.
Topicality: While the International Journal of Web Information Systems covers a broad range of topics, the journal welcomes papers that provide a perspective on all aspects of Web information systems: Web semantics and Web dynamics, Web mining and searching, Web databases and Web data integration, Web-based commerce and e-business, Web collaboration and distributed computing, Internet computing and networks, performance of Web applications, and Web multimedia services and Web-based education.
Latest articles from this journal
Web-aided data set expansion in deep learning: evaluating trainable activation functions in ResNet for improved image classification
Click-through rate prediction model based on graph networks and feature squeeze-and-excitation mechanism
Enhancing the viewing, browsing and searching of knowledge graphs with virtual properties
GethReplayer: a smart contract testing method based on transaction replay
Large language models for automated Q&A involving legal documents: a survey on algorithms, frameworks and applications