An approach to multi-agent pursuit evasion games using reinforcement learning

A. Bilgin, Esra Kadioglu Urtis
{"title":"一种基于强化学习的多智能体追逃博弈方法","authors":"A. Bilgin, Esra Kadioglu Urtis","doi":"10.1109/ICAR.2015.7251450","DOIUrl":null,"url":null,"abstract":"The game of pursuit-evasion has always been a popular research subject in the field of robotics. Reinforcement learning, which employs an agent's interaction with the environment, is a method widely used in pursuit-evasion domain. In this paper, a research is conducted on multi-agent pursuit-evasion problem using reinforcement learning and the experimental results are shown. The intelligent agents use Watkins's Q(λ)-learning algorithm to learn from their interactions. Q-learning is an off-policy temporal difference control algorithm. The method we utilize on the other hand, is a unified version of Q-learning and eligibility traces. It uses backup information until the first occurrence of an exploration. In our work, concurrent learning is adopted for the pursuit team. In this approach, each member of the team has got its own action-value function and updates its information space independently.","PeriodicalId":432004,"journal":{"name":"2015 International Conference on Advanced Robotics (ICAR)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-07-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"22","resultStr":"{\"title\":\"An approach to multi-agent pursuit evasion games using reinforcement learning\",\"authors\":\"A. Bilgin, Esra Kadioglu Urtis\",\"doi\":\"10.1109/ICAR.2015.7251450\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The game of pursuit-evasion has always been a popular research subject in the field of robotics. Reinforcement learning, which employs an agent's interaction with the environment, is a method widely used in pursuit-evasion domain. In this paper, a research is conducted on multi-agent pursuit-evasion problem using reinforcement learning and the experimental results are shown. The intelligent agents use Watkins's Q(λ)-learning algorithm to learn from their interactions. Q-learning is an off-policy temporal difference control algorithm. The method we utilize on the other hand, is a unified version of Q-learning and eligibility traces. It uses backup information until the first occurrence of an exploration. In our work, concurrent learning is adopted for the pursuit team. 
In this approach, each member of the team has got its own action-value function and updates its information space independently.\",\"PeriodicalId\":432004,\"journal\":{\"name\":\"2015 International Conference on Advanced Robotics (ICAR)\",\"volume\":\"5 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2015-07-27\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"22\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2015 International Conference on Advanced Robotics (ICAR)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICAR.2015.7251450\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2015 International Conference on Advanced Robotics (ICAR)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICAR.2015.7251450","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 22

Abstract

The game of pursuit-evasion has long been a popular research subject in robotics. Reinforcement learning, in which an agent learns from its interaction with the environment, is widely used in the pursuit-evasion domain. In this paper, we study the multi-agent pursuit-evasion problem using reinforcement learning and present experimental results. The agents learn from their interactions using Watkins's Q(λ) algorithm. Q-learning is an off-policy temporal-difference control algorithm; the method we use unifies Q-learning with eligibility traces, backing up information only until the first exploratory action is taken. For the pursuit team we adopt concurrent learning, in which each team member maintains its own action-value function and updates its information space independently.
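The core machinery described in the abstract is Watkins's Q(λ): tabular Q-learning combined with eligibility traces, where the traces are cut as soon as an exploratory (non-greedy) action is taken, so backups only use experience gathered while acting greedily. The sketch below illustrates that update rule and the concurrent-learning setup in which each pursuer keeps its own action-value table; the state/action space sizes, the ε-greedy policy, and all hyperparameter values are illustrative assumptions, not the paper's reported setup.

```python
import numpy as np

class WatkinsQLambdaAgent:
    """Minimal tabular Watkins's Q(lambda): Q-learning unified with eligibility traces."""

    def __init__(self, n_states, n_actions,
                 alpha=0.1, gamma=0.95, lam=0.8, epsilon=0.1):
        self.Q = np.zeros((n_states, n_actions))   # action-value table
        self.E = np.zeros((n_states, n_actions))   # eligibility traces (reset at episode start)
        self.alpha, self.gamma, self.lam, self.epsilon = alpha, gamma, lam, epsilon
        self.n_actions = n_actions

    def reset_traces(self):
        """Call at the start of each episode."""
        self.E[:] = 0.0

    def act(self, state):
        """Epsilon-greedy action selection."""
        if np.random.rand() < self.epsilon:
            return np.random.randint(self.n_actions)
        return int(np.argmax(self.Q[state]))

    def update(self, s, a, r, s_next, a_next):
        """One backup for (s, a, r, s_next); a_next is the action already chosen at s_next."""
        a_star = int(np.argmax(self.Q[s_next]))           # greedy action at s_next
        if self.Q[s_next, a_next] == self.Q[s_next, a_star]:
            a_star = a_next                               # a tie counts as greedy
        delta = r + self.gamma * self.Q[s_next, a_star] - self.Q[s, a]
        self.E[s, a] += 1.0                               # accumulating trace
        self.Q += self.alpha * delta * self.E             # propagate the TD error along the trace
        if a_next == a_star:
            self.E *= self.gamma * self.lam               # greedy step: decay the traces
        else:
            self.E[:] = 0.0                               # exploratory step: cut the traces

# Concurrent learning for the pursuit team (illustrative): each pursuer holds its own
# independent action-value function and updates it from its own experience.
pursuers = [WatkinsQLambdaAgent(n_states=400, n_actions=4) for _ in range(3)]
```

Because the traces are zeroed at the first exploratory action, credit from later rewards is only propagated back along stretches of greedy behaviour, which is what keeps the eligibility-trace extension consistent with Q-learning's off-policy target.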