{"title":"一种基于强化学习的多智能体追逃博弈方法","authors":"A. Bilgin, Esra Kadioglu Urtis","doi":"10.1109/ICAR.2015.7251450","DOIUrl":null,"url":null,"abstract":"The game of pursuit-evasion has always been a popular research subject in the field of robotics. Reinforcement learning, which employs an agent's interaction with the environment, is a method widely used in pursuit-evasion domain. In this paper, a research is conducted on multi-agent pursuit-evasion problem using reinforcement learning and the experimental results are shown. The intelligent agents use Watkins's Q(λ)-learning algorithm to learn from their interactions. Q-learning is an off-policy temporal difference control algorithm. The method we utilize on the other hand, is a unified version of Q-learning and eligibility traces. It uses backup information until the first occurrence of an exploration. In our work, concurrent learning is adopted for the pursuit team. In this approach, each member of the team has got its own action-value function and updates its information space independently.","PeriodicalId":432004,"journal":{"name":"2015 International Conference on Advanced Robotics (ICAR)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-07-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"22","resultStr":"{\"title\":\"An approach to multi-agent pursuit evasion games using reinforcement learning\",\"authors\":\"A. Bilgin, Esra Kadioglu Urtis\",\"doi\":\"10.1109/ICAR.2015.7251450\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The game of pursuit-evasion has always been a popular research subject in the field of robotics. Reinforcement learning, which employs an agent's interaction with the environment, is a method widely used in pursuit-evasion domain. In this paper, a research is conducted on multi-agent pursuit-evasion problem using reinforcement learning and the experimental results are shown. 
The intelligent agents use Watkins's Q(λ)-learning algorithm to learn from their interactions. Q-learning is an off-policy temporal difference control algorithm. The method we utilize on the other hand, is a unified version of Q-learning and eligibility traces. It uses backup information until the first occurrence of an exploration. In our work, concurrent learning is adopted for the pursuit team. In this approach, each member of the team has got its own action-value function and updates its information space independently.\",\"PeriodicalId\":432004,\"journal\":{\"name\":\"2015 International Conference on Advanced Robotics (ICAR)\",\"volume\":\"5 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2015-07-27\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"22\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2015 International Conference on Advanced Robotics (ICAR)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICAR.2015.7251450\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2015 International Conference on Advanced Robotics (ICAR)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICAR.2015.7251450","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
An approach to multi-agent pursuit evasion games using reinforcement learning
Pursuit-evasion games have long been a popular research subject in robotics. Reinforcement learning, in which an agent learns from its interaction with the environment, is widely used in the pursuit-evasion domain. In this paper, we study the multi-agent pursuit-evasion problem using reinforcement learning and present experimental results. The agents learn from their interactions using Watkins's Q(λ) algorithm. Q-learning is an off-policy temporal-difference control algorithm; the method we use combines Q-learning with eligibility traces, backing up information only until the first occurrence of an exploratory action. For the pursuit team, we adopt concurrent learning: each member of the team maintains its own action-value function and updates its information space independently.
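The update rule the abstract refers to can be sketched as follows. This is a minimal, hypothetical illustration of Watkins's Q(λ) on a toy 1-D pursuit task (the paper's actual environment, state space, and parameters are not specified here): an accumulating eligibility trace spreads each temporal-difference error back over recently visited state-action pairs, and the trace is cut to zero at the first exploratory (non-greedy) action, exactly as Watkins's variant prescribes.

```python
import random
from collections import defaultdict

# Hypothetical toy setting (an assumption, not the paper's environment):
# a pursuer on a 1-D track of cells 0..4 tries to reach the evader's
# fixed cell; actions move the pursuer one cell left or right.
ACTIONS = (-1, +1)
GOAL = 4  # evader's position (illustrative only)

def step(state, action):
    """Move the pursuer; reward 1 on capture, a small step cost otherwise."""
    nxt = max(0, min(4, state + action))
    done = nxt == GOAL
    return nxt, (1.0 if done else -0.01), done

def watkins_q_lambda(episodes=200, alpha=0.5, gamma=0.9, lam=0.8, eps=0.1, seed=0):
    rng = random.Random(seed)
    Q = defaultdict(float)           # action-value table Q[(state, action)]
    for _ in range(episodes):
        e = defaultdict(float)       # eligibility traces, reset each episode
        s, done = 0, False
        while not done:
            # epsilon-greedy selection; remember whether it was exploratory
            greedy_a = max(ACTIONS, key=lambda a: Q[(s, a)])
            explore = rng.random() < eps
            a = rng.choice(ACTIONS) if explore else greedy_a
            s2, r, done = step(s, a)
            # one-step Q-learning error: target uses the greedy value of s2
            best_next = 0.0 if done else gamma * max(Q[(s2, b)] for b in ACTIONS)
            delta = r + best_next - Q[(s, a)]
            e[(s, a)] += 1.0         # accumulating trace
            for sa in list(e):
                Q[sa] += alpha * delta * e[sa]
                # Watkins's rule: traces decay only through greedy actions;
                # the first exploratory action cuts the backup chain.
                e[sa] = e[sa] * gamma * lam if not explore else 0.0
            s = s2
    return Q
```

In the concurrent-learning scheme the abstract describes, each pursuer would run a loop like this independently, with its own `Q` table and its own trace, sharing no information space with its teammates.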