Authors: K. Shibata, Kenta Goto
Venue: 2013 IEEE Third Joint International Conference on Development and Learning and Epigenetic Robotics (ICDL)
Published: 2013-08-01
DOI: https://doi.org/10.1109/DEVLRN.2013.6652559
Emergence of flexible prediction-based discrete decision making and continuous motion generation through actor-Q-learning
In this paper, the authors first point out the importance of three factors for closing the gap between humans and robots in real-world flexibility: (1) parallel processing, (2) emergence through learning and solving "what" problems, and (3) abstraction and generalization in an abstract space. To explore the possibility of human-like flexibility in robots, an agent (robot) used reinforcement learning with a recurrent neural network to learn a task that requires prediction: the agent gets a reward by capturing a moving target that sometimes becomes invisible. Even though the agent did not know in advance that "prediction is required" or "what information should be predicted", it acquired both appropriate discrete decision making, in which it chose between 'capture' and 'move', and continuous motion generation in two-dimensional space. Furthermore, in this task the target sometimes changed its direction of movement randomly when it became visible again after being invisible. In that case the agent changed its own direction of movement promptly and appropriately, without any special architecture or technique being introduced. Such an emergent property is something that general parallel processing systems such as the Subsumption architecture lack, and the authors believe it is a key to solving the "Frame Problem" fundamentally.
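The abstract describes one agent that makes a discrete choice ('capture' or 'move') and, when it moves, generates a continuous two-dimensional motion. The sketch below shows one way such a combined action selection could look. It is only an illustration under stated assumptions, not the authors' implementation: the function name `select_action`, the epsilon-greedy choice, the uniform exploration noise, and the parameters `epsilon` and `noise_scale` are all hypothetical, and the Q-values and actor output are assumed to come from the outputs of the recurrent network.

```python
import random

ACTIONS = ("capture", "move")  # the two discrete choices named in the abstract


def select_action(q_values, actor_output, epsilon=0.1, noise_scale=0.1, rng=None):
    """Choose a discrete action from Q-values; for 'move', also return a 2-D motion.

    q_values: dict mapping each name in ACTIONS to an estimated value (assumed input).
    actor_output: (dx, dy) motion proposed by the actor outputs (assumed input).
    epsilon, noise_scale: hypothetical exploration parameters.
    """
    rng = rng or random.Random()
    # Assumed exploration scheme: epsilon-greedy over the discrete actions.
    if rng.random() < epsilon:
        action = rng.choice(ACTIONS)
    else:
        action = max(ACTIONS, key=lambda a: q_values[a])
    if action == "capture":
        return action, None  # no motion vector accompanies a capture attempt
    # Continuous motion: actor output plus uniform exploration noise (assumed).
    dx, dy = actor_output
    motion = (dx + rng.uniform(-noise_scale, noise_scale),
              dy + rng.uniform(-noise_scale, noise_scale))
    return action, motion
```

With exploration switched off (`epsilon=0.0`, `noise_scale=0.0`), the function returns the greedy discrete action and, for 'move', the actor's motion unchanged. In a learning setup of this kind, the discrete part would typically be trained from the Q-value error and the continuous part from the same reward signal, but those update rules are not given in the abstract and are left out here.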