Emergence of flexible prediction-based discrete decision making and continuous motion generation through actor-Q-learning

2013 IEEE Third Joint International Conference on Development and Learning and Epigenetic Robotics (ICDL) · Pub Date: 2013-08-01 · DOI: 10.1109/DEVLRN.2013.6652559

K. Shibata, Kenta Goto


Citations: 15

Abstract

In this paper, the authors first point out the importance of three factors for closing the gap in real-world flexibility between humans and robots: (1) parallel processing, (2) emergence through learning and solving "what" problems, and (3) abstraction and generalization in the abstract space. To explore the possibility of human-like flexibility in robots, a task requiring prediction, in which an agent (robot) gets a reward by capturing a moving target that sometimes becomes invisible, was learned by reinforcement learning using a recurrent neural network. Even though the agent did not know in advance that "prediction is required" or "what information should be predicted", it acquired both appropriate discrete decision making, in which 'capture' or 'move' was chosen, and continuous motion generation in two-dimensional space. Furthermore, in this task the target sometimes changed its direction of motion randomly when it became visible again after being invisible. In that case the agent changed its own direction of motion promptly and appropriately, without any special architecture or technique being introduced. Such an emergent property is something that general parallel processing systems such as the Subsumption architecture do not have, and the authors believe it is a key to solving the "Frame Problem" fundamentally.
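The split between a discrete choice ('capture' or 'move') and a continuous two-dimensional motion can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `select_action`, the epsilon-greedy choice, the Gaussian exploration noise, and all parameter values are assumptions made for illustration, and `q_values` and `actor_output` merely stand in for the outputs of the recurrent neural network mentioned in the abstract.

```python
import random

# The two discrete actions named in the abstract.
ACTIONS = ("capture", "move")


def select_action(q_values, actor_output, epsilon=0.1, noise_scale=0.05, rng=random):
    """Choose a discrete action from Q-values; for 'move', also return a 2-D motion.

    q_values:     dict mapping each action name to its estimated Q-value
                  (hypothetical stand-in for the network's Q outputs).
    actor_output: (dx, dy) proposed by the actor part
                  (hypothetical stand-in for the network's actor outputs).
    epsilon, noise_scale: assumed exploration settings, not taken from the paper.
    """
    # Discrete decision: epsilon-greedy over the Q-values (an assumed scheme).
    if rng.random() < epsilon:
        action = rng.choice(ACTIONS)
    else:
        action = max(ACTIONS, key=lambda a: q_values[a])

    # Continuous motion is only needed when the agent decides to move.
    if action == "move":
        motion = tuple(m + rng.gauss(0.0, noise_scale) for m in actor_output)
    else:
        motion = None
    return action, motion


if __name__ == "__main__":
    action, motion = select_action({"capture": 0.2, "move": 0.8}, (0.5, -0.3))
    print(action, motion)
```

In this sketch the Q-values decide what to do and the actor output decides how to move, so the continuous output is consulted only after 'move' has been selected.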


