Emergence of flexible prediction-based discrete decision making and continuous motion generation through actor-Q-learning

K. Shibata, Kenta Goto
Published in: 2013 IEEE Third Joint International Conference on Development and Learning and Epigenetic Robotics (ICDL)
Publication date: 2013-08-01
DOI: 10.1109/DEVLRN.2013.6652559 (https://doi.org/10.1109/DEVLRN.2013.6652559)
Citations: 15

Abstract

In this paper, the authors first point out the importance of three factors for closing the gap between humans and robots in real-world flexibility. These are (1) parallel processing, (2) emergence through learning and solving “what” problems, and (3) abstraction and generalization in the abstract space. To explore the possibility of human-like flexibility in robots, an agent (robot) learned a prediction-required task by reinforcement learning using a recurrent neural network; in this task, the agent gets a reward by capturing a moving target that sometimes becomes invisible. Even though the agent did not know in advance that “prediction is required” or “what information should be predicted”, it acquired appropriate discrete decision making, in which 'capture' or 'move' was chosen, as well as continuous motion generation in two-dimensional space. Furthermore, in this task, the target sometimes changed its moving direction randomly when it became visible again after being invisible. The agent could then change its own moving direction promptly and appropriately, without any special architecture or technique being introduced. Such an emergent property is what general parallel processing systems such as the Subsumption architecture do not have, and the authors believe it is a key to solving the “Frame Problem” fundamentally.
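The abstract does not give the update rules or the network details, so the following is only a minimal sketch, under stated assumptions, of how a discrete choice between 'capture' and 'move' might be combined with a continuous two-dimensional motion output. The function names (`select_action`, `q_target`), the epsilon-greedy rule, the Gaussian exploration noise, and all parameter values are illustrative assumptions, not taken from the paper; the recurrent network that would produce `q_values` and `actor_motion` is left out.

```python
import random

# The two discrete choices named in the abstract.
ACTIONS = ("capture", "move")


def select_action(q_values, actor_motion, epsilon=0.1, noise_std=0.05, rng=random):
    """Choose a discrete action and, for 'move', a continuous 2-D motion.

    q_values:     dict mapping each discrete action to its estimated Q-value
                  (assumed to come from the network's Q outputs).
    actor_motion: (dx, dy) proposed by the actor outputs (assumed).
    epsilon, noise_std: hypothetical exploration parameters.
    """
    # Epsilon-greedy over the discrete actions (an assumed exploration rule).
    if rng.random() < epsilon:
        action = rng.choice(ACTIONS)
    else:
        action = max(ACTIONS, key=lambda a: q_values[a])

    # Only 'move' uses the continuous output; add Gaussian exploration noise.
    if action == "move":
        motion = tuple(m + rng.gauss(0.0, noise_std) for m in actor_motion)
    else:
        motion = None
    return action, motion


def q_target(reward, next_q_values, gamma=0.9, terminal=False):
    """Standard one-step Q-learning target: r + gamma * max_a' Q(s', a')."""
    if terminal:
        return reward
    return reward + gamma * max(next_q_values.values())
```

With exploration switched off (`epsilon=0.0`, `noise_std=0.0`), `select_action` returns the greedy discrete action and, for 'move', the unperturbed actor output, which makes the split between the discrete and continuous parts easy to inspect.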