Learning target reaching motions with a robotic arm using brain-inspired dopamine modulated STDP

J. C. V. Tieck, Pascal Becker, Jacques Kaiser, Igor Peric, Mahmoud Akl, Daniel Reichard, A. Rönnau, R. Dillmann
{"title":"Learning target reaching motions with a robotic arm using brain-inspired dopamine modulated STDP","authors":"J. C. V. Tieck, Pascal Becker, Jacques Kaiser, Igor Peric, Mahmoud Akl, Daniel Reichard, A. Rönnau, R. Dillmann","doi":"10.1109/ICCICC46617.2019.9146079","DOIUrl":null,"url":null,"abstract":"The main purpose of the human arm is to reach a target and perform a manipulation task. Human babies learn to move their arms by imitating and doing motor babbling through trial and error. This learning is believed to result from changes in synaptic efficacy triggered by complex mechanisms involving neuro-modulators in which dopamine plays a key role. After learning, humans are able to reuse and adapt the motions without performing complex calculations. In contrast, classical robotics achieve target reaching by mathematically computing each time the inverse kinematics (IK) of the joint angles leading to a particular target, then validating the configuration and generating a trajectory. This process is computational intensive and becomes more complex with the amount of degrees of freedom (DoF). In this work, we propose a spiking neural network architecture to learn target reaching motions with a robotic arm using reinforcement learning (RL), which is closely related to the way babies learn. To make our approach scalable, we sub-divide the kinematics structure of the robot and create one sub-network per joint. We generate training data offline by generating random reaching motions with an IK calculation outside of the network. After learning, the IK is no longer required, and the model is implicitly learned in the weights of the network. Mimicking the learning mechanisms of the brain, we use the spike time dependent plasticity (STDP) learning rule modulated by dopamine, representing a reward. The approach is evaluated with a simulated Universal Robot UR5 with six DoF. The network successfully learns to reach multiple targets and by changing the reward function on-the-fly it is able to learn different control functions. With a standard computer our network was able to control a robotic kinematics chain up to 13 DoF in real time. A key aspect of our approach is that in contrast to deep RL our SNN does not need much data to learn new behaviors. We believe that model free motion controllers inspired on the human brain mechanisms can improve the way robots are programmed by making the process more adaptive and flexible.","PeriodicalId":294902,"journal":{"name":"2019 IEEE 18th International Conference on Cognitive Informatics & Cognitive Computing (ICCI*CC)","volume":"35 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE 18th International Conference on Cognitive Informatics & Cognitive Computing (ICCI*CC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICCICC46617.2019.9146079","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 7

Abstract

The main purpose of the human arm is to reach a target and perform a manipulation task. Human babies learn to move their arms through imitation and motor babbling, by trial and error. This learning is believed to result from changes in synaptic efficacy triggered by complex mechanisms involving neuromodulators, in which dopamine plays a key role. After learning, humans are able to reuse and adapt the motions without performing complex calculations. In contrast, classical robotics achieves target reaching by computing, for each target, the inverse kinematics (IK) that yields the joint angles leading to it, then validating the configuration and generating a trajectory. This process is computationally intensive and becomes more complex as the number of degrees of freedom (DoF) grows. In this work, we propose a spiking neural network architecture to learn target reaching motions with a robotic arm using reinforcement learning (RL), which is closely related to the way babies learn. To make our approach scalable, we subdivide the kinematic structure of the robot and create one sub-network per joint. We generate training data offline by producing random reaching motions with an IK solver outside the network. After learning, the IK is no longer required, and the model is implicitly encoded in the weights of the network. Mimicking the learning mechanisms of the brain, we use the spike-timing-dependent plasticity (STDP) learning rule modulated by dopamine, which represents a reward. The approach is evaluated with a simulated Universal Robots UR5 with six DoF. The network successfully learns to reach multiple targets, and by changing the reward function on the fly it is able to learn different control functions. On a standard computer, our network was able to control a robotic kinematic chain with up to 13 DoF in real time. A key aspect of our approach is that, in contrast to deep RL, our SNN does not need much data to learn new behaviors. We believe that model-free motion controllers inspired by human brain mechanisms can improve the way robots are programmed by making the process more adaptive and flexible.
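For readers unfamiliar with dopamine-modulated STDP, the following is a minimal sketch of how a reward signal can gate pair-based STDP through a decaying eligibility trace, in the spirit of the rule the abstract describes. The class name, parameter values, and the pair-based trace formulation are illustrative assumptions and are not taken from the paper, which applies the rule inside a spiking sub-network per joint.

```python
# A minimal sketch of reward-modulated STDP (R-STDP). All names and
# parameter values here are illustrative assumptions, not the paper's.
import numpy as np

class RSTDPSynapse:
    def __init__(self, w=0.5, a_plus=0.01, a_minus=0.012,
                 tau_stdp=20.0, tau_elig=200.0):
        self.w = w                  # synaptic weight
        self.elig = 0.0             # eligibility trace c(t)
        self.pre_trace = 0.0        # low-pass filtered presynaptic spikes
        self.post_trace = 0.0       # low-pass filtered postsynaptic spikes
        self.a_plus, self.a_minus = a_plus, a_minus
        self.tau_stdp, self.tau_elig = tau_stdp, tau_elig

    def step(self, dt, pre_spike, post_spike, dopamine):
        # Exponentially decay the spike traces and the eligibility trace.
        self.pre_trace *= np.exp(-dt / self.tau_stdp)
        self.post_trace *= np.exp(-dt / self.tau_stdp)
        self.elig *= np.exp(-dt / self.tau_elig)
        if pre_spike:
            self.pre_trace += 1.0
            # Pre arriving after post -> depression credit.
            self.elig -= self.a_minus * self.post_trace
        if post_spike:
            self.post_trace += 1.0
            # Post arriving after pre -> potentiation credit.
            self.elig += self.a_plus * self.pre_trace
        # Dopamine (reward) gates the eligibility trace into a weight change;
        # without dopamine, STDP correlations decay away unused.
        self.w = np.clip(self.w + dopamine * self.elig * dt, 0.0, 1.0)

# Example: a pre->post pairing followed by a delayed reward.
syn = RSTDPSynapse()
syn.step(dt=1.0, pre_spike=True, post_spike=False, dopamine=0.0)
syn.step(dt=1.0, pre_spike=False, post_spike=True, dopamine=0.0)
syn.step(dt=1.0, pre_spike=False, post_spike=False, dopamine=1.0)
```

The design point this illustrates is that STDP correlations are not written to the weight directly: they accumulate in an eligibility trace that bridges the delay between an action and its reward, and only a non-zero dopamine signal converts the trace into a weight change. This is what makes it plausible, as the abstract notes, to learn different control functions simply by changing the reward function on the fly.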