Robust iterative value conversion: Deep reinforcement learning for neurochip-driven edge robots

Robotics and Autonomous Systems (IF 4.3; Zone 2, Computer Science; JCR Q1, Automation & Control Systems) Pub Date: 2024-08-20 DOI: 10.1016/j.robot.2024.104782

Yuki Kadokawa, Tomohito Kodera, Yoshihisa Tsurumine, Shinya Nishimura, Takamitsu Matsubara

Robotics and Autonomous Systems, Volume 181, Article 104782. Full text: https://www.sciencedirect.com/science/article/pii/S0921889024001660

Citations: 0

Abstract

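A neurochip is a device that reproduces the signal-processing mechanisms of brain neurons and computes spiking neural networks (SNNs) at high speed with low power consumption. Neurochips are therefore attracting attention for edge-robot applications, which are constrained by limited battery capacity. This paper aims to realize deep reinforcement learning (DRL) that acquires SNN policies suitable for neurochip implementation. Because DRL requires complex function approximation, we focus on converting floating-point neural networks (FPNNs) into SNNs, one of the most feasible ways of obtaining SNNs. However, each DRL learning cycle updates the FPNN policy and then collects samples with the converted SNN policy, so a conversion is needed at every policy update, and the accumulated conversion errors can significantly degrade the performance of the SNN policies. We propose Robust Iterative Value Conversion (RIVC), a DRL framework that both reduces conversion errors and is robust to them. To reduce the errors, the FPNN is optimized with the same number of quantization bits as the SNN, so that quantization does not significantly change the FPNN output. To gain robustness to the remaining errors, the quantized FPNN policy is updated to widen the gap between the probability of selecting the optimal action and the probabilities of the other actions, which prevents the policy's optimal action from being unexpectedly replaced. We verified RIVC's effectiveness on a neurochip-driven robot. Compared with an edge CPU (quad-core ARM Cortex-A72), RIVC consumed 1/15 of the power and computed five times faster. The previous framework, which has no countermeasures against conversion errors, failed to train the policies. Videos of our experiments are available at https://youtu.be/Q5Z0-BvK1Tc.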
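The two ideas in the abstract, limiting quantization error and widening the gap between the best action and the rest, can be illustrated with a small sketch. It is not the authors' implementation: the uniform quantizer, the function names, the action values, and the assumption that each output is off by at most half a quantization step are all hypothetical choices made for illustration.

```python
def quantization_step(num_bits, v_max=1.0):
    """Spacing of a symmetric uniform grid with num_bits bits (assumed scheme)."""
    return v_max / (2 ** (num_bits - 1) - 1)


def quantize(value, num_bits, v_max=1.0):
    """Round value to the nearest grid level after clipping to [-v_max, v_max]."""
    step = quantization_step(num_bits, v_max)
    clipped = max(-v_max, min(v_max, value))
    return round(clipped / step) * step


def greedy_action(q_values):
    """Index of the highest-valued action."""
    return max(range(len(q_values)), key=lambda i: q_values[i])


def action_gap(q_values):
    """Difference between the best and second-best action values."""
    best, second = sorted(q_values, reverse=True)[:2]
    return best - second


def worst_case_error(q_values, eps):
    """Most harmful error of size eps: lower the best action, raise the others."""
    best = greedy_action(q_values)
    return [q - eps if i == best else q + eps for i, q in enumerate(q_values)]


# Hypothetical action values; both lists prefer action 1.
narrow = [0.50, 0.52, 0.10]   # small gap between best and runner-up
wide = [0.30, 0.72, 0.10]     # large gap between best and runner-up

# Assumption: each output is off by at most half a 4-bit quantization step.
eps = quantization_step(4) / 2

for name, q in (("narrow", narrow), ("wide", wide)):
    kept = greedy_action(q) == greedy_action(worst_case_error(q, eps))
    print(f"{name}: gap={action_gap(q):.2f}, greedy action kept: {kept}")
```

With these numbers the narrow-gap values lose their greedy action under the worst-case error, while the wide-gap values keep it. In general, any error bounded by eps per output cannot change the greedy action when the gap exceeds 2 * eps, which is the intuition behind enlarging the gap.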




Source journal

Robotics and Autonomous Systems (Engineering & Technology: Robotics)

CiteScore: 9.00

Self-citation rate: 7.00%

Articles published: 164

Review time: 4.5 months

Journal description: Robotics and Autonomous Systems will carry articles describing fundamental developments in the field of robotics, with special emphasis on autonomous systems. An important goal of this journal is to extend the state of the art in both symbolic and sensory-based robot control and learning in the context of autonomous systems. Robotics and Autonomous Systems will carry articles on the theoretical, computational and experimental aspects of autonomous systems, or modules of such systems.