Efficient Reinforcement Learning On Passive RRAM Crossbar Array

Arjun Tyagi, Shubham Sahay
arXiv - CS - Emerging Technologies, published 2024-07-11. DOI: arxiv-2407.08242 (https://doi.org/arxiv-2407.08242)
Citations: 0

Abstract

The unprecedented growth of machine learning has led to deep neuromorphic networks that are trained on labelled datasets and can mimic or even exceed human capabilities. However, for applications that involve continuous decision making in unknown environments, such as rovers for space exploration, robots, and unmanned aerial vehicles, explicit supervision and the generation of labelled datasets are extremely difficult and expensive. Reinforcement learning (RL) allows agents to take decisions without any (human/external) supervision or training on labelled datasets. However, conventional implementations of RL on advanced digital CPUs/GPUs incur large power dissipation owing to their inherent von Neumann architecture. Crossbar arrays of emerging non-volatile memories such as resistive RAMs (RRAMs), with their innate capability to perform energy-efficient in situ multiply-accumulate operations, appear promising for Q-learning-based RL implementations, but their limited endurance restricts their application in practical RL systems, which require an overwhelming number of weight updates. To address this issue and realize the true potential of RRAM-based RL implementations, in this work, for the first time, we perform an algorithm-hardware co-design and propose a novel implementation of the Monte Carlo (MC) RL algorithm on a passive RRAM crossbar array. We analyse the performance of the proposed MC RL implementation on the classical cart-pole problem and demonstrate that it not only outperforms the prior digital and active 1-Transistor-1-RRAM (1T1R)-based implementations by more than five orders of magnitude in terms of area but is also robust against the spatial and temporal variations and endurance failure of RRAMs.
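
The in situ multiply-accumulate operation mentioned in the abstract can be illustrated with an idealized numerical model. The sketch below is not the authors' implementation. It assumes a common differential-pair mapping of signed weights onto two conductance arrays, hypothetical conductance bounds `g_min` and `g_max`, and an ideal array with no wire resistance, sneak paths, or device variation. Under those assumptions, Ohm's law and Kirchhoff's current law give each column current as the sum of input voltages times conductances, which is a matrix-vector product.

```python
import numpy as np

def weights_to_conductances(W, g_min=1e-6, g_max=1e-4):
    """Map signed weights to a differential pair of conductance arrays (siemens).

    g_min and g_max are hypothetical device limits chosen for illustration.
    """
    scale = (g_max - g_min) / np.max(np.abs(W))
    G_plus = g_min + scale * np.clip(W, 0, None)    # positive part of W
    G_minus = g_min + scale * np.clip(-W, 0, None)  # negative part of W
    return G_plus, G_minus, scale

def crossbar_mac(G_plus, G_minus, v_in):
    """Ideal column currents: I_j = sum_i V_i * (G_plus_ij - G_minus_ij)."""
    return v_in @ G_plus - v_in @ G_minus

# Example: 3 input rows, 2 output columns.
W = np.array([[0.5, -1.0], [2.0, 0.25], [-0.75, 1.5]])
v = np.array([0.1, 0.2, 0.05])  # row voltages in volts

G_plus, G_minus, scale = weights_to_conductances(W)
i_out = crossbar_mac(G_plus, G_minus, v)

# Dividing by the scale factor recovers the software dot product.
assert np.allclose(i_out / scale, v @ W)
```

Because the two arrays share the same `g_min` offset, the offset cancels in the differential readout, so the output current is proportional to the weighted sum. A real passive array would deviate from this ideal model, which is why the abstract's robustness claims about spatial and temporal variations matter.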