Efficient Reinforcement Learning On Passive RRAM Crossbar Array
Arjun Tyagi, Shubham Sahay
arXiv - CS - Emerging Technologies, 2024-07-11, https://doi.org/arxiv-2407.08242
The unprecedented growth in the field of machine learning has led to the
development of deep neuromorphic networks, trained on labelled datasets, that
can match or even exceed human capabilities. However, for
applications involving continuous decision making in unknown environments, such
as rovers for space exploration, robots, unmanned aerial vehicles, etc.,
explicit supervision and the generation of labelled datasets are extremely
difficult and expensive. Reinforcement learning (RL) allows agents to make
decisions without any (human/external) supervision or training on labelled datasets.
However, conventional implementations of RL on advanced digital CPUs/GPUs
incur significant power dissipation owing to their von Neumann
architecture. Although crossbar arrays of emerging non-volatile
memories such as resistive RAMs (RRAMs), with their innate capability to perform
energy-efficient in situ multiply-accumulate operations, appear promising for
Q-learning-based RL implementations, their limited endurance restricts their
application in practical RL systems, which require a very large number of
weight updates. To address this issue and realize the true potential of
RRAM-based RL implementations, in this work we perform, for the first time, an
algorithm-hardware co-design and propose a novel implementation of the Monte
Carlo (MC) RL algorithm on a passive RRAM crossbar array. We analyse the performance of
the proposed MC RL implementation on the classical cart-pole problem and
demonstrate that it not only outperforms the prior digital and active
1-Transistor-1-RRAM (1T1R)-based implementations by more than five orders of
magnitude in terms of area but is also robust against spatial and temporal
variations and endurance failure of RRAMs.
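
The two ideas the abstract relies on, the in situ multiply-accumulate of a crossbar and the episodic nature of Monte Carlo learning, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names `crossbar_mac` and `discounted_returns`, the ideal (variation-free, wire-resistance-free) device model, and the numeric values are all assumptions made for illustration. The sketch shows that a crossbar read computes column currents as a weighted sum of input voltages, and that generic MC methods derive their learning targets from full-episode returns, so weight writes can in principle be deferred to the end of an episode rather than issued at every time step as in step-wise Q-learning. Whether the paper batches writes in exactly this way is not stated in the abstract.

```python
from typing import List


def crossbar_mac(voltages: List[float],
                 conductances: List[List[float]]) -> List[float]:
    """Ideal crossbar read (assumed model, no device non-idealities).

    Row i is driven with voltage V_i; the device at (i, j) has conductance
    G[i][j]. By Ohm's and Kirchhoff's laws the current collected on column j
    is I_j = sum_i V_i * G[i][j], i.e. a multiply-accumulate in place.
    """
    n_cols = len(conductances[0])
    return [
        sum(v * row[j] for v, row in zip(voltages, conductances))
        for j in range(n_cols)
    ]


def discounted_returns(rewards: List[float], gamma: float) -> List[float]:
    """Monte Carlo returns G_t = r_t + gamma * G_{t+1}, computed backwards.

    The returns are only known once the episode has finished, which is why
    MC-style updates can be applied once per episode.
    """
    returns = [0.0] * len(rewards)
    g = 0.0
    for t in range(len(rewards) - 1, -1, -1):
        g = rewards[t] + gamma * g
        returns[t] = g
    return returns


if __name__ == "__main__":
    # Hypothetical 2x2 array: two input rows, two output columns.
    currents = crossbar_mac([1.0, 2.0], [[1.0, 0.5], [0.25, 2.0]])
    print("column currents:", currents)

    # Hypothetical 3-step episode with a reward of 1 per step.
    print("returns:", discounted_returns([1.0, 1.0, 1.0], gamma=0.5))
```

In this idealised picture, an episode of T steps needs T crossbar reads to select actions but only one round of conductance writes, which is the property that makes episodic methods attractive for devices with limited write endurance.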