{"title":"SHIRE:在强化学习中利用人类直觉提高采样效率","authors":"Amogh Joshi, Adarsh Kumar Kosta, Kaushik Roy","doi":"arxiv-2409.09990","DOIUrl":null,"url":null,"abstract":"The ability of neural networks to perform robotic perception and control\ntasks such as depth and optical flow estimation, simultaneous localization and\nmapping (SLAM), and automatic control has led to their widespread adoption in\nrecent years. Deep Reinforcement Learning has been used extensively in these\nsettings, as it does not have the unsustainable training costs associated with\nsupervised learning. However, DeepRL suffers from poor sample efficiency, i.e.,\nit requires a large number of environmental interactions to converge to an\nacceptable solution. Modern RL algorithms such as Deep Q Learning and Soft\nActor-Critic attempt to remedy this shortcoming but can not provide the\nexplainability required in applications such as autonomous robotics. Humans\nintuitively understand the long-time-horizon sequential tasks common in\nrobotics. Properly using such intuition can make RL policies more explainable\nwhile enhancing their sample efficiency. In this work, we propose SHIRE, a\nnovel framework for encoding human intuition using Probabilistic Graphical\nModels (PGMs) and using it in the Deep RL training pipeline to enhance sample\nefficiency. Our framework achieves 25-78% sample efficiency gains across the\nenvironments we evaluate at negligible overhead cost. Additionally, by teaching\nRL agents the encoded elementary behavior, SHIRE enhances policy\nexplainability. A real-world demonstration further highlights the efficacy of\npolicies trained using our framework.","PeriodicalId":501347,"journal":{"name":"arXiv - CS - Neural and Evolutionary Computing","volume":"14 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"SHIRE: Enhancing Sample Efficiency using Human Intuition in REinforcement Learning\",\"authors\":\"Amogh Joshi, Adarsh Kumar Kosta, Kaushik Roy\",\"doi\":\"arxiv-2409.09990\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The ability of neural networks to perform robotic perception and control\\ntasks such as depth and optical flow estimation, simultaneous localization and\\nmapping (SLAM), and automatic control has led to their widespread adoption in\\nrecent years. Deep Reinforcement Learning has been used extensively in these\\nsettings, as it does not have the unsustainable training costs associated with\\nsupervised learning. However, DeepRL suffers from poor sample efficiency, i.e.,\\nit requires a large number of environmental interactions to converge to an\\nacceptable solution. Modern RL algorithms such as Deep Q Learning and Soft\\nActor-Critic attempt to remedy this shortcoming but can not provide the\\nexplainability required in applications such as autonomous robotics. Humans\\nintuitively understand the long-time-horizon sequential tasks common in\\nrobotics. Properly using such intuition can make RL policies more explainable\\nwhile enhancing their sample efficiency. In this work, we propose SHIRE, a\\nnovel framework for encoding human intuition using Probabilistic Graphical\\nModels (PGMs) and using it in the Deep RL training pipeline to enhance sample\\nefficiency. Our framework achieves 25-78% sample efficiency gains across the\\nenvironments we evaluate at negligible overhead cost. 
Additionally, by teaching\\nRL agents the encoded elementary behavior, SHIRE enhances policy\\nexplainability. A real-world demonstration further highlights the efficacy of\\npolicies trained using our framework.\",\"PeriodicalId\":501347,\"journal\":{\"name\":\"arXiv - CS - Neural and Evolutionary Computing\",\"volume\":\"14 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-09-16\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - CS - Neural and Evolutionary Computing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2409.09990\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Neural and Evolutionary Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.09990","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
SHIRE: Enhancing Sample Efficiency using Human Intuition in REinforcement Learning
The ability of neural networks to perform robotic perception and control
tasks such as depth and optical flow estimation, simultaneous localization and
mapping (SLAM), and automatic control has led to their widespread adoption in
recent years. Deep Reinforcement Learning has been used extensively in these
settings, as it does not have the unsustainable training costs associated with
supervised learning. However, Deep RL suffers from poor sample efficiency, i.e.,
it requires a large number of environmental interactions to converge to an
acceptable solution. Modern RL algorithms such as Deep Q-Learning and Soft
Actor-Critic attempt to remedy this shortcoming but cannot provide the
explainability required in applications such as autonomous robotics. Humans
intuitively understand the long-time-horizon sequential tasks common in
robotics. Properly using such intuition can make RL policies more explainable
while enhancing their sample efficiency. In this work, we propose SHIRE, a
novel framework that encodes human intuition as Probabilistic Graphical
Models (PGMs) and incorporates it into the Deep RL training pipeline to enhance
sample efficiency. Our framework achieves sample-efficiency gains of 25-78%
across the environments we evaluate, at negligible overhead cost. Additionally,
by teaching RL agents the encoded elementary behaviors, SHIRE enhances policy
explainability. A real-world demonstration further highlights the efficacy of
policies trained using our framework.
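
The abstract only sketches the mechanism, so the toy example below illustrates one plausible reading of it: a hand-coded probabilistic "intuition" factor over a CartPole-style state yields a soft target distribution over actions, and a weighted KL term penalizes the policy for disagreeing with it during an otherwise ordinary policy-gradient update. This is a minimal sketch under stated assumptions, not the authors' implementation: the CartPole rule, the KL formulation, the 0.1 weight, and all names (PolicyNet, intuition_distribution, intuition_loss) are illustrative.

# Minimal sketch of adding a PGM-style "intuition loss" to a policy-gradient
# update, in the spirit of SHIRE. All names and the CartPole rule below are
# illustrative assumptions, not the paper's code.
import torch
import torch.nn as nn

class PolicyNet(nn.Module):
    """Tiny policy network for a discrete-action task (e.g., CartPole)."""
    def __init__(self, obs_dim: int = 4, n_actions: int = 2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, n_actions)
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # Return log-probabilities over actions.
        return torch.log_softmax(self.net(obs), dim=-1)

def intuition_distribution(obs: torch.Tensor) -> torch.Tensor:
    """Hand-coded 'intuition' factor: a conditional distribution P(action|state)
    expressing the elementary rule 'push toward the side the pole is falling'.
    In SHIRE this would come from a probabilistic graphical model; here it is a
    single soft factor over the pole-angle variable (obs[:, 2] in CartPole)."""
    pole_angle = obs[:, 2]
    p_right = torch.sigmoid(10.0 * pole_angle)  # falling right -> push right
    return torch.stack([1.0 - p_right, p_right], dim=-1)

def intuition_loss(log_probs: torch.Tensor, obs: torch.Tensor) -> torch.Tensor:
    """KL(intuition || policy): penalizes the policy for disagreeing with the
    encoded intuition. Weighted into the usual RL objective below."""
    target = intuition_distribution(obs)
    return nn.functional.kl_div(log_probs, target, reduction="batchmean")

# One hypothetical policy-gradient step with the auxiliary intuition term.
policy = PolicyNet()
optimizer = torch.optim.Adam(policy.parameters(), lr=3e-4)
obs = torch.randn(32, 4)               # stand-in batch of observations
actions = torch.randint(0, 2, (32,))   # stand-in sampled actions
returns = torch.randn(32)              # stand-in (normalized) returns

log_probs = policy(obs)
pg_loss = -(log_probs[torch.arange(32), actions] * returns).mean()
loss = pg_loss + 0.1 * intuition_loss(log_probs, obs)  # 0.1 = assumed weight
optimizer.zero_grad()
loss.backward()
optimizer.step()

The weighting coefficient trades off guidance from the encoded intuition against the reward-driven policy-gradient term; annealing it toward zero over training would let the learned policy deviate from the elementary behavior where the reward warrants it.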