Task-driven Risk-bounded Hierarchical Reinforcement Learning Based on Iterative Refinement

Viraj Parimi, Sungkweon Hong, Brian Williams
{"title":"Task-driven Risk-bounded Hierarchical Reinforcement Learning Based on Iterative Refinement","authors":"Viraj Parimi, Sungkweon Hong, Brian Williams","doi":"10.1609/aaaiss.v3i1.31281","DOIUrl":null,"url":null,"abstract":"Deep Reinforcement Learning (DRL) has garnered substantial acclaim for its versatility and widespread applications across diverse domains. Aligned with human-like learning, DRL is grounded in the fundamental principle of learning from interaction, wherein agents dynamically adjust behavior based on environmental feedback in the form of rewards. This iterative trial-and-error process, mirroring human learning, underscores the importance of observation, experimentation, and feedback in shaping understanding and behavior. DRL agents, trained to navigate complex surroundings, refine their knowledge through hierarchical and abstract representations, empowered by deep neural networks. These representations enable efficient handling of long-horizon tasks and flexible adaptation to novel situations, akin to the human ability to construct mental models for comprehending complex concepts and predicting outcomes. Hence, abstract representation building emerges as a critical aspect in the learning processes of both artificial agents and human learners, particularly in long-horizon tasks.\n\nFurthermore, human decision-making, deeply rooted in evolutionary history, exhibits a remarkable capacity to balance the tradeoff between risk and cost across various domains. This cognitive process involves assessing potential negative consequences, evaluating factors such as the likelihood of adverse outcomes, severity of potential harm, and overall uncertainty. Humans intuitively gauge inherent risks and adeptly weigh associated costs, extending beyond monetary expenses to include time, effort, and opportunity costs. The nuanced ability of humans to consider the tradeoff between risk and cost highlights the complexity and adaptability of human decision-making, a skill lacking in typical DRL agents. Principles like these derived from human-like learning present an avenue for inspiring advancements in DRL, fostering the development of more adaptive and intelligent artificial agents.\n\nMotivated by these observations and focusing on practical challenges in robotics, our efforts target risk-aware stochastic sequential decision-making problem which is crucial for tasks with extended time frames and varied strategies. A novel integration of model-based conditional planning with DRL is proposed, inspired by hierarchical techniques. This approach breaks down complex tasks into manageable subtasks(motion primitives), ensuring safety constraints and informed decision-making. Unlike existing methods, our approach addresses motion primitive improvement iteratively, employing diverse prioritization functions to guide the search process effectively. 
This risk-bounded planning algorithm seamlessly integrates conditional planning and motion primitive learning, prioritizing computational efforts for enhanced efficiency within specified time limits.","PeriodicalId":516827,"journal":{"name":"Proceedings of the AAAI Symposium Series","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the AAAI Symposium Series","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1609/aaaiss.v3i1.31281","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Deep Reinforcement Learning (DRL) has garnered substantial acclaim for its versatility and widespread applications across diverse domains. Aligned with human-like learning, DRL is grounded in the fundamental principle of learning from interaction, wherein agents dynamically adjust behavior based on environmental feedback in the form of rewards. This iterative trial-and-error process, mirroring human learning, underscores the importance of observation, experimentation, and feedback in shaping understanding and behavior. DRL agents, trained to navigate complex surroundings, refine their knowledge through hierarchical and abstract representations, empowered by deep neural networks. These representations enable efficient handling of long-horizon tasks and flexible adaptation to novel situations, akin to the human ability to construct mental models for comprehending complex concepts and predicting outcomes. Hence, abstract representation building emerges as a critical aspect of the learning processes of both artificial agents and human learners, particularly in long-horizon tasks.

Furthermore, human decision-making, deeply rooted in evolutionary history, exhibits a remarkable capacity to balance the tradeoff between risk and cost across various domains. This cognitive process involves assessing potential negative consequences and evaluating factors such as the likelihood of adverse outcomes, the severity of potential harm, and overall uncertainty. Humans intuitively gauge inherent risks and adeptly weigh associated costs, which extend beyond monetary expenses to include time, effort, and opportunity costs. This nuanced ability to consider the tradeoff between risk and cost highlights the complexity and adaptability of human decision-making, a skill lacking in typical DRL agents. Principles like these, derived from human-like learning, present an avenue for inspiring advancements in DRL and fostering the development of more adaptive and intelligent artificial agents.

Motivated by these observations and focusing on practical challenges in robotics, our efforts target the risk-aware stochastic sequential decision-making problem, which is crucial for tasks with extended time frames and varied strategies. We propose a novel integration of model-based conditional planning with DRL, inspired by hierarchical techniques. This approach breaks down complex tasks into manageable subtasks (motion primitives), enforcing safety constraints and enabling informed decision-making. Unlike existing methods, our approach addresses motion primitive improvement iteratively, employing diverse prioritization functions to guide the search process effectively. The resulting risk-bounded planning algorithm seamlessly integrates conditional planning and motion primitive learning, prioritizing computational effort for enhanced efficiency within specified time limits.
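To make the proposed loop concrete, the following is a minimal, hypothetical sketch of an iterative-refinement scheme of the kind the abstract describes: a plan is decomposed into motion primitives, and a prioritization function selects which primitive to refine next until a plan-level risk bound is met or a time budget expires. All names (MotionPrimitive, priority, iterative_refinement), the union-bound risk estimate, and the toy numbers are illustrative assumptions, not the authors' implementation.

```python
import time

# Hypothetical sketch only: a high-level plan is a sequence of motion
# primitives, each with estimated risk and cost; the primitive ranked
# highest by a prioritization function is refined until the plan-level
# risk bound is satisfied or the computational time budget runs out.

class MotionPrimitive:
    def __init__(self, name, risk, cost):
        self.name = name
        self.risk = risk    # estimated probability of violating a safety constraint
        self.cost = cost    # estimated execution cost (e.g., time or energy)

    def refine(self):
        # Stand-in for one round of DRL training on this subtask; assumed
        # to shrink the risk and cost estimates.
        self.risk *= 0.9
        self.cost *= 0.98


def plan_risk(primitives):
    # Crude plan-level risk estimate: a union bound over subtask risks.
    return sum(p.risk for p in primitives)


def priority(p):
    # One possible prioritization function: refine the riskiest primitive
    # first. The abstract mentions diverse prioritization functions; this
    # is just a placeholder choice.
    return p.risk


def iterative_refinement(primitives, risk_bound, time_budget_s):
    """Refine primitives until the risk bound holds or time runs out."""
    deadline = time.monotonic() + time_budget_s
    while plan_risk(primitives) > risk_bound and time.monotonic() < deadline:
        target = max(primitives, key=priority)
        target.refine()
    return primitives


if __name__ == "__main__":
    plan = [MotionPrimitive("reach", 0.08, 1.0),
            MotionPrimitive("grasp", 0.15, 0.5),
            MotionPrimitive("place", 0.05, 0.8)]
    iterative_refinement(plan, risk_bound=0.10, time_budget_s=0.05)
    print({p.name: round(p.risk, 3) for p in plan},
          "total risk:", round(plan_risk(plan), 3))
```

In the actual method, refinement would presumably involve further DRL training of the primitive's policy, and risk estimates would come from the learned models rather than the fixed decay used here for illustration.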