Reinforcement learning-based motion planning in partially observable environments under ethical constraints

Junchao Li, Mingyu Cai, Shaoping Xiao
AI and Ethics, volume 5, issue 2, pages 1047-1067. Published 2024-03-11. DOI: 10.1007/s43681-024-00441-6
https://link.springer.com/article/10.1007/s43681-024-00441-6

Abstract

Designing autonomous agents that follow moral norms is a significant challenge for AI decision-making under ethical constraints, especially when it involves motion planning for complex tasks in partially observable environments. This paper proposes a model-free reinforcement learning approach to address these challenges. We formulate the motion planning problem as a Probabilistic-Labeled Partially Observable Markov Decision Process (PL-POMDP) and express complex tasks using Linear Temporal Logic (LTL). To handle ethical norms, we categorize them into ‘hard’ and ‘soft’ ethical constraints. LTL is again employed to formulate the ‘hard’ constraints, while a reward redesign method enforces the ‘soft’ ones. Our approach also constructs a product of the PL-POMDP and an LTL-induced automaton. This transformation allows us to find an optimal policy on the product, ensuring both task completion and ethics satisfaction through model checking. To synthesize the desired policies, we use a state-of-the-art Recurrent Neural Network (RNN)-based deep Q-learning method, in which the Q-networks take observation history and task recognition as input features. We demonstrate the effectiveness and flexibility of the proposed approach through two simulation examples, which showcase its potential applicability to various scenarios and challenges in ethically guided AI decision-making.
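
The following is a minimal sketch, not the authors' implementation, of how an automaton can track a task with a 'hard' constraint while a redesigned reward penalizes a 'soft' violation. It makes several assumptions that the abstract does not state: the task is the hypothetical formula `(!forbidden) U goal`, the automaton is a hand-written deterministic one over finite traces (the paper's LTL-induced automaton may be of a different kind), labels are deterministic rather than probabilistic, and the names `TaskAutomaton`, `product_step`, `run_trace`, `task_reward`, and `soft_penalty` are invented for illustration. Partial observability and the RNN-based Q-network are left out; in such a setup the automaton state `q` would plausibly be fed to the network alongside the observation history.

```python
from dataclasses import dataclass

# Hypothetical atomic propositions; the abstract does not list the real ones.
GOAL, FORBIDDEN, DISCOURAGED = "goal", "forbidden", "discouraged"


@dataclass(frozen=True)
class TaskAutomaton:
    """Hand-written deterministic automaton for `(!forbidden) U goal`."""
    initial: str = "q0"
    accepting: str = "q_acc"
    trap: str = "q_trap"

    def step(self, q: str, labels: frozenset) -> str:
        # Accepting and trap states are absorbing.
        if q in (self.accepting, self.trap):
            return q
        if GOAL in labels:
            return self.accepting   # task satisfied
        if FORBIDDEN in labels:
            return self.trap        # hard constraint violated, unrecoverable
        return q


def product_step(automaton, q, labels, task_reward=1.0, soft_penalty=0.25):
    """Advance the automaton part of a product state (env_state, q).

    Returns the next automaton state, the redesigned reward, and a done flag.
    The reward values are illustrative assumptions.
    """
    q_next = automaton.step(q, labels)
    reward = 0.0
    if q_next == automaton.accepting and q != automaton.accepting:
        reward += task_reward       # paid once, on first acceptance
    if DISCOURAGED in labels:
        reward -= soft_penalty      # soft constraint: allowed but penalized
    done = q_next in (automaton.accepting, automaton.trap)
    return q_next, reward, done


def run_trace(automaton, label_trace):
    """Feed a sequence of label sets through the automaton and sum rewards."""
    q, total = automaton.initial, 0.0
    for labels in label_trace:
        q, reward, done = product_step(automaton, q, labels)
        total += reward
        if done:
            break
    return q, total
```

In this sketch a trace that passes through a `discouraged` cell before reaching `goal` still ends in the accepting state but with a lower return, while a trace that touches `forbidden` first ends in the trap state and can never collect the task reward. That mirrors the distinction the abstract draws between constraints encoded in the logic and constraints encoded in the reward.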
