JEM: Joint Entropy Minimization for Active State Estimation with Linear POMDP Costs

Timothy L. Molloy, G. Nair
Published in: 2022 American Control Conference (ACC), 2022-06-08
DOI: 10.23919/ACC53348.2022.9867569 (https://doi.org/10.23919/ACC53348.2022.9867569)
Citations: 1

Abstract

Active state estimation is the problem of controlling a partially observed Markov decision process (POMDP) to minimize the uncertainty associated with its latent states. A key challenge is selecting uncertainty measures that are both meaningful and tractable to optimize: the vast majority of popular uncertainty measures lead to POMDP costs that are nonlinear in the belief state, which makes them difficult (and often impossible) to optimize directly using standard POMDP solvers. To address this challenge, we propose the joint entropy of the state, observation, and control trajectories of POMDPs as a novel, tractable uncertainty measure for active state estimation. By expressing the joint entropy in stage-additive form, we show that joint-entropy-minimization (JEM) problems can be reformulated as standard POMDPs with cost functions that are linear in the belief state. This linearity is of considerable practical significance because it allows JEM problems to be solved directly with standard POMDP solvers. We illustrate JEM in simulations, where it reduces the probability of error in state trajectory estimates while being more computationally efficient than competing active state estimation formulations.
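To make the distinction between linear and nonlinear belief costs concrete, the following minimal Python sketch compares two costs on a two-state belief. It is an illustration only and does not reproduce the paper's JEM cost, which the abstract does not state: the per-state cost vector `c` and the helper names `belief_entropy`, `linear_cost`, and `mix` are assumptions made for this example. A cost is linear in the belief if its value at a mixture of two beliefs equals the same mixture of its values; the Shannon entropy of the belief, a common uncertainty measure, fails this check.

```python
import math

def belief_entropy(b):
    """Shannon entropy (in nats) of a belief vector; concave, hence nonlinear, in b."""
    return -sum(p * math.log(p) for p in b if p > 0.0)

def linear_cost(b, c):
    """Expected per-state cost sum_x b(x) c(x); linear in the belief b."""
    return sum(p * ci for p, ci in zip(b, c))

def mix(b1, b2, lam):
    """Convex combination lam * b1 + (1 - lam) * b2 of two beliefs."""
    return [lam * p + (1.0 - lam) * q for p, q in zip(b1, b2)]

# Two fully certain beliefs over a two-state system and their even mixture.
b1 = [1.0, 0.0]
b2 = [0.0, 1.0]
lam = 0.5
bm = mix(b1, b2, lam)

# Hypothetical per-state costs, chosen only for illustration.
c = [0.3, 1.2]

# Gap between the cost of the mixture and the mixture of the costs.
lin_gap = linear_cost(bm, c) - (
    lam * linear_cost(b1, c) + (1.0 - lam) * linear_cost(b2, c)
)
ent_gap = belief_entropy(bm) - (
    lam * belief_entropy(b1) + (1.0 - lam) * belief_entropy(b2)
)

print(f"linear cost gap:  {lin_gap:.6f}")  # zero up to rounding
print(f"entropy cost gap: {ent_gap:.6f}")  # strictly positive
```

The zero gap for the linear cost is the property that many standard POMDP solvers rely on, since it lets them represent value functions with finitely many linear pieces over the belief space. The nonzero gap for the belief entropy shows why costs of that kind generally cannot be passed to such solvers unchanged.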