Improving POMDP tractability via belief compression and clustering.

IEEE Transactions on Systems Man and Cybernetics Part B-Cybernetics. Pub Date: 2010-02-01. Epub Date: 2009-07-31. DOI: 10.1109/TSMCB.2009.2021573

Xin Li, William K Cheung, Jiming Liu

Pages: 125-36. Publication type: Journal Article.

Citations: 17

Abstract

Partially observable Markov decision processes (POMDPs) are a commonly adopted mathematical framework for solving planning problems in stochastic environments. However, computing the optimal policy of a POMDP for large-scale problems is known to be intractable, and the high dimensionality of the underlying belief space is one of the major causes. In this paper, we propose a hybrid approach that integrates two different approaches for reducing the dimensionality of the belief space: 1) belief compression and 2) value-directed compression. In particular, a novel orthogonal nonnegative matrix factorization is derived for belief compression and then integrated into a value-directed framework for computing the policy. In addition, conjecturing that a properly partitioned belief space can have its per-cluster intrinsic dimension further reduced, we propose applying a k-means-like clustering technique to partition the belief space into a set of sub-POMDPs before applying the dimension reduction techniques to each of them. We evaluated the proposed belief compression and clustering approaches on a set of benchmark problems and demonstrated their effectiveness in reducing the cost of computing policies while retaining the quality of the policies.
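
The partitioning step mentioned in the abstract can be pictured with a minimal sketch. The abstract only says "k-means-like", so everything below is an assumption made for illustration and not the authors' algorithm: plain Lloyd iterations, squared Euclidean distance, seeding from the first k points, and the hypothetical names `cluster_beliefs`, `squared_distance`, and `mean_belief`. Each belief is a probability vector over states, and each resulting cluster would be the region handed to a separate dimension-reduction step.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two belief vectors (assumed metric)."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_belief(points):
    """Component-wise mean; the mean of probability vectors is a probability vector."""
    n = len(points)
    return [sum(column) / n for column in zip(*points)]


def cluster_beliefs(beliefs, k, max_iters=100):
    """Lloyd-style k-means over sampled beliefs (illustrative, not the paper's method)."""
    # Deterministic seeding for reproducibility: the first k beliefs are the centroids.
    centroids = [list(b) for b in beliefs[:k]]
    labels = [None] * len(beliefs)
    for _ in range(max_iters):
        # Assignment step: each belief joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(b, centroids[j]))
            for b in beliefs
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids will not move
        labels = new_labels
        # Update step: recompute each centroid from its members (skip empty clusters).
        for j in range(k):
            members = [b for b, lab in zip(beliefs, labels) if lab == j]
            if members:
                centroids[j] = mean_belief(members)
    return labels, centroids


# Hypothetical sampled beliefs over a 4-state problem: mass sits either on
# the first two states or on the last two.
beliefs = [
    [0.90, 0.10, 0.00, 0.00],
    [0.00, 0.00, 0.20, 0.80],
    [0.80, 0.20, 0.00, 0.00],
    [0.00, 0.00, 0.10, 0.90],
    [0.85, 0.15, 0.00, 0.00],
    [0.00, 0.00, 0.30, 0.70],
]
labels, centroids = cluster_beliefs(beliefs, k=2)
print(labels)
```

In this toy data each cluster only ever touches two of the four states, which is the kind of per-cluster reduction in intrinsic dimension the abstract conjectures. A compression step would then be applied to each cluster separately.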

Source journal

IEEE Transactions on Systems Man and Cybernetics Part B-Cybernetics (Engineering Technology - Computer Science: Cybernetics)

Self-citation rate

0.00%

Articles published

Review time

6.0 months