低秩POMDP中具有可处理计划的可证明高效表示学习

Proceedings of the ... International Conference on Machine Learning. International Conference on Machine Learning Pub Date : 2023-06-21 DOI:10.48550/arXiv.2306.12356

Jiacheng Guo, Zihao Li, Huazheng Wang, Mengdi Wang, Zhuoran Yang, Xuezhou Zhang

{"title":"低秩POMDP中具有可处理计划的可证明高效表示学习","authors":"Jiacheng Guo, Zihao Li, Huazheng Wang, Mengdi Wang, Zhuoran Yang, Xuezhou Zhang","doi":"10.48550/arXiv.2306.12356","DOIUrl":null,"url":null,"abstract":"In this paper, we study representation learning in partially observable Markov Decision Processes (POMDPs), where the agent learns a decoder function that maps a series of high-dimensional raw observations to a compact representation and uses it for more efficient exploration and planning. We focus our attention on the sub-classes of \\textit{$\\gamma$-observable} and \\textit{decodable POMDPs}, for which it has been shown that statistically tractable learning is possible, but there has not been any computationally efficient algorithm. We first present an algorithm for decodable POMDPs that combines maximum likelihood estimation (MLE) and optimism in the face of uncertainty (OFU) to perform representation learning and achieve efficient sample complexity, while only calling supervised learning computational oracles. We then show how to adapt this algorithm to also work in the broader class of $\\gamma$-observable POMDPs.","PeriodicalId":74529,"journal":{"name":"Proceedings of the ... International Conference on Machine Learning. International Conference on Machine Learning","volume":"18 1","pages":"11967-11997"},"PeriodicalIF":0.0000,"publicationDate":"2023-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Provably Efficient Representation Learning with Tractable Planning in Low-Rank POMDP\",\"authors\":\"Jiacheng Guo, Zihao Li, Huazheng Wang, Mengdi Wang, Zhuoran Yang, Xuezhou Zhang\",\"doi\":\"10.48550/arXiv.2306.12356\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In this paper, we study representation learning in partially observable Markov Decision Processes (POMDPs), where the agent learns a decoder function that maps a series of high-dimensional raw observations to a compact representation and uses it for more efficient exploration and planning. We focus our attention on the sub-classes of \\\\textit{$\\\\gamma$-observable} and \\\\textit{decodable POMDPs}, for which it has been shown that statistically tractable learning is possible, but there has not been any computationally efficient algorithm. We first present an algorithm for decodable POMDPs that combines maximum likelihood estimation (MLE) and optimism in the face of uncertainty (OFU) to perform representation learning and achieve efficient sample complexity, while only calling supervised learning computational oracles. We then show how to adapt this algorithm to also work in the broader class of $\\\\gamma$-observable POMDPs.\",\"PeriodicalId\":74529,\"journal\":{\"name\":\"Proceedings of the ... International Conference on Machine Learning. International Conference on Machine Learning\",\"volume\":\"18 1\",\"pages\":\"11967-11997\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-06-21\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the ... International Conference on Machine Learning. International Conference on Machine Learning\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.48550/arXiv.2306.12356\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the ... International Conference on Machine Learning. International Conference on Machine Learning","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.48550/arXiv.2306.12356","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 1

摘要

在本文中，我们研究了部分可观察马尔可夫决策过程(pomdp)中的表示学习，其中智能体学习了一个解码器函数，该函数将一系列高维原始观察映射到紧凑的表示，并使用它进行更有效的探索和规划。我们将注意力集中在\textit{$\gamma$-可观察的}和\textit{可解码的pomdp}的子类上，已经证明统计上可处理的学习是可能的，但还没有任何计算效率高的算法。我们首先提出了一种可解码pomdp的算法，该算法结合了面对不确定性(OFU)的最大似然估计(MLE)和乐观主义来执行表示学习并实现有效的样本复杂度，同时只调用监督学习计算预言。然后，我们将展示如何使该算法也适用于更广泛的$\gamma$ -可观察pomdp类。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Provably Efficient Representation Learning with Tractable Planning in Low-Rank POMDP

In this paper, we study representation learning in partially observable Markov Decision Processes (POMDPs), where the agent learns a decoder function that maps a series of high-dimensional raw observations to a compact representation and uses it for more efficient exploration and planning. We focus our attention on the sub-classes of \textit{$\gamma$-observable} and \textit{decodable POMDPs}, for which it has been shown that statistically tractable learning is possible, but there has not been any computationally efficient algorithm. We first present an algorithm for decodable POMDPs that combines maximum likelihood estimation (MLE) and optimism in the face of uncertainty (OFU) to perform representation learning and achieve efficient sample complexity, while only calling supervised learning computational oracles. We then show how to adapt this algorithm to also work in the broader class of $\gamma$-observable POMDPs.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Proceedings of the ... International Conference on Machine Learning. International Conference on Machine Learning

自引率

0.00%

发文量