A Low-Rank Approximation for MDPs via Moment Coupling
Authors: Amy Zhang, Itai Gurvich
DOI: 10.1287/opre.2022.2392 (https://doi.org/10.1287/opre.2022.2392)
Publication date: 2022-11-17
Markov Decision Process Tayloring for Approximation Design

Optimal control problems are difficult to solve on large state spaces, calling for the development of approximate solution methods. In “A Low-rank Approximation for MDPs via Moment Coupling,” Zhang and Gurvich introduce a novel framework to approximate Markov decision processes (MDPs) that stands on two pillars: (i) state aggregation, as the algorithmic infrastructure, and (ii) central-limit-theorem-type approximations, as the mathematical underpinning. The theoretical guarantees are grounded in the approximation of the Bellman equation by a partial differential equation (PDE) in which, in the spirit of the central limit theorem, the transition matrix of the controlled Markov chain is reduced to its local first and second moments. Instead of solving the PDE, the algorithm introduced in the paper constructs a “sister” (controlled) Markov chain whose two local transition moments are approximately identical to those of the focal chain. Because of this moment matching, the original chain and its sister are coupled through the PDE, which facilitates optimality guarantees. Embedded into standard soft aggregation, moment matching provides a disciplined mechanism to tune the aggregation and disaggregation probabilities.
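As a rough illustration of what matching "two local transition moments" means, the toy sketch below computes the first and second moments of the one-step increment at a single state of a small one-dimensional chain, then builds a coarser three-point transition law with the same two moments. This is not the algorithm from the paper. The chain, the jump probabilities, the step size `h`, and the helper names `local_moments` and `three_point_match` are all assumptions made for illustration, and the policy is held fixed so that no control appears.

```python
def local_moments(row, x):
    """First and second moments of the increment y - x.

    `row` is a hypothetical sparse transition row: a dict mapping each
    next state y to its probability P(x, y).
    """
    mu = sum(p * (y - x) for y, p in row.items())
    s2 = sum(p * (y - x) ** 2 for y, p in row.items())
    return mu, s2


def three_point_match(mu, s2, h):
    """Law on increments {-h, 0, +h} with mean mu and second moment s2.

    Solving  h * (p_up - p_down) = mu  and  h**2 * (p_up + p_down) = s2
    gives the probabilities below. They are valid only if all three are
    non-negative, which is checked explicitly.
    """
    p_up = 0.5 * (s2 / h ** 2 + mu / h)
    p_down = 0.5 * (s2 / h ** 2 - mu / h)
    p_stay = 1.0 - s2 / h ** 2
    if min(p_up, p_down, p_stay) < 0:
        raise ValueError("these moments cannot be matched with step size h")
    return {-h: p_down, 0.0: p_stay, h: p_up}


# Focal chain (assumed for illustration): from state x = 10 the chain
# moves by -2, -1, 0, +1, +2 with the probabilities listed here.
x = 10
focal_row = {8: 0.1, 9: 0.2, 10: 0.3, 11: 0.3, 12: 0.1}
mu, s2 = local_moments(focal_row, x)

# "Sister" row: moves only on a coarser grid of spacing h = 2, yet has
# the same first and second local moments as the focal row.
h = 2.0
sister_increments = three_point_match(mu, s2, h)
sister_row = {x + dx: p for dx, p in sister_increments.items()}
sister_mu, sister_s2 = local_moments(sister_row, x)

print("focal  moments:", mu, s2)
print("sister moments:", sister_mu, sister_s2)
```

The sister row is supported on three states instead of five, yet a second-order Taylor expansion of a smooth value function around `x` cannot tell the two rows apart. That is the sense in which the two chains are coupled through their local moments. In the paper's setting the matching is only approximate and is carried out through aggregation and disaggregation probabilities rather than a fixed three-point stencil.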
About the journal:
Military Operations Research is a peer-reviewed journal of high academic quality. The journal publishes articles that describe operations research (OR) methodologies and theories used in key military and national security applications. Of particular interest are papers that present case studies showing innovative OR applications, apply OR to major policy issues, introduce interesting new problem areas, highlight education issues, or document the history of military and national security OR.