Approximate Constrained Discounted Dynamic Programming With Uniform Feasibility and Optimality

Hyeong Soo Chang

IEEE Transactions on Automatic Control, vol. 70, no. 6, pp. 4031-4036. Published 2024-12-27. DOI: 10.1109/TAC.2024.3523847

URL: https://ieeexplore.ieee.org/document/10817577/
Abstract
An important question about the finite constrained Markov decision process (CMDP) problem is whether there exists a condition under which a uniformly optimal and uniformly feasible policy, one that achieves the optimal value at all initial states, exists in the set of deterministic, history-independent, and stationary policies, and whether the CMDP problem under that condition can be solved by dynamic programming (DP). This matters because the crux of the unconstrained MDP theory developed by Bellman lies in the answer to the same existence question for MDPs. Although CMDPs have been studied over the years, no relevant work has addressed this question since it was posed as an open problem about three decades ago in the literature. We establish, as a partial answer to this question, that any finite CMDP problem $\mathsf{M}^{c}$ inherently "contains" a DP structure in its "subordinate" CMDP problem $\hat{\mathsf{M}}^{c}$, induced from the parameters of $\mathsf{M}^{c}$, and that $\hat{\mathsf{M}}^{c}$ is DP-solvable. We derive a policy-iteration-type algorithm for solving $\hat{\mathsf{M}}^{c}$, which provides an approximate solution to $\mathsf{M}^{c}$, or to $\mathsf{M}^{c}$ with a fixed initial state.
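To make the DP machinery the abstract appeals to concrete, the following is a minimal sketch of standard policy iteration for a small finite discounted MDP. This is a generic illustration only, not the paper's policy-iteration-type algorithm for the subordinate CMDP $\hat{\mathsf{M}}^{c}$; the transition matrices, rewards, and discount factor below are all hypothetical.

```python
import numpy as np

# Policy iteration on a hypothetical 2-state, 2-action discounted MDP.
# Illustrates the DP structure (evaluate, then improve) referenced in
# the abstract; it is NOT the paper's algorithm for the subordinate CMDP.

gamma = 0.9                                   # discount factor (assumed)
# P[a][s, s']: transition probability under action a (made-up numbers)
P = [np.array([[0.8, 0.2], [0.3, 0.7]]),
     np.array([[0.1, 0.9], [0.6, 0.4]])]
R = np.array([[1.0, 0.0],                     # R[s, a]: one-step reward
              [0.5, 2.0]])

n_states, n_actions = R.shape
policy = np.zeros(n_states, dtype=int)        # deterministic stationary policy

while True:
    # Policy evaluation: solve (I - gamma * P_pi) v = r_pi exactly.
    P_pi = np.array([P[policy[s]][s] for s in range(n_states)])
    r_pi = R[np.arange(n_states), policy]
    v = np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)

    # Policy improvement: greedy one-step lookahead on the Q-values.
    Q = np.array([[R[s, a] + gamma * P[a][s] @ v for a in range(n_actions)]
                  for s in range(n_states)])
    new_policy = Q.argmax(axis=1)
    if np.array_equal(new_policy, policy):    # stable policy => optimal
        break
    policy = new_policy

print(policy, v)
```

Because every improvement step is strictly better until the policy is stable, and the policy set is finite, the loop terminates at a policy whose value function satisfies the Bellman optimality equation; the paper's contribution is that an analogous fixed-point structure can be exploited for the constrained problem via $\hat{\mathsf{M}}^{c}$.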
Journal description:
In the IEEE Transactions on Automatic Control, the IEEE Control Systems Society publishes high-quality papers on the theory, design, and applications of control engineering. Two types of contributions are regularly considered:
1) Papers: Presentation of significant research, development, or application of control concepts.
2) Technical Notes and Correspondence: Brief technical notes, comments on published papers or established control topics, and corrections to papers and notes published in the Transactions.
In addition, special papers (tutorials, surveys, and perspectives on the theory and applications of control systems topics) are solicited.