On maximizing probabilities for over-performing a target for Markov decision processes
Authors: Tanhao Huang, Yanan Dai, Jinwen Chen
Journal: Optimization and Engineering (published 2023-12-08)
DOI: 10.1007/s11081-023-09870-4 (https://doi.org/10.1007/s11081-023-09870-4)
Citations: 0
Abstract
This paper studies the duality between risk-sensitive control and large-deviation control for Markov decision processes, where the latter maximizes the probability of out-performing a target. To derive the duality, we apply a non-linear extension of the Krein-Rutman theorem to characterize the optimal risk-sensitive value and prove that a stationary, deterministic optimal policy exists. The right-hand derivative of this value function characterizes the targets for which the duality holds. We prove that the optimal policy for the “out-performing” probability can be approximated by the optimal policy for the risk-sensitive control problem. The range of the one-sided (right-hand and left-hand) derivatives of the optimal risk-sensitive value function plays an important role. Some essential differences between the two types of optimal control problems are also presented.
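As a rough illustration of the objects named in the abstract, the sketch below computes a risk-sensitive value for a small finite MDP by power iteration on a multiplicative Bellman (eigenvalue) equation, then takes a Legendre-type transform over a grid of risk parameters. Everything here is an assumption made for illustration and is not taken from the paper: the toy transition array `P`, the reward array `r`, the function names `risk_sensitive_value` and `dual_rate`, the requirement that all transition probabilities are strictly positive, and the specific form of the dual expression, which follows the usual textbook shape and may differ from the paper's exact statement.

```python
import numpy as np

def risk_sensitive_value(P, r, theta, iters=10_000, tol=1e-12):
    """Estimate Lambda(theta), an eigenfunction h, and a greedy policy.

    Solves, by power iteration, the assumed eigenvalue equation
        exp(Lambda) * h(x) = max_a exp(theta * r[x, a]) * sum_y P[a, x, y] * h(y).
    P[a, x, y]: probability of moving x -> y under action a (assumed > 0).
    r[x, a]:    one-step reward.
    """
    h = np.ones(P.shape[1])
    lam = 1.0
    for _ in range(iters):
        # q[x, a] = exp(theta * r[x, a]) * sum_y P[a, x, y] * h[y]
        q = np.exp(theta * r) * np.einsum("axy,y->xa", P, h)
        h_new = q.max(axis=1)
        lam_new = h_new.max()      # eigenvalue estimate via sup-norm scaling
        h_new = h_new / lam_new
        done = abs(lam_new - lam) < tol and np.max(np.abs(h_new - h)) < tol
        h, lam = h_new, lam_new
        if done:
            break
    policy = q.argmax(axis=1)      # one action per state: stationary, deterministic
    return float(np.log(lam)), h, policy

def dual_rate(P, r, c, thetas):
    """Grid approximation of sup over theta >= 0 of [theta * c - Lambda(theta)]."""
    return max(t * c - risk_sensitive_value(P, r, t)[0] for t in thetas)

# Hypothetical two-state, two-action MDP (illustrative numbers only).
P = np.array([
    [[0.9, 0.1], [0.5, 0.5]],   # action 0
    [[0.2, 0.8], [0.3, 0.7]],   # action 1
])
r = np.array([[1.0, 0.0],       # r[x, a]
              [0.0, 2.0]])

if __name__ == "__main__":
    value, h, policy = risk_sensitive_value(P, r, theta=1.0)
    print("Lambda(1.0) =", value, "policy =", policy)
    print("dual rate at target c = 1.5:", dual_rate(P, r, 1.5, np.linspace(0.0, 3.0, 31)))
```

Under the duality discussed in the abstract, a quantity of this dual form would govern the exponential decay rate of the best achievable probability of exceeding the target `c`, but only for the targets the paper singles out through the one-sided derivatives of the value function. The grid search does not check that condition.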
About the journal:
Optimization and Engineering is a multidisciplinary journal; its primary goal is to promote the application of optimization methods in the general area of engineering sciences. We expect submissions to OPTE not only to make a significant optimization contribution but also to impact a specific engineering application.
Topics of Interest:
- Optimization: All methods and algorithms of mathematical optimization, including blackbox and derivative-free optimization, continuous optimization, discrete optimization, global optimization, linear and conic optimization, multiobjective optimization, PDE-constrained optimization & control, and stochastic optimization. Numerical and implementation issues, optimization software, benchmarking, and case studies.
- Engineering Sciences: Aerospace engineering, biomedical engineering, chemical & process engineering, civil, environmental, & architectural engineering, electrical engineering, financial engineering, geosciences, healthcare engineering, industrial & systems engineering, mechanical engineering & MDO, and robotics.