{"title":"Cost-Optimal Control of Markov Decision Processes Under Signal Temporal Logic Constraints","authors":"K. C. Kalagarla, R. Jain, P. Nuzzo","doi":"10.1109/ICC54714.2021.9703164","DOIUrl":null,"url":null,"abstract":"We present a method to find a cost-optimal policy for a given finite-horizon Markov decision process (MDP) with unknown transition probability, such that the probability of satisfying a given signal temporal logic specification is above a desired threshold. We propose an augmentation of the MDP state space to enable the expression of the STL objective as a reachability objective. In this augmented space, the optimal policy problem is re-formulated as a finite-horizon constrained Markov decision process (CMDP). We then develop a model-free reinforcement learning (RL) scheme to provide an approximately optimal policy for any general finite horizon CMDP problem. This scheme can make use of any off-the-shelf model-free RL algorithm and considers the general space of non-stationary randomized policies. Finally, we illustrate the applicability of our RL-based approach through two case studies.","PeriodicalId":382373,"journal":{"name":"2021 Seventh Indian Control Conference (ICC)","volume":"78 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-12-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 Seventh Indian Control Conference (ICC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICC54714.2021.9703164","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
We present a method to find a cost-optimal policy for a given finite-horizon Markov decision process (MDP) with unknown transition probabilities, such that the probability of satisfying a given signal temporal logic (STL) specification is above a desired threshold. We propose an augmentation of the MDP state space that enables the STL objective to be expressed as a reachability objective. In this augmented space, the optimal policy problem is reformulated as a finite-horizon constrained Markov decision process (CMDP). We then develop a model-free reinforcement learning (RL) scheme that provides an approximately optimal policy for any general finite-horizon CMDP problem. This scheme can make use of any off-the-shelf model-free RL algorithm and considers the general space of non-stationary randomized policies. Finally, we illustrate the applicability of our RL-based approach through two case studies.
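The abstract does not spell out the RL scheme, so the following is only a minimal sketch of one standard way to solve a finite-horizon CMDP model-free: time-indexed (hence non-stationary) tabular Q-learning combined with Lagrangian dual ascent on the constraint. The chain MDP, goal state, threshold, and all hyperparameters are invented for illustration and are not taken from the paper.

```python
import numpy as np

# Hypothetical toy instance of an augmented finite-horizon CMDP:
# a chain of S states; reaching GOAL plays the role of the reachability
# (STL-satisfaction) event, and each "move" action incurs a unit cost.
S, A, H = 5, 2, 10
GOAL = S - 1
THRESHOLD = 0.8            # required satisfaction probability (assumed)

rng = np.random.default_rng(0)

def step(s, a):
    """Sample next state and per-step cost (invented dynamics)."""
    if a == 1 and rng.random() < 0.8:
        return min(s + 1, S - 1), 1.0
    return s, 0.0

# Time-indexed Q-table, so the learned policy is non-stationary.
Q = np.zeros((H, S, A))
lam, alpha, eps, lam_lr = 1.0, 0.1, 0.2, 0.01

for episode in range(20_000):
    s, reached = 0, 0.0
    for h in range(H):
        a = rng.integers(A) if rng.random() < eps else int(np.argmax(Q[h, s]))
        s_next, cost = step(s, a)
        sat = 1.0 if (s != GOAL and s_next == GOAL) else 0.0
        reached = max(reached, sat)
        # Lagrangian reward: trade cost against constraint satisfaction.
        r = -cost + lam * sat
        target = r + (np.max(Q[h + 1, s_next]) if h + 1 < H else 0.0)
        Q[h, s, a] += alpha * (target - Q[h, s, a])
        s = s_next
    # Dual ascent: raise lam while satisfaction falls below the threshold.
    lam = max(0.0, lam + lam_lr * (THRESHOLD - reached))

print("estimated multiplier:", lam)
```

The inner Q-learning loop is the interchangeable part: consistent with the abstract's claim that any off-the-shelf model-free RL algorithm can be plugged in, it could be replaced by a deep RL learner while the outer dual-variable update on the constraint stays the same.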