A unified view of configurable Markov Decision Processes: Solution concepts, value functions, and operators
A. Metelli
Intelligenza Artificiale, vol. 16(1), pp. 165-184. Published 2022-12-27. Journal Article.
DOI: 10.3233/ia-220140 (https://doi.org/10.3233/ia-220140)
Citation count: 0
Abstract
In this paper, we provide a unified presentation of the Configurable Markov Decision Process (Conf-MDP) framework. A Conf-MDP extends the traditional Markov Decision Process (MDP) by modeling the possibility of configuring some environmental parameters. This configuration can be carried out by the learning agent itself or by an external configurator. We introduce a general definition of the Conf-MDP and then specialize it to two settings: the cooperative setting, where the configuration fully serves the agent’s goals, and the non-cooperative setting, in which agent and configurator might have different interests. For both settings, we propose suitable solution concepts. Furthermore, we illustrate how to extend the traditional MDP value functions and Bellman operators to this new framework.
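To make the cooperative setting concrete, the following is a minimal sketch and not the paper's method. It assumes a finite, hypothetical set of candidate transition models (the "configurations"), a shared reward table, and a brute-force search: run standard value iteration under each configuration and keep the configuration and policy pair with the highest expected return from the initial-state distribution. All names (`value_iteration`, `solve_cooperative`, `P_slow`, `P_fast`) and the toy two-state environment are illustrative assumptions.

```python
import numpy as np


def value_iteration(P, R, gamma=0.9, tol=1e-8, max_iter=10_000):
    """Optimal value function and greedy policy for one fixed transition model.

    P: array of shape (S, A, S), P[s, a, s2] = probability of moving s -> s2.
    R: array of shape (S, A), expected immediate reward.
    """
    V = np.zeros(P.shape[0])
    for _ in range(max_iter):
        Q = R + gamma * (P @ V)        # Bellman optimality backup, shape (S, A)
        V_new = Q.max(axis=1)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    Q = R + gamma * (P @ V)
    return V, Q.argmax(axis=1)


def solve_cooperative(configs, R, mu, gamma=0.9):
    """Brute-force joint choice of configuration and policy (illustrative only).

    configs: list of candidate transition models (hypothetical finite set).
    mu: initial-state distribution, shape (S,).
    Returns (expected return, index of best configuration, greedy policy).
    """
    best = None
    for k, P in enumerate(configs):
        V, pi = value_iteration(P, R, gamma)
        J = float(mu @ V)              # expected return under this configuration
        if best is None or J > best[0]:
            best = (J, k, pi)
    return best


# Toy environment: state 1 is an absorbing goal that pays reward 1 per step.
# In state 0, action 0 stays put and action 1 tries to reach the goal.
# The two configurations differ only in how reliable that attempt is.
P_slow = np.array([[[1.0, 0.0], [0.5, 0.5]],
                   [[0.0, 1.0], [0.0, 1.0]]])
P_fast = np.array([[[1.0, 0.0], [0.1, 0.9]],
                   [[0.0, 1.0], [0.0, 1.0]]])
R = np.array([[0.0, 0.0],
              [1.0, 1.0]])
mu = np.array([1.0, 0.0])

J, k, pi = solve_cooperative([P_slow, P_fast], R, mu)
print(f"best configuration index: {k}, expected return: {J:.3f}")
```

Enumeration only works when the configuration set is small and finite. In the non-cooperative setting, where the configurator optimizes its own objective, this single joint maximization would no longer be the appropriate solution concept.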