Deterministic policies based on maximum regrets in MDPs with imprecise rewards

IF 1.4 · CAS Zone 4 (Computer Science) · JCR Q4 (Computer Science, Artificial Intelligence) · AI Communications · Pub Date: 2021-09-20 · DOI: 10.3233/aic-190632
Pegah Alizadeh, Emiliano Traversi, A. Osmani
{"title":"在奖励不精确的mdp中,基于最大遗憾的确定性策略","authors":"Pegah Alizadeh, Emiliano Traversi, A. Osmani","doi":"10.3233/aic-190632","DOIUrl":null,"url":null,"abstract":"Markov Decision Process Models (MDPs) are a powerful tool for planning tasks and sequential decision-making issues. In this work we deal with MDPs with imprecise rewards, often used when dealing with situations where the data is uncertain. In this context, we provide algorithms for finding the policy that minimizes the maximum regret. To the best of our knowledge, all the regret-based methods proposed in the literature focus on providing an optimal stochastic policy. We introduce for the first time a method to calculate an optimal deterministic policy using optimization approaches. Deterministic policies are easily interpretable for users because for a given state they provide a unique choice. To better motivate the use of an exact procedure for finding a deterministic policy, we show some (theoretical and experimental) cases where the intuitive idea of using a deterministic policy obtained after “determinizing” the optimal stochastic policy leads to a policy far from the exact deterministic policy.","PeriodicalId":50835,"journal":{"name":"AI Communications","volume":"1 1","pages":"229-244"},"PeriodicalIF":1.4000,"publicationDate":"2021-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Deterministic policies based on maximum regrets in MDPs with imprecise rewards\",\"authors\":\"Pegah Alizadeh, Emiliano Traversi, A. Osmani\",\"doi\":\"10.3233/aic-190632\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Markov Decision Process Models (MDPs) are a powerful tool for planning tasks and sequential decision-making issues. In this work we deal with MDPs with imprecise rewards, often used when dealing with situations where the data is uncertain. In this context, we provide algorithms for finding the policy that minimizes the maximum regret. To the best of our knowledge, all the regret-based methods proposed in the literature focus on providing an optimal stochastic policy. We introduce for the first time a method to calculate an optimal deterministic policy using optimization approaches. Deterministic policies are easily interpretable for users because for a given state they provide a unique choice. 
To better motivate the use of an exact procedure for finding a deterministic policy, we show some (theoretical and experimental) cases where the intuitive idea of using a deterministic policy obtained after “determinizing” the optimal stochastic policy leads to a policy far from the exact deterministic policy.\",\"PeriodicalId\":50835,\"journal\":{\"name\":\"AI Communications\",\"volume\":\"1 1\",\"pages\":\"229-244\"},\"PeriodicalIF\":1.4000,\"publicationDate\":\"2021-09-20\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"AI Communications\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.3233/aic-190632\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q4\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"AI Communications","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.3233/aic-190632","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

Markov Decision Process models (MDPs) are a powerful tool for planning tasks and sequential decision-making problems. In this work we deal with MDPs with imprecise rewards, which are often used when the underlying data is uncertain. In this context, we provide algorithms for finding the policy that minimizes the maximum regret. To the best of our knowledge, all regret-based methods proposed in the literature focus on providing an optimal stochastic policy. We introduce, for the first time, a method to compute an optimal deterministic policy using optimization approaches. Deterministic policies are easy for users to interpret because they prescribe a unique choice in each state. To motivate the use of an exact procedure for finding a deterministic policy, we show some (theoretical and experimental) cases where the intuitive idea of "determinizing" the optimal stochastic policy yields a policy far from the exact optimal deterministic policy.
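To fix ideas, the minimax-regret objective the abstract refers to can be written as follows. This is the standard formulation from the regret-based MDP literature; the notation below is our own shorthand and may not match the paper's exactly: R is the set of reward functions consistent with the imprecise specification, Π is the policy class, and V_r^π is the expected discounted return of policy π under reward function r.

```latex
% Minimax regret in an MDP with imprecise rewards (notation is our own).
% \mathcal{R}: feasible reward set;  \Pi: policy class;
% V_r^{\pi}: expected discounted return of \pi under reward function r.
\begin{align}
  \mathrm{Regret}(\pi, r) &= \max_{\pi' \in \Pi} V_r^{\pi'} - V_r^{\pi}
      && \text{regret of $\pi$ if $r$ were the true reward,} \\
  \mathrm{MR}(\pi) &= \max_{r \in \mathcal{R}} \mathrm{Regret}(\pi, r)
      && \text{maximum (worst-case) regret of $\pi$,} \\
  \pi^{\ast} &\in \operatorname*{arg\,min}_{\pi \in \Pi} \mathrm{MR}(\pi)
      && \text{minimax-regret policy.}
\end{align}
```

The paper solves this problem exactly with Π restricted to deterministic policies; since deterministic policies are a subset of stochastic ones, the restriction can only increase the optimal MR value, and the point of the paper's counterexamples is that the naive alternative, rounding the optimal stochastic policy, does not close that gap.

The contrast between an exactly optimized deterministic policy and a naively "determinized" one can be reproduced on a toy problem. The sketch below is illustrative only: it replaces the paper's continuous reward polytope and mathematical-programming machinery with a finite random sample of reward functions and brute-force enumeration, and the stochastic policy being determinized is a random stand-in, not the optimal stochastic minimax-regret policy. All names and sizes are made up for the example.

```python
# Toy illustration (not the paper's algorithm): minimax regret over a finite
# sample of reward functions, with policies scored by value at a start state.
import itertools
import numpy as np

GAMMA = 0.9                # discount factor
N_STATES, N_ACTIONS = 3, 2
START = 0                  # start state used to score policies

# Known transition tensor P[s, a, s'].
P = np.zeros((N_STATES, N_ACTIONS, N_STATES))
P[0, 0] = [0.8, 0.2, 0.0]
P[0, 1] = [0.0, 0.5, 0.5]
P[1, 0] = [0.1, 0.9, 0.0]
P[1, 1] = [0.0, 0.0, 1.0]
P[2, 0] = [1.0, 0.0, 0.0]
P[2, 1] = [0.0, 0.1, 0.9]

# Imprecise rewards, approximated here by a finite set of candidates r[s, a].
rng = np.random.default_rng(0)
REWARD_SET = [rng.uniform(-1.0, 1.0, (N_STATES, N_ACTIONS)) for _ in range(20)]

def policy_value(pi, r):
    """Exact evaluation of a stochastic policy pi[s, a] under reward r."""
    P_pi = np.einsum("sa,sat->st", pi, P)   # induced state-to-state kernel
    r_pi = np.einsum("sa,sa->s", pi, r)     # expected one-step reward
    v = np.linalg.solve(np.eye(N_STATES) - GAMMA * P_pi, r_pi)
    return v[START]

def optimal_value(r, iters=500):
    """Optimal value at START under a fixed reward r, by value iteration."""
    v = np.zeros(N_STATES)
    for _ in range(iters):
        v = np.max(r + GAMMA * (P @ v), axis=1)
    return v[START]

def max_regret(pi):
    """Worst-case regret of pi over the sampled reward set."""
    return max(optimal_value(r) - policy_value(pi, r) for r in REWARD_SET)

# Exact minimax-regret deterministic policy, by brute-force enumeration
# (feasible only because the toy MDP has N_ACTIONS**N_STATES = 8 policies).
best_pi, best_mr = None, np.inf
for choice in itertools.product(range(N_ACTIONS), repeat=N_STATES):
    pi = np.zeros((N_STATES, N_ACTIONS))
    pi[np.arange(N_STATES), choice] = 1.0
    mr = max_regret(pi)
    if mr < best_mr:
        best_pi, best_mr = pi, mr
print("exact deterministic max regret:", round(best_mr, 4))

# Naive "determinization": take the per-state argmax of a stochastic policy.
pi_stoch = rng.dirichlet(np.ones(N_ACTIONS), size=N_STATES)  # random stand-in
pi_det = np.zeros_like(pi_stoch)
pi_det[np.arange(N_STATES), pi_stoch.argmax(axis=1)] = 1.0
print("determinized max regret:      ", round(max_regret(pi_det), 4))
```

On instances like this the determinized policy typically shows a visibly larger maximum regret than the enumerated optimum, which is the qualitative gap the authors document; the paper's contribution is an exact optimization procedure that scales beyond brute-force enumeration.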
Source journal
AI Communications (Engineering & Technology – Computer Science: Artificial Intelligence)
CiteScore: 2.30
Self-citation rate: 12.50%
Publication volume: 34
Review duration: 4.5 months
About the journal: AI Communications is a journal on artificial intelligence (AI) that has a close relationship to EurAI (European Association for Artificial Intelligence, formerly ECCAI). It covers the whole AI community: scientific institutions as well as commercial and industrial companies. AI Communications aims to enhance contacts and information exchange between AI researchers and developers, and to provide supranational information to those concerned with AI and advanced information processing. AI Communications publishes refereed articles concerning scientific and technical AI procedures, provided they are of sufficient interest to a large readership of both scientific and practical background. In addition, it contains high-level background material, both at the technical level and at the level of opinions, policies, and news.
Latest articles in this journal:
- Multi-feature fusion dehazing based on CycleGAN
- Spatio-temporal deep learning framework for pedestrian intention prediction in urban traffic scenes
- Open-world object detection: A solution based on reselection mechanism and feature disentanglement
- MantaRay-ProM: An efficient process model discovery algorithm
- Token-modification adversarial attacks for natural language processing: A survey